logical changeset generation v5

Started by Andres Freund over 12 years ago, 90 messages
#1 Andres Freund
andres@2ndquadrant.com

Hi!

I am rather pleased to announce the next version of the changeset
extraction patchset. Thanks to help from a large number of people I
think we are slowly getting to the point where it is committable.

Since the last submitted version
(20121115002746.GA7692@awork2.anarazel.de) a large number of fixes and
the results of a good amount of review have been added to the tree. All
bugs known to me have been fixed.

Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, do it only for catalog ones.
* arbitrarily large transaction support by spilling large transactions
to disk
* spill snapshots to disk, so we can restart without waiting for a new
snapshot to be built
* Don't read all WAL from the establishment of a logical slot
* tests via SQL interface to changeset extraction

The todo list includes:
* morph the "logical slot" interface into being "replication slots" that
can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely
duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/.
* user/developer level documentation

The patch series currently has two interfaces to logical decoding. One -
which is primarily useful for pg_regress style tests and playing around
- is SQL based, the other one uses a walsender replication connection.

A quick demonstration of the SQL interface (server needs to be started
with wal_level = logical and max_logical_slots > 0):
=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
    slotname     | xlog_position
-----------------+---------------
 regression_slot | 0/17D5908
(1 row)

=# CREATE TABLE foo(id serial primary key, data text);

=# INSERT INTO foo(data) VALUES(1);

=# UPDATE foo SET id = -id, data = ':'||data;

=# DELETE FROM foo;

=# DROP TABLE foo;

=# SELECT * FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '0');
 location  | xid |                                      data
-----------+-----+--------------------------------------------------------------------------------
 0/17D59B8 | 695 | BEGIN
 0/17D59B8 | 695 | COMMIT
 0/17E8B58 | 696 | BEGIN
 0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
 0/17E8B58 | 696 | COMMIT
 0/17E8CA8 | 697 | BEGIN
 0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple: id[int4]:-1 data[text]::1
 0/17E8CA8 | 697 | COMMIT
 0/17E8E50 | 698 | BEGIN
 0/17E8E50 | 698 | table "foo": DELETE: id[int4]:-1
 0/17E8E50 | 698 | COMMIT
 0/17E9058 | 699 | BEGIN
 0/17E9058 | 699 | COMMIT
(13 rows)

=# SELECT * FROM pg_stat_logical_decoding ;
    slot_name    |    plugin     | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
 regression_slot | test_decoding |    12042 | f      |  695 | 0/17D58D0
(1 row)

=# SELECT * FROM stop_logical_replication('regression_slot');
stop_logical_replication
--------------------------
0

The walsender interface has the same calls:
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';

The only difference is that START_LOGICAL_REPLICATION can stream changes
and it can support synchronous replication.

The output seen in the 'data' column is produced by a so-called 'output
plugin', which users of the facility can write to suit their needs. A
plugin is written by implementing 5 functions in the shared object that's
passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)

The most interesting function, pg_decode_change, is passed a structure
containing the old/new versions of the row, the 'struct Relation' belonging
to it, and meta-information about the transaction.

The output plugin can rely on syscache lookups et al. to decode the
changed tuple in whatever fashion it wants.
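
To make the callback sequence concrete, here is a minimal standalone sketch
in C. It does not use the patchset's headers or its real callback
signatures: every Demo*/demo_* name below is a hypothetical stand-in that
only mirrors the begin/change/commit structure described above, and the
output format merely imitates the test_decoding lines shown earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the structures used by the patchset. */
typedef struct DemoChange
{
	const char *table;			/* relation name */
	const char *action;			/* "INSERT", "UPDATE" or "DELETE" */
	const char *tuple;			/* already formatted column data */
} DemoChange;

typedef struct DemoOutput
{
	char		buf[512];		/* accumulated plugin output */
	size_t		used;
} DemoOutput;

/* Hypothetical callback table mirroring the three mandatory callbacks. */
typedef struct DemoPluginCallbacks
{
	void		(*begin_txn) (DemoOutput *out, unsigned xid);
	void		(*change) (DemoOutput *out, unsigned xid, const DemoChange *c);
	void		(*commit_txn) (DemoOutput *out, unsigned xid);
} DemoPluginCallbacks;

/* Append one line plus newline, keeping the buffer NUL-terminated. */
static void
demo_append(DemoOutput *out, const char *line)
{
	size_t		len = strlen(line);

	assert(out->used + len + 2 <= sizeof(out->buf));
	memcpy(out->buf + out->used, line, len);
	out->used += len;
	out->buf[out->used++] = '\n';
	out->buf[out->used] = '\0';
}

static void
demo_begin(DemoOutput *out, unsigned xid)
{
	(void) xid;
	demo_append(out, "BEGIN");
}

static void
demo_change(DemoOutput *out, unsigned xid, const DemoChange *c)
{
	char		line[256];

	(void) xid;
	snprintf(line, sizeof(line), "table \"%s\": %s: %s",
			 c->table, c->action, c->tuple);
	demo_append(out, line);
}

static void
demo_commit(DemoOutput *out, unsigned xid)
{
	(void) xid;
	demo_append(out, "COMMIT");
}

const DemoPluginCallbacks demo_callbacks = {demo_begin, demo_change, demo_commit};

/* Drive the callbacks the way a decoder would for one committed transaction. */
void
demo_replay_txn(const DemoPluginCallbacks *cb, DemoOutput *out, unsigned xid,
				const DemoChange *changes, int nchanges)
{
	int			i;

	cb->begin_txn(out, xid);
	for (i = 0; i < nchanges; i++)
		cb->change(out, xid, &changes[i]);
	cb->commit_txn(out, xid);
}
```

Calling demo_replay_txn() with a single INSERT change on table "foo" fills
the buffer with a BEGIN line, one change line and a COMMIT line, in that
order. A real plugin would additionally use the relation descriptor and
syscache lookups to format the tuple itself.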

I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 13 which contains most of the code

When reading the code, the information flow during decoding might be
interesting:
---------------
          +---------------+
          | XLogReader    |
          +---------------+
                  |
             XLOG Records
                  |
                  v
          +---------------+
          | decode.c      |
          +---------------+
            |           |
            |           |
            v           |
+---------------+       |
| snapbuild.c   |  HeapTupleData
+---------------+       |
            |           |
    catalog snapshots   |
            |           |
            v           v
          +---------------+
          |reorderbuffer.c|
          +---------------+
                  |
         HeapTuple & Metadata
                  |
                  v
          +---------------+
          | Output Plugin |
          +---------------+
                  |
          Whatever you want
                  |
                  v
          +---------------+
          | Output Handler|
          |               |
          |WalSnd or SRF  |
          +---------------+
---------------
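
The reordering step in the middle of that flow can be illustrated with a
toy model. The sketch below is not the reorderbuffer.c from the patch; all
Toy*/toy_* names are hypothetical, changes are plain integers, and there is
no spilling to disk and no snapshot handling. It only shows the core idea:
changes of concurrent transactions arrive interleaved, are collected per
xid, and are handed on as one unit when that transaction commits.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TXNS	8
#define TOY_MAX_CHANGES 16

/* Hypothetical per-transaction buffer; not the patch's data structure. */
typedef struct ToyTxn
{
	unsigned	xid;
	int			in_use;
	int			nchanges;
	int			changes[TOY_MAX_CHANGES];	/* opaque change ids, arrival order */
} ToyTxn;

typedef struct ToyReorderBuffer
{
	ToyTxn		txns[TOY_MAX_TXNS];
} ToyReorderBuffer;

/* Find the buffered transaction for xid, optionally creating it. */
static ToyTxn *
toy_txn_by_xid(ToyReorderBuffer *rb, unsigned xid, int create)
{
	ToyTxn	   *free_slot = NULL;
	int			i;

	for (i = 0; i < TOY_MAX_TXNS; i++)
	{
		if (rb->txns[i].in_use && rb->txns[i].xid == xid)
			return &rb->txns[i];
		if (!rb->txns[i].in_use && free_slot == NULL)
			free_slot = &rb->txns[i];
	}
	if (!create || free_slot == NULL)
		return NULL;
	free_slot->in_use = 1;
	free_slot->xid = xid;
	free_slot->nchanges = 0;
	return free_slot;
}

/* Queue one decoded change; changes of different xids may interleave. */
void
toy_queue_change(ToyReorderBuffer *rb, unsigned xid, int change)
{
	ToyTxn	   *txn = toy_txn_by_xid(rb, xid, 1);

	assert(txn != NULL && txn->nchanges < TOY_MAX_CHANGES);
	txn->changes[txn->nchanges++] = change;
}

/*
 * On commit, copy the transaction's changes to out[] in their original
 * order and release the slot. Returns the number of changes emitted.
 */
int
toy_commit(ToyReorderBuffer *rb, unsigned xid, int *out, int out_len)
{
	ToyTxn	   *txn = toy_txn_by_xid(rb, xid, 0);
	int			i;
	int			n;

	if (txn == NULL)
		return 0;				/* transaction without any changes */
	n = txn->nchanges;
	assert(n <= out_len);
	for (i = 0; i < n; i++)
		out[i] = txn->changes[i];
	txn->in_use = 0;
	return n;
}
```

Queueing changes 1 and 3 for xid 696 with change 2 for xid 697 in between,
and then committing 697 before 696, yields first {2} and then {1, 3}: each
transaction comes out contiguous and in commit order, which is what the
output plugin gets to see.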

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program; not required but useful
0016: Add test_logical_decoding extension; not required, but contains
the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#1)
17 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

> Overview of the attached patches:
> 0001: indirect toast tuples; required but submitted independently
> 0002: functions for testing; not required
> 0003: (tablespace, filenode) syscache; required
> 0004: RelationMapFilenodeToOid; required, simple
> 0005: pg_relation_by_filenode() function; not required but useful
> 0006: Introduce InvalidCommandId; required, simple
> 0007: Adjust Satisfies* interface; required, mechanical
> 0008: Allow walsender to attach to a database; required, needs review
> 0009: New GetOldestXmin() parameter; required, pretty boring
> 0010: Log xl_running_xact regularly in the bgwriter; required
> 0011: make fsync_fname() public; required, needs to be in a different file
> 0012: Relcache support for a Relation's primary key; required
> 0013: Actual changeset extraction; required
> 0014: Output plugin demo; not required (except for testing) but useful
> 0015: Add pg_receivellog program; not required but useful
> 0016: Add test_logical_decoding extension; not required, but contains
> the tests for the feature. Uses 0014
> 0017: Snapshot building docs; not required

Version v5-01 attached

Greetings,

Andres Freund

Attachments:

0010-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch (text/x-patch; charset=us-ascii)
From a691315e7bc4523fc743a826049daa0680c50933 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 10/17] wal_decoding: Log xl_running_xact's at a higher
 frequency than checkpoints are done

Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.

Also mark xl_standby records as being relevant for async commit so the wal
writer writes them out soonish.

This might also be beneficial for HS as it would make it faster to hit a spot
where no (old) transactions are running anymore.
---
 src/backend/postmaster/bgwriter.c | 47 +++++++++++++++++++++++++++++++++++++++
 src/backend/storage/ipc/standby.c | 22 +++++++++++++++---
 src/include/storage/standby.h     |  2 +-
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..2adb36f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -76,6 +78,10 @@ int			BgWriterDelay = 200;
 static volatile sig_atomic_t got_SIGHUP = false;
 static volatile sig_atomic_t shutdown_requested = false;
 
+static TimestampTz last_logged_snap_ts;
+static XLogRecPtr last_logged_snap_recptr = InvalidXLogRecPtr;
+static uint32 log_snap_interval_ms = 15000;
+
 /* Signal handlers */
 
 static void bg_quickdie(SIGNAL_ARGS);
@@ -142,6 +148,12 @@ BackgroundWriterMain(void)
 	CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
 
 	/*
+	 * We just started, assume there has been either a shutdown or
+	 * end-of-recovery snapshot.
+	 */
+	last_logged_snap_ts = GetCurrentTimestamp();
+
+	/*
 	 * Create a memory context that we will do all our work in.  We do this so
 	 * that we can reset the context during error recovery and thereby avoid
 	 * possible memory leaks.  Formerly this code just ran in
@@ -276,6 +288,41 @@ BackgroundWriterMain(void)
 		}
 
 		/*
+		 * Log a new xl_running_xacts every now and then so replication can get
+		 * into a consistent state faster and clean up resources more
+		 * frequently. The costs of this are relatively low, so doing it 4
+		 * times a minute seems fine.
+		 *
+		 * We assume the interval for writing xl_running_xacts is significantly
+		 * bigger than BgWriterDelay, so we don't complicate the overall
+		 * timeout handling but just assume we're going to get called often
+		 * enough even if hibernation mode is active. It's not that important
+		 * that log_snap_interval_ms is met strictly.
+		 *
+		 * We do this logging in the bgwriter as it's the only process that
+		 * runs regularly and returns to its mainloop all the
+		 * time. E.g. Checkpointer, when active, is barely ever in its
+		 * mainloop.
+		 */
+		if (XLogStandbyInfoActive() && !RecoveryInProgress())
+		{
+			TimestampTz timeout = 0;
+			timeout = TimestampTzPlusMilliseconds(last_logged_snap_ts,
+												  log_snap_interval_ms);
+
+			/*
+			 * only log if enough time has passed and some xlog record has been
+			 * inserted.
+			 */
+			if (GetCurrentTimestamp() >= timeout &&
+				last_logged_snap_recptr != GetXLogInsertRecPtr())
+			{
+				last_logged_snap_recptr = LogStandbySnapshot();
+				last_logged_snap_ts = GetCurrentTimestamp();
+			}
+		}
+
+		/*
 		 * Sleep until we are signaled or BgWriterDelay has elapsed.
 		 *
 		 * Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..e85733b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
 									   ProcSignalReason reason);
 static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
 
 
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
  * currently running xids, performed by StandbyReleaseOldLocks().
  * Zero xids should no longer be possible, but we may be replaying WAL
  * from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
  */
-void
+XLogRecPtr
 LogStandbySnapshot(void)
 {
+	XLogRecPtr recptr;
 	RunningTransactions running;
 	xl_standby_lock *locks;
 	int			nlocks;
@@ -877,8 +880,11 @@ LogStandbySnapshot(void)
 	 */
 	running = GetRunningTransactionData();
-	LogCurrentRunningXacts(running);
+	recptr = LogCurrentRunningXacts(running);
+
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
+
+	return recptr;
 }
 
 /*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
  * is a contiguous chunk of memory and never exists fully until it is
  * assembled in WAL.
  */
-static void
+static XLogRecPtr
 LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 {
 	xl_running_xacts xlrec;
@@ -939,6 +945,16 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->oldestRunningXid,
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);
+
+	/*
+	 * Ensure running xact information is synced to disk not too far in the
+	 * future, logical standbys need this soon after initialization. We don't
+	 * want to stall anything though, so we let the wal writer do it during
+	 * normal operation.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
+	return recptr;
 }
 
 /*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
 extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
 extern void LogAccessExclusiveLockPrepare(void);
 
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
 
 #endif   /* STANDBY_H */
-- 
1.8.2.rc2.4.g7799588.dirty
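
The throttling rule this patch adds to the bgwriter loop can be isolated
into a small standalone sketch. The names below (SnapLogState,
snap_log_due, snap_log_done) are hypothetical and use plain integers
instead of TimestampTz and XLogRecPtr; the sketch only mirrors the two
conditions from the hunk above: enough time has passed, and something has
been inserted into WAL since the last logged snapshot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state; the patch keeps equivalent values in static variables. */
typedef struct SnapLogState
{
	int64_t		last_logged_ms;		/* time the last snapshot was logged */
	uint64_t	last_logged_pos;	/* WAL position returned by that logging */
} SnapLogState;

/*
 * Returns 1 if a new snapshot should be logged now: the interval has
 * elapsed and the WAL insert position differs from the one remembered
 * after the previous snapshot, i.e. the system has not been idle.
 */
int
snap_log_due(const SnapLogState *st, int64_t now_ms, uint64_t insert_pos,
			 int64_t interval_ms)
{
	assert(interval_ms > 0);
	if (now_ms < st->last_logged_ms + interval_ms)
		return 0;
	return insert_pos != st->last_logged_pos;
}

/* Remember when the snapshot was logged and where its record ended. */
void
snap_log_done(SnapLogState *st, int64_t now_ms, uint64_t end_pos)
{
	st->last_logged_ms = now_ms;
	st->last_logged_pos = end_pos;
}
```

Remembering the position returned by the logging call, rather than the
position seen before it, is what keeps an idle system from logging a new
snapshot every interval: the snapshot record itself advances the insert
position, and only a later insert by someone else makes the check succeed
again.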

0011-wal_decoding-copydir-make-fsync_fname-public.patch (text/x-patch; charset=us-ascii)
From 302aa05b8f4501cccde2ee909349b04b4469e093 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 11/17] wal_decoding: copydir: make fsync_fname public

This probably should be somewhere else; it's a generally useful function, not
really related to copying directories. fd.[ch]?
---
 src/backend/storage/file/copydir.c | 5 +----
 src/include/storage/copydir.h      | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 391359c..93ca13f 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,9 +27,6 @@
 #include "miscadmin.h"
 
 
-static void fsync_fname(char *fname, bool isdir);
-
-
 /*
  * copydir: copy a directory
  *
@@ -215,7 +212,7 @@ copy_file(char *fromfile, char *tofile)
  * Try to fsync directories but ignore errors that indicate the OS
  * just doesn't allow/require fsyncing directories.
  */
-static void
+void
 fsync_fname(char *fname, bool isdir)
 {
 	int			fd;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a087cce..3bccf3b 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -15,5 +15,6 @@
 
 extern void copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
+extern void fsync_fname(char *fname, bool isdir);
 
 #endif   /* COPYDIR_H */
-- 
1.8.2.rc2.4.g7799588.dirty
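
For readers unfamiliar with what a function like fsync_fname() has to do,
here is a rough POSIX-only sketch. It is not the body of the function in
copydir.c: demo_fsync_path is a hypothetical name, errors are returned
instead of reported via ereport(), and the set of errno values tolerated
for directories is an assumption of this sketch. It follows the comment in
the hunk above: try to fsync directories, but ignore errors that merely
indicate the OS doesn't allow or require it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush a file or directory to stable storage. Returns 0 on success, or
 * when the platform does not support fsync on directories; returns -1
 * with errno set otherwise.
 */
int
demo_fsync_path(const char *path, int isdir)
{
	int			fd;
	int			rc;
	int			saved_errno;

	assert(path != NULL);

	/* Directories can only be opened read-only. */
	fd = open(path, isdir ? O_RDONLY : O_RDWR);
	if (fd < 0)
	{
		/* Assumption: some platforms refuse to open directories at all. */
		if (isdir && (errno == EISDIR || errno == EACCES))
			return 0;
		return -1;
	}

	rc = fsync(fd);
	saved_errno = errno;

	/* Assumption: these errors mean "cannot fsync a directory here". */
	if (rc != 0 && isdir && (saved_errno == EBADF || saved_errno == EINVAL))
		rc = 0;

	/* A failing close() is only reported if nothing failed before it. */
	if (close(fd) != 0 && rc == 0)
		return -1;

	errno = saved_errno;
	return rc;
}
```

After creating or renaming a file that must survive a crash, a caller
would typically flush the file itself with isdir = 0 and then its parent
directory with isdir = 1, so that the directory entry is durable as well.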

0012-wal_decoding-Add-information-about-a-tables-primary-.patch (text/x-patch; charset=us-ascii)
From 5f5072e4abf92e33a2629ab86766dbe48da141f6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 12/17] wal_decoding: Add information about a table's primary
 key to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes qualify and are preferred in this
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

For rd_primary to be set, RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index, and RelationGetIndexList
already gathers all the necessary information.

This could be used to replace tablecmds.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
 src/include/utils/rel.h            | 12 +++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f114038..3f7386e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3463,7 +3463,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3520,17 +3522,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex)
+					&& !OidIsValid(oidIndex)
+					&& !OidIsValid(candidateIndex))
+			{
+				int key;
+				bool found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16 attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		relation->rd_primary = candidateIndex;
+
 	heap_close(indrel, AccessShareLock);
 
 	/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 58cc3f7..bd2466e 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
 	TriggerDesc *trigdesc;		/* Trigger info, or NULL if rel has none */
 
 	/*
+	 * The 'best' primary or candidate key that has been found, only set
+	 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+	 *
+	 * Indexes are chosen in the following order:
+	 * * Primary Key
+	 * * oid index
+	 * * the first (OID order) unique, immediate, non-partial and
+	 *   non-expression index over one or more NOT NULL'ed columns
+	 */
+	Oid rd_primary;
+
+	/*
 	 * rd_options is set whenever rd_rel is loaded into the relcache entry.
 	 * Note that you can NOT look into rd_rel for this data.  NULL means "use
 	 * defaults".
-- 
1.8.2.rc2.4.g7799588.dirty
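
The preference order implemented by this patch can be restated as a small
standalone function. This is only a sketch of the selection rule, not the
relcache code: DemoIndex and demo_choose_key_index are hypothetical, each
index is reduced to a handful of precomputed flags, and the input array is
assumed to be in OID order, as the index scan in RelationGetIndexList()
delivers it.

```c
#include <assert.h>

#define DEMO_INVALID_OID 0u

/* Hypothetical summary of one pg_index row. */
typedef struct DemoIndex
{
	unsigned	oid;
	int			is_valid;
	int			is_unique;
	int			is_immediate;
	int			is_partial;
	int			is_primary;
	int			is_oid_index;		/* single-column btree index on oid */
	int			all_cols_not_null;
} DemoIndex;

/* Pick the identifying index: primary key, else oid index, else candidate. */
unsigned
demo_choose_key_index(const DemoIndex *idx, int n)
{
	unsigned	pkey = DEMO_INVALID_OID;
	unsigned	oididx = DEMO_INVALID_OID;
	unsigned	candidate = DEMO_INVALID_OID;
	int			i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
	{
		const DemoIndex *ix = &idx[i];

		/* Only valid, unique, immediate, non-partial indexes qualify. */
		if (!ix->is_valid || !ix->is_unique || !ix->is_immediate ||
			ix->is_partial)
			continue;

		if (ix->is_primary)
			pkey = ix->oid;
		else if (ix->is_oid_index)
			oididx = ix->oid;
		else if (candidate == DEMO_INVALID_OID && ix->all_cols_not_null)
			candidate = ix->oid;	/* first qualifying one wins */
	}

	if (pkey != DEMO_INVALID_OID)
		return pkey;
	if (oididx != DEMO_INVALID_OID)
		return oididx;
	return candidate;
}
```

Given a plain unique index over NOT NULL columns, an oid index and a
primary key, the function returns the primary key; without the primary key
it falls back to the oid index, and without both to the plain unique
index. If nothing qualifies, the result is DEMO_INVALID_OID.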

0013-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch (text/x-patch; charset=us-ascii)
From f13829d20b493a3642082ea9119444495ac75996 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 13/17] wal_decoding: Introduce wal decoding via catalog
 timetravel

This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
 * init
 * begin
 * change
 * commit
 * cleanup
* walsender infrastructure to stream out changes and to keep the global xmin low enough
 * INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
   * initial LSN
   * replication slot identifier
   * id of a pg_export() style snapshot
 * START_LOGICAL_REPLICATION $id $lsn; streams out changes
 * uses named output plugins for output specification

Todo:
* better integrated testing infrastructure
* more docs about the internals

Lowlevel:
* resource owner handling is suboptimal
* invalidations from uninteresting transactions (e.g. from other databases, old ones)
  need to be processed anyway
* error handling in walsender is suboptimal
* pg_receivellog needs to send a reply immediately when postgres is shutting down

Input, Testing and Review by:
Heikki Linnakangas
Kevin Grittner
Michael Paquier
Abhijit Menon-Sen
Peter Geoghegan
Robert Haas
Simon Riggs
Steve Singer

Code By:
Andres Freund

With code contributions by:
Abhijit Menon-Sen
Craig Ringer
Alvaro Herrera
---
 src/backend/access/common/reloptions.c          |   10 +
 src/backend/access/heap/heapam.c                |  466 ++++-
 src/backend/access/heap/pruneheap.c             |    2 +
 src/backend/access/index/indexam.c              |   14 +-
 src/backend/access/rmgrdesc/heapdesc.c          |    9 +
 src/backend/access/rmgrdesc/xlogdesc.c          |    1 +
 src/backend/access/transam/twophase.c           |    4 +-
 src/backend/access/transam/xact.c               |   22 +-
 src/backend/access/transam/xlog.c               |   12 +-
 src/backend/catalog/catalog.c                   |   14 +-
 src/backend/catalog/index.c                     |   14 +-
 src/backend/catalog/system_views.sql            |   10 +
 src/backend/commands/analyze.c                  |    2 +-
 src/backend/commands/cluster.c                  |    2 +
 src/backend/commands/trigger.c                  |    3 +-
 src/backend/commands/vacuum.c                   |    5 +-
 src/backend/commands/vacuumlazy.c               |    5 +-
 src/backend/postmaster/postmaster.c             |    7 +-
 src/backend/replication/Makefile                |    2 +
 src/backend/replication/logical/Makefile        |   19 +
 src/backend/replication/logical/decode.c        |  556 +++++
 src/backend/replication/logical/logical.c       | 1047 ++++++++++
 src/backend/replication/logical/logicalfuncs.c  |  361 ++++
 src/backend/replication/logical/reorderbuffer.c | 2449 +++++++++++++++++++++++
 src/backend/replication/logical/snapbuild.c     | 1930 ++++++++++++++++++
 src/backend/replication/repl_gram.y             |   75 +-
 src/backend/replication/repl_scanner.l          |   55 +-
 src/backend/replication/walreceiver.c           |    2 +-
 src/backend/replication/walsender.c             |  738 ++++++-
 src/backend/storage/ipc/ipci.c                  |    3 +
 src/backend/storage/ipc/procarray.c             |   58 +-
 src/backend/storage/ipc/standby.c               |   17 +-
 src/backend/utils/cache/inval.c                 |    4 +-
 src/backend/utils/cache/relcache.c              |  113 +-
 src/backend/utils/misc/guc.c                    |   12 +
 src/backend/utils/misc/postgresql.conf.sample   |   11 +-
 src/backend/utils/time/snapmgr.c                |    5 +-
 src/backend/utils/time/tqual.c                  |  251 ++-
 src/bin/initdb/initdb.c                         |    4 +-
 src/bin/pg_controldata/pg_controldata.c         |    2 +
 src/include/access/heapam_xlog.h                |   59 +-
 src/include/access/transam.h                    |    5 +
 src/include/access/xlog.h                       |    8 +-
 src/include/access/xlogreader.h                 |   12 +-
 src/include/catalog/catalog.h                   |    1 +
 src/include/catalog/pg_proc.h                   |    6 +
 src/include/commands/vacuum.h                   |    2 +-
 src/include/nodes/nodes.h                       |    3 +
 src/include/nodes/replnodes.h                   |   35 +
 src/include/replication/decode.h                |   20 +
 src/include/replication/logical.h               |  198 ++
 src/include/replication/logicalfuncs.h          |   19 +
 src/include/replication/output_plugin.h         |   73 +
 src/include/replication/reorderbuffer.h         |  320 +++
 src/include/replication/snapbuild.h             |   75 +
 src/include/replication/walsender_private.h     |    6 +-
 src/include/storage/itemptr.h                   |    3 +
 src/include/storage/lwlock.h                    |    1 +
 src/include/storage/procarray.h                 |    2 +-
 src/include/storage/sinval.h                    |    2 +
 src/include/utils/inval.h                       |    2 +-
 src/include/utils/rel.h                         |   30 +-
 src/include/utils/relcache.h                    |   11 +-
 src/include/utils/snapmgr.h                     |    3 +
 src/include/utils/tqual.h                       |   33 +-
 src/test/regress/expected/logical.out           |    7 +
 src/test/regress/expected/rules.out             |    9 +-
 src/test/regress/sql/logical.sql                |    3 +
 src/tools/pgindent/typedefs.list                |   40 +
 69 files changed, 9101 insertions(+), 203 deletions(-)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/decode.c
 create mode 100644 src/backend/replication/logical/logical.c
 create mode 100644 src/backend/replication/logical/logicalfuncs.c
 create mode 100644 src/backend/replication/logical/reorderbuffer.c
 create mode 100644 src/backend/replication/logical/snapbuild.c
 create mode 100644 src/include/replication/decode.h
 create mode 100644 src/include/replication/logical.h
 create mode 100644 src/include/replication/logicalfuncs.h
 create mode 100644 src/include/replication/output_plugin.h
 create mode 100644 src/include/replication/reorderbuffer.h
 create mode 100644 src/include/replication/snapbuild.h
 create mode 100644 src/test/regress/expected/logical.out
 create mode 100644 src/test/regress/sql/logical.sql

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c439702..a406979 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -62,6 +62,14 @@ static relopt_bool boolRelOpts[] =
 	},
 	{
 		{
+			"treat_as_catalog_table",
+			"Treat table as a catalog table for the purpose of logical replication",
+			RELOPT_KIND_HEAP
+		},
+		false
+	},
+	{
+		{
 			"fastupdate",
 			"Enables \"fast update\" feature for this GIN index",
 			RELOPT_KIND_GIN
@@ -1152,6 +1160,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"security_barrier", RELOPT_TYPE_BOOL,
 		offsetof(StdRdOptions, security_barrier)},
+		{"treat_as_catalog_table", RELOPT_TYPE_BOOL,
+		 offsetof(StdRdOptions, treat_as_catalog_table)},
 	};
 
 	options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fdf0ccd..e3213fa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -85,12 +85,14 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup,
-				HeapTuple newtup, bool all_visible_cleared,
-				bool new_all_visible_cleared);
+				HeapTuple newtup, HeapTuple old_idx_tup,
+				bool all_visible_cleared, bool new_all_visible_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
-							 Bitmapset *hot_attrs, Bitmapset *key_attrs,
-							 bool *satisfies_hot, bool *satisfies_key,
-							 HeapTuple oldtup, HeapTuple newtup);
+						  Bitmapset *hot_attrs,
+						  Bitmapset *key_attrs, Bitmapset *ckey_attrs,
+						  bool *satisfies_hot, bool *satisfies_key,
+						  bool *satisfies_ckey,
+						  HeapTuple oldtup, HeapTuple newtup);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 						  uint16 old_infomask2, TransactionId add_to_xmax,
 						  LockTupleMode mode, bool is_update,
@@ -108,6 +110,8 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
 static bool ConditionalMultiXactIdWait(MultiXactId multi,
 						   MultiXactStatus status, int *remaining,
 						   uint16 infomask);
+static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
+static HeapTuple ExtractKeyTuple(Relation rel, HeapTuple tup);
 
 
 /*
@@ -339,8 +343,10 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	Assert(TransactionIdIsValid(RecentGlobalXmin));
-	heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+	if (IsSystemRelation(scan->rs_rd) || RelationIsDoingTimetravel(scan->rs_rd))
+		heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+	else
+		heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1726,10 +1732,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 */
 		if (!skip)
 		{
+			/* setup the redirected t_self for the benefit of timetravel access */
+			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
+
 			/* If it's visible per the snapshot, we must return it */
 			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
+			/* reset original, non-redirected, tid */
+			heapTuple->t_self = *tid;
+
 			if (valid)
 			{
 				ItemPointerSetOffsetNumber(tid, offnum);
@@ -2084,11 +2096,24 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		XLogRecData rdata[3];
+		XLogRecData rdata[4];
 		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
+		bool		need_tuple_data;
+
+		/*
+		 * For logical replication, we need the tuple even if we're doing a
+		 * full page write, so make sure to log it separately. (XXX We could
+		 * alternatively store a pointer into the FPW).
+		 *
+		 * Also, if this is a catalog, we need to transmit combocids to
+		 * properly decode, so log that as well.
+		 */
+		need_tuple_data = RelationIsLogicallyLogged(relation);
+		if (RelationIsDoingTimetravel(relation))
+			log_heap_new_cid(relation, heaptup);
 
-		xlrec.all_visible_cleared = all_visible_cleared;
+		xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
 		xlrec.target.node = relation->rd_node;
 		xlrec.target.tid = heaptup->t_self;
 		rdata[0].data = (char *) &xlrec;
@@ -2107,18 +2132,35 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		 */
 		rdata[1].data = (char *) &xlhdr;
 		rdata[1].len = SizeOfHeapHeader;
-		rdata[1].buffer = buffer;
+		rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
 		rdata[1].buffer_std = true;
 		rdata[1].next = &(rdata[2]);
 
 		/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
 		rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
 		rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
-		rdata[2].buffer = buffer;
+		rdata[2].buffer = need_tuple_data ? InvalidBuffer : buffer;
 		rdata[2].buffer_std = true;
 		rdata[2].next = NULL;
 
 		/*
+		 * Add a record for the buffer without actual content; it is removed
+		 * if an FPW is done for that buffer.
+		 */
+		if (need_tuple_data)
+		{
+			rdata[2].next = &(rdata[3]);
+
+			rdata[3].data = NULL;
+			rdata[3].len = 0;
+			rdata[3].buffer = buffer;
+			rdata[3].buffer_std = true;
+			rdata[3].next = NULL;
+
+			xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+		}
+
+		/*
 		 * If this is the single and first tuple on page, we can reinit the
 		 * page instead of restoring the whole thing.  Set flag, and hide
 		 * buffer references from XLogInsert.
@@ -2127,7 +2169,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
 		{
 			info |= XLOG_HEAP_INIT_PAGE;
-			rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+			rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
 		}
 
 		recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2253,6 +2295,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	Page		page;
 	bool		needwal;
 	Size		saveFreeSpace;
+	bool		need_tuple_data = RelationIsLogicallyLogged(relation);
+	bool		need_cids = RelationIsDoingTimetravel(relation);
 
 	needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
 	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
@@ -2339,7 +2383,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		{
 			XLogRecPtr	recptr;
 			xl_heap_multi_insert *xlrec;
-			XLogRecData rdata[2];
+			XLogRecData rdata[3];
 			uint8		info = XLOG_HEAP2_MULTI_INSERT;
 			char	   *tupledata;
 			int			totaldatalen;
@@ -2369,7 +2413,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			/* the rest of the scratch space is used for tuple data */
 			tupledata = scratchptr;
 
-			xlrec->all_visible_cleared = all_visible_cleared;
+			xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
 			xlrec->node = relation->rd_node;
 			xlrec->blkno = BufferGetBlockNumber(buffer);
 			xlrec->ntuples = nthispage;
@@ -2401,6 +2445,13 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 					   datalen);
 				tuphdr->datalen = datalen;
 				scratchptr += datalen;
+
+				/*
+				 * We don't use heap_multi_insert for catalog tuples yet, but
+				 * better be prepared...
+				 */
+				if (need_cids)
+					log_heap_new_cid(relation, heaptup);
 			}
 			totaldatalen = scratchptr - tupledata;
 			Assert((scratchptr - scratch) < BLCKSZ);
@@ -2412,17 +2463,33 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 
 			rdata[1].data = tupledata;
 			rdata[1].len = totaldatalen;
-			rdata[1].buffer = buffer;
+			rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
 			rdata[1].buffer_std = true;
 			rdata[1].next = NULL;
 
 			/*
+			 * Add a record for the buffer without actual content; it is removed
+			 * if an FPW is done for that buffer.
+			 */
+			if (need_tuple_data)
+			{
+				rdata[1].next = &(rdata[2]);
+
+				rdata[2].data = NULL;
+				rdata[2].len = 0;
+				rdata[2].buffer = buffer;
+				rdata[2].buffer_std = true;
+				rdata[2].next = NULL;
+				xlrec->flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+			}
+
+			/*
 			 * If we're going to reinitialize the whole page using the WAL
 			 * record, hide buffer reference from XLogInsert.
 			 */
 			if (init)
 			{
-				rdata[1].buffer = InvalidBuffer;
+				rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
 				info |= XLOG_HEAP_INIT_PAGE;
 			}
 
@@ -2542,6 +2609,9 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		have_tuple_lock = false;
 	bool		iscombo;
 	bool		all_visible_cleared = false;
+	bool		need_tuple_data = RelationNeedsWAL(relation) &&
+		RelationIsLogicallyLogged(relation);
+	HeapTuple idx_tuple = NULL; /* primary key of the tuple */
 
 	Assert(ItemPointerIsValid(tid));
 
@@ -2715,6 +2785,15 @@ l1:
 	/* replace cid with a combo cid if necessary */
 	HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
 
+	/*
+	 * Compute primary key tuple before entering the critical section so we
+	 * don't PANIC upon a memory allocation failure.
+	 */
+	if (need_tuple_data)
+	{
+		idx_tuple = ExtractKeyTuple(relation, &tp);
+	}
+
 	START_CRIT_SECTION();
 
 	/*
@@ -2767,9 +2846,13 @@ l1:
 	{
 		xl_heap_delete xlrec;
 		XLogRecPtr	recptr;
-		XLogRecData rdata[2];
+		XLogRecData rdata[4];
 
-		xlrec.all_visible_cleared = all_visible_cleared;
+		/* For logical decode we need combocids to properly decode the catalog */
+		if (RelationIsDoingTimetravel(relation))
+			log_heap_new_cid(relation, &tp);
+
+		xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
 		xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
 											  tp.t_data->t_infomask2);
 		xlrec.target.node = relation->rd_node;
@@ -2786,6 +2869,34 @@ l1:
 		rdata[1].buffer_std = true;
 		rdata[1].next = NULL;
 
+		/*
+		 * Log primary key of the deleted tuple
+		 */
+		if (need_tuple_data && idx_tuple != NULL)
+		{
+			xl_heap_header xlhdr;
+
+			xlhdr.t_infomask2 = idx_tuple->t_data->t_infomask2;
+			xlhdr.t_infomask = idx_tuple->t_data->t_infomask;
+			xlhdr.t_hoff = idx_tuple->t_data->t_hoff;
+
+			rdata[1].next = &(rdata[2]);
+			rdata[2].data = (char*)&xlhdr;
+			rdata[2].len = SizeOfHeapHeader;
+			rdata[2].buffer = InvalidBuffer;
+			rdata[2].next = NULL;
+
+			rdata[2].next = &(rdata[3]);
+			rdata[3].data = (char *) idx_tuple->t_data
+				+ offsetof(HeapTupleHeaderData, t_bits);
+			rdata[3].len = idx_tuple->t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
+			rdata[3].buffer = InvalidBuffer;
+			rdata[3].next = NULL;
+
+			xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+		}
+
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);
 
 		PageSetLSN(page, recptr);
@@ -2915,9 +3026,11 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *key_attrs;
+	Bitmapset  *ckey_attrs;
 	ItemId		lp;
 	HeapTupleData oldtup;
 	HeapTuple	heaptup;
+	HeapTuple	old_idx_tuple = NULL;
 	Page		page;
 	BlockNumber block;
 	MultiXactStatus mxact_status;
@@ -2933,6 +3046,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	bool		iscombo;
 	bool		satisfies_hot;
 	bool		satisfies_key;
+	bool		satisfies_ckey;
 	bool		use_hot_update = false;
 	bool		key_intact;
 	bool		all_visible_cleared = false;
@@ -2960,8 +3074,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * Note that we get a copy here, so we need not worry about relcache flush
 	 * happening midway through.
 	 */
-	hot_attrs = RelationGetIndexAttrBitmap(relation, false);
-	key_attrs = RelationGetIndexAttrBitmap(relation, true);
+	hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
+	key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
+	ckey_attrs = RelationGetIndexAttrBitmap(relation,
+										   INDEX_ATTR_BITMAP_CANDIDATE_KEY);
 
 	block = ItemPointerGetBlockNumber(otid);
 	buffer = ReadBuffer(relation, block);
@@ -3019,9 +3135,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * is updates that don't manipulate key columns, not those that
 	 * serendipitiously arrive at the same key values.
 	 */
-	HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs,
+	HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, ckey_attrs,
 								 &satisfies_hot, &satisfies_key,
-								 &oldtup, newtup);
+								 &satisfies_ckey, &oldtup, newtup);
 	if (satisfies_key)
 	{
 		*lockmode = LockTupleNoKeyExclusive;
@@ -3491,6 +3607,12 @@ l2:
 		PageSetFull(page);
 	}
 
+	/* compute tuple for logical logging */
+	if (!satisfies_ckey && RelationIsLogicallyLogged(relation))
+	{
+		old_idx_tuple = ExtractKeyTuple(relation, &oldtup);
+	}
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -3566,11 +3688,20 @@ l2:
 	/* XLOG stuff */
 	if (RelationNeedsWAL(relation))
 	{
-		XLogRecPtr	recptr = log_heap_update(relation, buffer,
-											 newbuf, &oldtup, heaptup,
-											 all_visible_cleared,
-											 all_visible_cleared_new);
+		XLogRecPtr	recptr;
 
+		/* For logical decode we need combocids to properly decode the catalog */
+		if (RelationIsDoingTimetravel(relation))
+		{
+			log_heap_new_cid(relation, &oldtup);
+			log_heap_new_cid(relation, heaptup);
+		}
+
+		recptr = log_heap_update(relation, buffer,
+								 newbuf, &oldtup, heaptup,
+								 old_idx_tuple,
+								 all_visible_cleared,
+								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3722,18 +3853,23 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
  * modify columns used in the key.
  */
 static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation,
-							 Bitmapset *hot_attrs, Bitmapset *key_attrs,
+HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
+							 Bitmapset *key_attrs, Bitmapset *ckey_attrs,
 							 bool *satisfies_hot, bool *satisfies_key,
+							 bool *satisfies_ckey,
 							 HeapTuple oldtup, HeapTuple newtup)
 {
 	int			next_hot_attnum;
 	int			next_key_attnum;
+	int			next_ckey_attnum;
 	bool		hot_result = true;
 	bool		key_result = true;
-	bool		key_done = false;
+	bool		ckey_result = true;
 	bool		hot_done = false;
 
+	Assert(bms_is_subset(ckey_attrs, key_attrs));
+	Assert(bms_is_subset(key_attrs, hot_attrs));
+
 	next_hot_attnum = bms_first_member(hot_attrs);
 	if (next_hot_attnum == -1)
 		hot_done = true;
@@ -3742,28 +3878,25 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
 		next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
 
 	next_key_attnum = bms_first_member(key_attrs);
-	if (next_key_attnum == -1)
-		key_done = true;
-	else
+	if (next_key_attnum != -1)
 		/* Adjust for system attributes */
 		next_key_attnum += FirstLowInvalidHeapAttributeNumber;
 
+	next_ckey_attnum = bms_first_member(ckey_attrs);
+	if (next_ckey_attnum != -1)
+		/* Adjust for system attributes */
+		next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+
 	for (;;)
 	{
 		int			check_now;
 		bool		changed;
 
-		/* both bitmapsets are now empty */
-		if (key_done && hot_done)
+		/* bitmapsets are now empty, hot includes others */
+		if (hot_done)
 			break;
 
-		/* XXX there's probably an easier way ... */
-		if (hot_done)
-			check_now = next_key_attnum;
-		if (key_done)
-			check_now = next_hot_attnum;
-		else
-			check_now = Min(next_hot_attnum, next_key_attnum);
+		check_now = next_hot_attnum;
 
 		changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
 										  check_now, oldtup, newtup);
@@ -3773,11 +3906,15 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
 				hot_result = false;
 			if (check_now == next_key_attnum)
 				key_result = false;
+			if (check_now == next_ckey_attnum)
+				ckey_result = false;
 		}
 
 		/* if both are false now, we can stop checking */
-		if (!hot_result && !key_result)
+		if (!hot_result && !key_result && !ckey_result)
+		{
 			break;
+		}
 
 		if (check_now == next_hot_attnum)
 		{
@@ -3791,16 +3928,22 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
 		if (check_now == next_key_attnum)
 		{
 			next_key_attnum = bms_first_member(key_attrs);
-			if (next_key_attnum == -1)
-				key_done = true;
-			else
+			if (next_key_attnum != -1)
 				/* Adjust for system attributes */
 				next_key_attnum += FirstLowInvalidHeapAttributeNumber;
 		}
+		if (check_now == next_ckey_attnum)
+		{
+			next_ckey_attnum = bms_first_member(ckey_attrs);
+			if (next_ckey_attnum != -1)
+				/* Adjust for system attributes */
+				next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+		}
 	}
 
 	*satisfies_hot = hot_result;
 	*satisfies_key = key_result;
+	*satisfies_ckey = ckey_result;
 }
 
 /*
@@ -5822,15 +5965,21 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
+				HeapTuple idx_tuple,
 				bool all_visible_cleared, bool new_all_visible_cleared)
 {
 	xl_heap_update xlrec;
-	xl_heap_header xlhdr;
+	xl_heap_header_len xlhdr;
 	uint8		info;
 	XLogRecPtr	recptr;
 	XLogRecData rdata[4];
 	Page		page = BufferGetPage(newbuf);
 
+	/*
+	 * As for XLOG_HEAP_INSERT, log the tuple even if we're doing an FPW.
+	 */
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+
 	/* Caller should not call me on a non-WAL-logged relation */
 	Assert(RelationNeedsWAL(reln));
 
@@ -5845,9 +5994,12 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xlrec.old_infobits_set = compute_infobits(oldtup->t_data->t_infomask,
 											  oldtup->t_data->t_infomask2);
 	xlrec.new_xmax = HeapTupleHeaderGetRawXmax(newtup->t_data);
-	xlrec.all_visible_cleared = all_visible_cleared;
+	xlrec.flags = 0;
+	if (all_visible_cleared)
+		xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
 	xlrec.newtid = newtup->t_self;
-	xlrec.new_all_visible_cleared = new_all_visible_cleared;
+	if (new_all_visible_cleared)
+		xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
 
 	rdata[0].data = (char *) &xlrec;
 	rdata[0].len = SizeOfHeapUpdate;
@@ -5860,33 +6012,80 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	rdata[1].buffer_std = true;
 	rdata[1].next = &(rdata[2]);
 
-	xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
-	xlhdr.t_infomask = newtup->t_data->t_infomask;
-	xlhdr.t_hoff = newtup->t_data->t_hoff;
+	xlhdr.header.t_infomask2 = newtup->t_data->t_infomask2;
+	xlhdr.header.t_infomask = newtup->t_data->t_infomask;
+	xlhdr.header.t_hoff = newtup->t_data->t_hoff;
+	xlhdr.t_len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
 
-	/*
-	 * As with insert records, we need not store the rdata[2] segment if we
-	 * decide to store the whole buffer instead.
-	 */
 	rdata[2].data = (char *) &xlhdr;
-	rdata[2].len = SizeOfHeapHeader;
-	rdata[2].buffer = newbuf;
+	rdata[2].len = SizeOfHeapHeaderLen;
+	rdata[2].buffer = need_tuple_data ? InvalidBuffer : newbuf;
 	rdata[2].buffer_std = true;
 	rdata[2].next = &(rdata[3]);
 
 	/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
-	rdata[3].data = (char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits);
+	rdata[3].data = (char *) newtup->t_data
+		+ offsetof(HeapTupleHeaderData, t_bits);
 	rdata[3].len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
-	rdata[3].buffer = newbuf;
+	rdata[3].buffer = need_tuple_data ? InvalidBuffer : newbuf;
 	rdata[3].buffer_std = true;
 	rdata[3].next = NULL;
 
+	/*
+	 * separate storage for the buffer reference of the new page in the
+	 * wal_level >= logical case
+	 */
+	if (need_tuple_data)
+	{
+		XLogRecData rdata_logical[4];
+
+		rdata[3].next = &(rdata_logical[0]);
+
+		rdata_logical[0].data = NULL;
+		rdata_logical[0].len = 0;
+		rdata_logical[0].buffer = newbuf;
+		rdata_logical[0].buffer_std = true;
+		rdata_logical[0].next = NULL;
+		xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+
+		/* candidate key changed and we have a candidate key */
+		if (idx_tuple)
+		{
+			/* don't really need this, but it's more comfy */
+			xl_heap_header_len xlhdr_idx;
+			xlhdr_idx.header.t_infomask2 = idx_tuple->t_data->t_infomask2;
+			xlhdr_idx.header.t_infomask = idx_tuple->t_data->t_infomask;
+			xlhdr_idx.header.t_hoff = idx_tuple->t_data->t_hoff;
+			xlhdr_idx.t_len = idx_tuple->t_len;
+
+			rdata_logical[0].next = &(rdata_logical[1]);
+			rdata_logical[1].data = (char *) &xlhdr_idx;
+			rdata_logical[1].len = SizeOfHeapHeaderLen;
+			rdata_logical[1].buffer = InvalidBuffer;
+			rdata_logical[1].next = &(rdata_logical[2]);
+
+			/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
+			rdata_logical[2].data = (char *) idx_tuple->t_data
+				+ offsetof(HeapTupleHeaderData, t_bits);
+			rdata_logical[2].len = idx_tuple->t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
+			rdata_logical[2].buffer = InvalidBuffer;
+			rdata_logical[2].next = NULL;
+			xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+		}
+	}
+
 	/* If new tuple is the single and first tuple on page... */
 	if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == FirstOffsetNumber &&
 		PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
 	{
+		XLogRecData *rcur = &rdata[0];
 		info |= XLOG_HEAP_INIT_PAGE;
-		rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
+		while (rcur != NULL)
+		{
+			rcur->buffer = InvalidBuffer;
+			rcur = rcur->next;
+		}
 	}
 
 	recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -5993,6 +6192,114 @@ log_newpage_buffer(Buffer buffer)
 }
 
 /*
+ * Perform XLogInsert of a XLOG_HEAP2_NEW_CID record
+ *
+ * The HeapTuple really needs to already have a ComboCid set, otherwise we
+ * cannot detect combocid/cmin/cmax.
+ *
+ * This is only used in wal_level >= WAL_LEVEL_LOGICAL
+ */
+static XLogRecPtr
+log_heap_new_cid(Relation relation, HeapTuple tup)
+{
+	xl_heap_new_cid xlrec;
+
+	XLogRecPtr	recptr;
+	XLogRecData rdata[1];
+	HeapTupleHeader hdr = tup->t_data;
+
+	Assert(ItemPointerIsValid(&tup->t_self));
+	Assert(tup->t_tableOid != InvalidOid);
+
+	xlrec.top_xid = GetTopTransactionId();
+	xlrec.target.node = relation->rd_node;
+	xlrec.target.tid = tup->t_self;
+
+	/*
+	 * if the tuple got inserted & deleted in the same TX we definitely have a
+	 * combocid.
+	 */
+	if (hdr->t_infomask & HEAP_COMBOCID)
+	{
+		xlrec.cmin = HeapTupleHeaderGetCmin(hdr);
+		xlrec.cmax = HeapTupleHeaderGetCmax(hdr);
+		xlrec.combocid = HeapTupleHeaderGetRawCommandId(hdr);
+	}
+	else
+	{
+		/* tuple inserted */
+		if (hdr->t_infomask & HEAP_XMAX_INVALID)
+		{
+			xlrec.cmin = HeapTupleHeaderGetRawCommandId(hdr);
+			xlrec.cmax = InvalidCommandId;
+		}
+		/* tuple from a different tx updated or deleted */
+		else
+		{
+			xlrec.cmin = InvalidCommandId;
+			xlrec.cmax = HeapTupleHeaderGetRawCommandId(hdr);
+
+		}
+		xlrec.combocid = InvalidCommandId;
+	}
+
+	rdata[0].data = (char *) &xlrec;
+	rdata[0].len = SizeOfHeapNewCid;
+	rdata[0].buffer = InvalidBuffer;
+	rdata[0].next = NULL;
+
+	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_NEW_CID, rdata);
+
+	return recptr;
+}
+
+static HeapTuple
+ExtractKeyTuple(Relation relation, HeapTuple tp)
+{
+	HeapTuple idx_tuple = NULL;
+	TupleDesc desc = RelationGetDescr(relation);
+	Relation idx_rel;
+	TupleDesc idx_desc;
+	Datum idx_vals[INDEX_MAX_KEYS];
+	bool idx_isnull[INDEX_MAX_KEYS];
+	int natt;
+
+	/* needs to already have been fetched? */
+	if (relation->rd_indexvalid == 0)
+		RelationGetIndexList(relation);
+
+	if (!OidIsValid(relation->rd_primary))
+	{
+		elog(DEBUG1, "Could not find primary key for table with oid %u",
+			 RelationGetRelid(relation));
+	}
+	else
+	{
+		idx_rel = RelationIdGetRelation(relation->rd_primary);
+		idx_desc = RelationGetDescr(idx_rel);
+
+		for (natt = 0; natt < idx_desc->natts; natt++)
+		{
+			int attno = idx_rel->rd_index->indkey.values[natt];
+			if (attno == ObjectIdAttributeNumber)
+			{
+				idx_vals[natt] = HeapTupleGetOid(tp);
+				idx_isnull[natt] = false;
+			}
+			else
+			{
+				idx_vals[natt] =
+					fastgetattr(tp, attno, desc, &idx_isnull[natt]);
+			}
+			Assert(!idx_isnull[natt]);
+		}
+		idx_tuple = heap_form_tuple(idx_desc, idx_vals, idx_isnull);
+		RelationClose(idx_rel);
+	}
+	return idx_tuple;
+}
+
+/*
  * Handles CLEANUP_INFO
  */
 static void
@@ -6353,7 +6660,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		Buffer		vmbuffer = InvalidBuffer;
@@ -6402,7 +6709,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
 	/* Mark the page as a candidate for pruning */
 	PageSetPrunable(page, record->xl_xid);
 
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	/* Make sure there is no forward chain link in t_ctid */
@@ -6436,7 +6743,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		Buffer		vmbuffer = InvalidBuffer;
@@ -6507,7 +6814,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
 
 	PageSetLSN(page, lsn);
 
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	MarkBufferDirty(buffer);
@@ -6570,7 +6877,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->node);
 		Buffer		vmbuffer = InvalidBuffer;
@@ -6653,7 +6960,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
 
 	PageSetLSN(page, lsn);
 
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	MarkBufferDirty(buffer);
@@ -6692,7 +6999,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 		HeapTupleHeaderData hdr;
 		char		data[MaxHeapTupleSize];
 	}			tbuf;
-	xl_heap_header xlhdr;
+	xl_heap_header_len xlhdr;
 	int			hsize;
 	uint32		newlen;
 	Size		freespace;
@@ -6701,7 +7008,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
@@ -6779,7 +7086,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	/* Mark the page as a candidate for pruning */
 	PageSetPrunable(page, record->xl_xid);
 
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	/*
@@ -6803,7 +7110,7 @@ newt:;
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->new_all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		BlockNumber block = ItemPointerGetBlockNumber(&xlrec->newtid);
@@ -6861,13 +7168,13 @@ newsame:;
 	if (PageGetMaxOffsetNumber(page) + 1 < offnum)
 		elog(PANIC, "heap_update_redo: invalid max offset number");
 
-	hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
+	hsize = SizeOfHeapUpdate + SizeOfHeapHeaderLen;
 
-	newlen = record->xl_len - hsize;
-	Assert(newlen <= MaxHeapTupleSize);
 	memcpy((char *) &xlhdr,
 		   (char *) xlrec + SizeOfHeapUpdate,
-		   SizeOfHeapHeader);
+		   SizeOfHeapHeaderLen);
+	newlen = xlhdr.t_len;
+	Assert(newlen <= MaxHeapTupleSize);
 	htup = &tbuf.hdr;
 	MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
 	/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
@@ -6875,9 +7182,9 @@ newsame:;
 		   (char *) xlrec + hsize,
 		   newlen);
 	newlen += offsetof(HeapTupleHeaderData, t_bits);
-	htup->t_infomask2 = xlhdr.t_infomask2;
-	htup->t_infomask = xlhdr.t_infomask;
-	htup->t_hoff = xlhdr.t_hoff;
+	htup->t_infomask2 = xlhdr.header.t_infomask2;
+	htup->t_infomask = xlhdr.header.t_infomask;
+	htup->t_hoff = xlhdr.header.t_hoff;
 
 	HeapTupleHeaderSetXmin(htup, record->xl_xid);
 	HeapTupleHeaderSetCmin(htup, FirstCommandId);
@@ -6889,7 +7196,7 @@ newsame:;
 	if (offnum == InvalidOffsetNumber)
 		elog(PANIC, "heap_update_redo: failed to add tuple");
 
-	if (xlrec->new_all_visible_cleared)
+	if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	freespace = PageGetHeapFreeSpace(page);		/* needed to update FSM below */
@@ -7140,6 +7447,9 @@ heap2_redo(XLogRecPtr lsn, XLogRecord *record)
 		case XLOG_HEAP2_LOCK_UPDATED:
 			heap_xlog_lock_updated(lsn, record);
 			break;
+		case XLOG_HEAP2_NEW_CID:
+			/* nothing to do on a real replay, only during logical decoding */
+			break;
 		default:
 			elog(PANIC, "heap2_redo: unknown op code %u", info);
 	}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3b68705..10587b8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -75,6 +75,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
 	Page		page = BufferGetPage(buffer);
 	Size		minfree;
 
+	Assert(TransactionIdIsValid(OldestXmin));
+
 	/*
 	 * Let's see if we really need pruning.
 	 *
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..3bac4a5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -67,7 +67,10 @@
 
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "access/xlog.h"
+
 #include "catalog/index.h"
+#include "catalog/catalog.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -520,8 +523,15 @@ index_fetch_heap(IndexScanDesc scan)
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != scan->xs_cbuf)
-			heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
-								RecentGlobalXmin);
+		{
+			if (IsSystemRelation(scan->heapRelation)
+				|| RelationIsDoingTimetravel(scan->heapRelation))
+				heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+									RecentGlobalXmin);
+			else
+				heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+									RecentGlobalDataXmin);
+		}
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index bc8b985..c750fef 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -184,6 +184,15 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
 						 xlrec->infobits_set);
 		out_target(buf, &(xlrec->target));
 	}
+	else if (info == XLOG_HEAP2_NEW_CID)
+	{
+		xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
+
+		appendStringInfo(buf, "new_cid: ");
+		out_target(buf, &(xlrec->target));
+		appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+						 xlrec->cmin, xlrec->cmax, xlrec->combocid);
+	}
 	else
 		appendStringInfo(buf, "UNKNOWN");
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 2bad527..f1a75b4 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -28,6 +28,7 @@ const struct config_enum_entry wal_level_options[] = {
 	{"minimal", WAL_LEVEL_MINIMAL, false},
 	{"archive", WAL_LEVEL_ARCHIVE, false},
 	{"hot_standby", WAL_LEVEL_HOT_STANDBY, false},
+	{"logical", WAL_LEVEL_LOGICAL, false},
 	{NULL, 0, false}
 };
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e975f8d..d46a50e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -47,6 +47,7 @@
 #include "access/twophase.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlogutils.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
@@ -1920,7 +1921,8 @@ RecoverPreparedTransactions(void)
 			 * the prepared transaction generated xid assignment records. Test
 			 * here must match one used in AssignTransactionId().
 			 */
-			if (InHotStandby && hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS)
+			if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
+			                     XLogLogicalInfoActive()))
 				overwriteOK = true;
 
 			/*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0591f3f..dc093e6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -431,6 +431,7 @@ AssignTransactionId(TransactionState s)
 {
 	bool		isSubXact = (s->parent != NULL);
 	ResourceOwner currentOwner;
+	bool log_unknown_top = false;
 
 	/* Assert that caller didn't screw up */
 	Assert(!TransactionIdIsValid(s->transactionId));
@@ -438,7 +439,7 @@ AssignTransactionId(TransactionState s)
 
 	/*
 	 * Ensure parent(s) have XIDs, so that a child always has an XID later
-	 * than its parent.  Musn't recurse here, or we might get a stack overflow
+	 * than its parent.  Must not recurse here, or we might get a stack overflow
 	 * if we're at the bottom of a huge stack of subtransactions none of which
 	 * have XIDs yet.
 	 */
@@ -456,6 +457,17 @@ AssignTransactionId(TransactionState s)
 		}
 
 		/*
+		 * Force the toplevel xid to be logged before subxacts are logged. If
+		 * the uppermost level already has an xid that precondition already is
+		 * fulfilled.
+		 */
+		Assert(parentOffset);
+		if (XLogLogicalInfoActive() && parents[parentOffset - 1]->parent == NULL)
+		{
+			log_unknown_top = true;
+		}
+
+		/*
 		 * This is technically a recursive call, but the recursion will never
 		 * be more than one layer deep.
 		 */
@@ -519,6 +531,9 @@ AssignTransactionId(TransactionState s)
 	 * top-level transaction that each subxact belongs to. This is correct in
 	 * recovery only because aborted subtransactions are separately WAL
 	 * logged.
+	 *
+	 * This is correct even for the case where several levels above us didn't
+	 * have an xid assigned as we recursed up to them beforehand.
 	 */
 	if (isSubXact && XLogStandbyInfoActive())
 	{
@@ -529,7 +544,8 @@ AssignTransactionId(TransactionState s)
 		 * ensure this test matches similar one in
 		 * RecoverPreparedTransactions()
 		 */
-		if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS)
+		if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
+		    log_unknown_top)
 		{
 			XLogRecData rdata[2];
 			xl_xact_assignment xlrec;
@@ -548,7 +564,7 @@ AssignTransactionId(TransactionState s)
 			rdata[0].next = &rdata[1];
 
 			rdata[1].data = (char *) unreportedXids;
-			rdata[1].len = PGPROC_MAX_CACHED_SUBXIDS * sizeof(TransactionId);
+			rdata[1].len = nUnreportedXids * sizeof(TransactionId);
 			rdata[1].buffer = InvalidBuffer;
 			rdata[1].next = NULL;
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ac51193..1ffacde 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "postmaster/startup.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
+#include "replication/logical.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -5195,6 +5196,13 @@ StartupXLOG(void)
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
 
+
+	/*
+	 * Start up logical state; needs to be set up now so we have proper data
+	 * during restore. XXX
+	 */
+	StartupLogicalReplication(checkPoint.redo);
+
 	/*
 	 * Initialize unlogged LSN. On a clean shutdown, it's restored from the
 	 * control file. On recovery, all unlogged relations are blown away, so
@@ -7165,7 +7173,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(true, false, false));
+		TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -7522,7 +7530,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(true, false, false));
+		TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 41a5da0..48fd182 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -106,7 +106,6 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
 	return path;
 }
 
-
 /*
  * IsSystemRelation
  *		True iff the relation is a system catalog relation.
@@ -123,8 +122,17 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
 bool
 IsSystemRelation(Relation relation)
 {
-	return IsSystemNamespace(RelationGetNamespace(relation)) ||
-		IsToastNamespace(RelationGetNamespace(relation));
+	return IsSystemRelationId(RelationGetRelid(relation));
+}
+
+/*
+ * IsSystemRelationId
+ *		True iff relid is a system relation OID (below FirstNormalObjectId).
+ */
+bool
+IsSystemRelationId(Oid relid)
+{
+	return relid < FirstNormalObjectId;
 }
 
 /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfad8b1..bcdd305 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2196,9 +2196,19 @@ IndexBuildHeapScan(Relation heapRelation,
 	}
 	else
 	{
+		/*
+		 * We can ignore a) pegged xmins b) shared relations if we don't scan
+		 * something acting as a catalog.
+		 */
+		bool include_systables =
+			IsSystemRelation(heapRelation) ||
+			RelationIsDoingTimetravel(heapRelation);
+
 		snapshot = SnapshotAny;
 		/* okay to ignore lazy VACUUMs here */
-		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true,
+		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared,
+								   include_systables,
+								   true,
 								   false);
 	}
 
@@ -3367,7 +3377,7 @@ reindex_relation(Oid relid, int flags)
 
 	/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
 	if (is_pg_class)
-		(void) RelationGetIndexAttrBitmap(rel, false);
+		(void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
 
 	PG_TRY();
 	{
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..e16fcb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -612,6 +612,16 @@ CREATE VIEW pg_stat_replication AS
     WHERE S.usesysid = U.oid AND
             S.pid = W.pid;
 
+CREATE VIEW pg_stat_logical_decoding AS
+    SELECT
+            L.slot_name,
+            L.plugin,
+            L.database,
+            L.active,
+            L.xmin,
+            L.restart_decoding_lsn
+    FROM pg_stat_get_logical_decoding_slots() AS L;
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7968319..7a05cea 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, false);
+	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, true, false);
 
 	/* Prepare for sampling block numbers */
 	BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5064081..8c953e1 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -847,6 +847,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 	 */
 	vacuum_set_xid_limits(freeze_min_age, freeze_table_age,
 						  OldHeap->rd_rel->relisshared,
+						  IsSystemRelation(OldHeap)
+						  || RelationIsDoingTimetravel(OldHeap),
 						  &OldestXmin, &FreezeXid, NULL, &MultiXactFrzLimit);
 
 	/*
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed65bab..d348e34 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2355,7 +2355,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	 * concurrency.
 	 */
 	modifiedCols = GetModifiedColumns(relinfo, estate);
-	keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc, true);
+	keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc,
+										 INDEX_ATTR_BITMAP_KEY);
 	if (bms_overlap(keyCols, modifiedCols))
 		lockmode = LockTupleExclusive;
 	else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 924a12e..8aa384a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -381,6 +381,7 @@ void
 vacuum_set_xid_limits(int freeze_min_age,
 					  int freeze_table_age,
 					  bool sharedRel,
+					  bool catalogRel,
 					  TransactionId *oldestXmin,
 					  TransactionId *freezeLimit,
 					  TransactionId *freezeTableLimit,
@@ -399,7 +400,7 @@ vacuum_set_xid_limits(int freeze_min_age,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(sharedRel, true, false);
+	*oldestXmin = GetOldestXmin(sharedRel, catalogRel, true, false);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
@@ -720,7 +721,7 @@ vac_update_datfrozenxid(void)
 	 * committed pg_class entries for new tables; see AddNewRelationTuple().
 	 * So we cannot produce a wrong minimum by starting with this.
 	 */
-	newFrozenXid = GetOldestXmin(true, true, false);
+	newFrozenXid = GetOldestXmin(true, true, true, false);
 
 	/*
 	 * Similarly, initialize the MultiXact "min" with the value that would be
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 2ea0590..b650eee 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -44,6 +44,7 @@
 #include "access/multixact.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "catalog/catalog.h"
 #include "catalog/storage.h"
 #include "commands/dbcommands.h"
 #include "commands/vacuum.h"
@@ -202,6 +203,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 
 	vacuum_set_xid_limits(vacstmt->freeze_min_age, vacstmt->freeze_table_age,
 						  onerel->rd_rel->relisshared,
+						  IsSystemRelation(onerel)
+						  || RelationIsDoingTimetravel(onerel),
 						  &OldestXmin, &FreezeLimit, &freezeTableLimit,
 						  &MultiXactFrzLimit);
 	scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
@@ -1722,7 +1725,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cut
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 86f0686..6c301b8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -837,7 +837,7 @@ PostmasterMain(int argc, char *argv[])
 				(errmsg("WAL archival (archive_mode=on) requires wal_level \"archive\" or \"hot_standby\"")));
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
-				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\" or \"hot_standby\"")));
+				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\", \"logical\" or \"hot_standby\"")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -1958,9 +1958,8 @@ retry1:
 	/* Generic Walsender is not related to a particular database */
 	if (am_walsender && strcmp(port->database_name, "replication") == 0)
 		port->database_name[0] = '\0';
-
-	if (am_walsender)
-		elog(WARNING, "connecting to %s", port->database_name);
+	else if (am_walsender)
+		elog(DEBUG1, "WAL sender attaching to database %s", port->database_name);
 
 	/*
 	 * Done putting stuff in TopMemoryContext.
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 2dde011..2e13e27 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
 	repl_gram.o syncrep.o
 
+SUBDIRS = logical
+
 include $(top_srcdir)/src/backend/common.mk
 
 # repl_scanner is compiled as part of repl_gram
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
new file mode 100644
index 0000000..310a45c
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+#    src/backend/replication/logical/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
new file mode 100644
index 0000000..a93e48d
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,556 @@
+/*-------------------------------------------------------------------------
+ *
+ * decode.c
+ *		Decodes WAL records from an xlogreader.h callback into a reorderbuffer
+ *		while building the appropriate snapshots to decode them
+ *
+ * NOTE:
+ * It's possible that the separation between decode.c and snapbuild.c is a
+ * bit too strict; in the end they have just about the same switch.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/decode.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+
+#include "access/heapam.h"
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+
+static void DecodeHeapOp(ReorderBuffer *reorder, XLogRecordBuffer *buf,
+			 RmgrId rmgr, uint8 info);
+static void DecodeTransactionOp(LogicalDecodingContext *ctx,
+					XLogRecordBuffer *buf);
+static void DecodeXLogTuple(char *data, Size len,
+				ReorderBufferTupleBuf *tuple);
+static void DecodeInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeUpdate(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeDelete(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeMultiInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+			 TransactionId xid, TransactionId *sub_xids, int nsubxacts);
+static void DecodeAbort(ReorderBuffer *reorder, XLogRecPtr lsn,
+			TransactionId xid, TransactionId *sub_xids, int nsubxacts);
+
+
+void
+DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+							  XLogRecordBuffer *buf)
+{
+	XLogRecord *r = &buf->record;
+	uint8		info = r->xl_info & ~XLR_INFO_MASK;
+	ReorderBuffer *reorder = ctx->reorder;
+	SnapBuildAction action;
+
+	/*---------
+	 * Call the snapshot builder. It needs to be called before we analyze
+	 * tuples for two reasons:
+	 *
+	 * * Only the snapshot building logic knows whether we have enough
+	 *	 information to decode a particular tuple
+	 *
+	 * * The Snapshot/CommandIds computed by the SnapshotBuilder need to be
+	 *	 added to the ReorderBuffer before we add tuples using them
+	 *---------
+	 */
+	action = SnapBuildProcessRecord(ctx->snapshot_builder, buf);
+
+	if (action == SNAPBUILD_SKIP)
+		return;
+
+	switch (r->xl_rmid)
+	{
+		case RM_HEAP_ID:
+		case RM_HEAP2_ID:
+			DecodeHeapOp(reorder, buf, r->xl_rmid,
+						 r->xl_info & XLOG_HEAP_OPMASK);
+			break;
+
+		case RM_XACT_ID:
+			DecodeTransactionOp(ctx, buf);
+			break;
+
+		case RM_XLOG_ID:
+			switch (info)
+			{
+				/* this is also used in END_OF_RECOVERY checkpoints */
+				case XLOG_CHECKPOINT_SHUTDOWN:
+					/*
+					 * Abort all transactions that are still marked as in
+					 * progress; they aren't running anymore. Do not abort
+					 * prepared transactions that have been prepared for
+					 * commit.
+					 *
+					 * FIXME: implement.
+					 */
+					break;
+			}
+			break;
+		default:
+			break;
+	}
+}
+
+static void
+DecodeHeapOp(ReorderBuffer *reorder, XLogRecordBuffer *buf, RmgrId rmgr,
+			 uint8 info)
+{
+	switch (rmgr)
+	{
+		case RM_HEAP_ID:
+			switch (info)
+			{
+				case XLOG_HEAP_INSERT:
+					DecodeInsert(reorder, buf);
+					break;
+
+					/*
+					 * no guarantee that we get a HOT update again, so
+					 * handle it as a normal update
+					 */
+				case XLOG_HEAP_HOT_UPDATE:
+				case XLOG_HEAP_UPDATE:
+					DecodeUpdate(reorder, buf);
+					break;
+
+				case XLOG_HEAP_NEWPAGE:
+
+					/*
+					 * XXX: There doesn't seem to be a use case for
+					 * decoding HEAP_NEWPAGE records. It's only used in
+					 * various index AMs and CLUSTER, neither of which should
+					 * be relevant for the logical changestream.
+					 */
+					break;
+
+				case XLOG_HEAP_DELETE:
+					DecodeDelete(reorder, buf);
+					break;
+				default:
+					break;
+			}
+			break;
+		case RM_HEAP2_ID:
+			switch (info)
+			{
+				case XLOG_HEAP2_MULTI_INSERT:
+					DecodeMultiInsert(reorder, buf);
+					break;
+
+				default:
+
+					/*
+					 * everything else here is just physical stuff we're
+					 * not interested in
+					 */
+					break;
+			}
+			break;
+	}
+}
+
+static void
+DecodeTransactionOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+	ReorderBuffer  *reorder = ctx->reorder;
+	XLogRecord	   *r = &buf->record;
+
+	switch (r->xl_info & ~XLR_INFO_MASK)
+	{
+		case XLOG_XACT_COMMIT:
+			{
+				TransactionId *sub_xids = NULL;
+				xl_xact_commit *xlrec;
+
+				xlrec = (xl_xact_commit *) buf->record_data;
+
+				if (xlrec->nsubxacts > 0)
+					sub_xids = (TransactionId *)
+						&(xlrec->xnodes[xlrec->nrels]);
+
+				DecodeCommit(ctx, buf, r->xl_xid, sub_xids, xlrec->nsubxacts);
+
+				break;
+			}
+		case XLOG_XACT_COMMIT_PREPARED:
+			{
+				TransactionId *sub_xids;
+				xl_xact_commit_prepared *xlrec;
+
+				xlrec = (xl_xact_commit_prepared *) buf->record_data;
+				sub_xids = (TransactionId *)
+					&(xlrec->crec.xnodes[xlrec->crec.nrels]);
+
+				/* r->xl_xid is committed in a separate record */
+				DecodeCommit(ctx, buf, xlrec->xid, sub_xids,
+							 xlrec->crec.nsubxacts);
+
+				break;
+			}
+		case XLOG_XACT_COMMIT_COMPACT:
+			{
+				xl_xact_commit_compact *xlrec;
+
+				xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+				DecodeCommit(ctx, buf, r->xl_xid, xlrec->subxacts,
+							 xlrec->nsubxacts);
+				break;
+			}
+		case XLOG_XACT_ABORT:
+			{
+				TransactionId *sub_xids;
+				xl_xact_abort *xlrec;
+
+				xlrec = (xl_xact_abort *) buf->record_data;
+
+				sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+				DecodeAbort(reorder, buf->origptr, r->xl_xid,
+							sub_xids, xlrec->nsubxacts);
+				break;
+			}
+		case XLOG_XACT_ABORT_PREPARED:
+			{
+				TransactionId *sub_xids;
+				xl_xact_abort_prepared *xlrec;
+				xl_xact_abort *arec;
+
+				xlrec = (xl_xact_abort_prepared *) buf->record_data;
+				arec = &xlrec->arec;
+
+				sub_xids = (TransactionId *) &(arec->xnodes[arec->nrels]);
+				/* r->xl_xid is committed in a separate record */
+				DecodeAbort(reorder, buf->origptr, xlrec->xid,
+							sub_xids, arec->nsubxacts);
+				break;
+			}
+
+		case XLOG_XACT_ASSIGNMENT:
+			{
+				int			i;
+				TransactionId *sub_xid;
+				xl_xact_assignment *xlrec =
+					(xl_xact_assignment *) buf->record_data;
+
+				sub_xid = &xlrec->xsub[0];
+
+				for (i = 0; i < xlrec->nsubxacts; i++)
+				{
+					ReorderBufferAssignChild(reorder, r->xl_xid,
+											 *(sub_xid++), buf->origptr);
+				}
+				break;
+			}
+		case XLOG_XACT_PREPARE:
+
+			/*
+			 * XXX: we could replay the transaction and prepare it
+			 * as well.
+			 */
+			break;
+		default:
+			break;
+	}
+}
+
+static void
+DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, TransactionId xid,
+			 TransactionId *sub_xids, int nsubxacts)
+{
+	int			i;
+
+	/*
+	 * If we are not interested in anything up to this LSN, convert the commit
+	 * into an ABORT to clean up.
+	 *
+	 * FIXME: this needs to replay invalidations anyway!
+	 */
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr))
+	{
+		DecodeAbort(ctx->reorder, buf->origptr, xid,
+					sub_xids, nsubxacts);
+		return;
+	}
+
+	for (i = 0; i < nsubxacts; i++)
+	{
+		ReorderBufferCommitChild(ctx->reorder, xid, *sub_xids,
+								 buf->origptr);
+		sub_xids++;
+	}
+
+	/* replay actions of all transaction + subtransactions in order */
+	ReorderBufferCommit(ctx->reorder, xid, buf->origptr);
+}
+
+static void
+DecodeAbort(ReorderBuffer *reorder, XLogRecPtr lsn, TransactionId xid,
+			TransactionId *sub_xids, int nsubxacts)
+{
+	int			i;
+
+	for (i = 0; i < nsubxacts; i++)
+	{
+		ReorderBufferAbort(reorder, *sub_xids, lsn);
+		sub_xids++;
+	}
+
+	ReorderBufferAbort(reorder, xid, lsn);
+}
+
+static void
+DecodeInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+	XLogRecord *r = &buf->record;
+	xl_heap_insert *xlrec;
+	ReorderBufferChange *change;
+
+	xlrec = (xl_heap_insert *) buf->record_data;
+
+	/* XXX: nicer */
+	if (xlrec->target.node.dbNode != MyDatabaseId)
+		return;
+
+	change = ReorderBufferGetChange(reorder);
+	change->action = REORDER_BUFFER_CHANGE_INSERT;
+	memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+	if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+	{
+		Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader));
+
+		change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+		DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert,
+						r->xl_len - SizeOfHeapInsert,
+						change->newtuple);
+	}
+
+	ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeUpdate(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+	XLogRecord *r = &buf->record;
+	xl_heap_update *xlrec;
+	xl_heap_header_len *xlhdr;
+	ReorderBufferChange *change;
+	char	   *data;
+
+	xlrec = (xl_heap_update *) buf->record_data;
+	xlhdr = (xl_heap_header_len *) (buf->record_data + SizeOfHeapUpdate);
+
+	/* XXX: nicer */
+	if (xlrec->target.node.dbNode != MyDatabaseId)
+		return;
+
+	change = ReorderBufferGetChange(reorder);
+	change->action = REORDER_BUFFER_CHANGE_UPDATE;
+	memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+	data = (char *) &xlhdr->header;
+
+	/*
+	 * FIXME: need to get/save the old tuple as well if we want primary key
+	 * changes to work.
+	 */
+	if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+	{
+		Assert(r->xl_len > (SizeOfHeapUpdate + SizeOfHeapHeaderLen));
+#if 0
+		elog(WARNING, "xl: %zu tp:%zu",
+			 (r->xl_len - SizeOfHeapUpdate - (SizeOfHeapHeaderLen - SizeOfHeapHeader)),
+			 xlhdr->t_len + SizeOfHeapHeader);
+#endif
+		change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+		DecodeXLogTuple(data,
+						xlhdr->t_len + SizeOfHeapHeader,
+						change->newtuple);
+		/* skip over the rest of the tuple header */
+		data += SizeOfHeapHeader;
+		/* skip over the tuple data */
+		data += xlhdr->t_len;
+	}
+	if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+	{
+		xlhdr = (xl_heap_header_len *) data;
+		change->oldtuple = ReorderBufferGetTupleBuf(reorder);
+		DecodeXLogTuple((char *) &xlhdr->header,
+						xlhdr->t_len + SizeOfHeapHeader,
+						change->oldtuple);
+		data = (char *) &xlhdr->header;
+		data += SizeOfHeapHeader;
+		data += xlhdr->t_len;
+	}
+
+	ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeDelete(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+	XLogRecord *r = &buf->record;
+	xl_heap_delete *xlrec;
+	ReorderBufferChange *change;
+
+	xlrec = (xl_heap_delete *) buf->record_data;
+
+	/* XXX: nicer */
+	if (xlrec->target.node.dbNode != MyDatabaseId)
+		return;
+
+	change = ReorderBufferGetChange(reorder);
+	change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+	memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+	/* old primary key stored */
+	if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+	{
+		Assert(r->xl_len > (SizeOfHeapDelete + SizeOfHeapHeader));
+
+		change->oldtuple = ReorderBufferGetTupleBuf(reorder);
+
+		DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
+						r->xl_len - SizeOfHeapDelete,
+						change->oldtuple);
+	}
+	ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+/*
+ * Decode xl_heap_multi_insert record into multiple changes.
+ */
+static void
+DecodeMultiInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+	XLogRecord *r = &buf->record;
+	xl_heap_multi_insert *xlrec;
+	int			i;
+	char	   *data;
+	bool		isinit = (r->xl_info & XLOG_HEAP_INIT_PAGE) != 0;
+
+	xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+	/* XXX: nicer */
+	if (xlrec->node.dbNode != MyDatabaseId)
+		return;
+
+	data = buf->record_data + SizeOfHeapMultiInsert;
+
+	/*
+	 * OffsetNumbers (which are not of interest to us) are stored when
+	 * XLOG_HEAP_INIT_PAGE is not set -- skip over them.
+	 */
+	if (!isinit)
+		data += sizeof(OffsetNumber) * xlrec->ntuples;
+
+	for (i = 0; i < xlrec->ntuples; i++)
+	{
+		ReorderBufferChange *change;
+		xl_multi_insert_tuple *xlhdr;
+		int			datalen;
+		ReorderBufferTupleBuf *tuple;
+
+		change = ReorderBufferGetChange(reorder);
+		change->action = REORDER_BUFFER_CHANGE_INSERT;
+		memcpy(&change->relnode, &xlrec->node, sizeof(RelFileNode));
+
+		/*
+		 * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
+		 * isn't used for catalogs, but better be future proof.
+		 *
+		 * We decode the tuple in pretty much the same way as DecodeXLogTuple,
+		 * but since the layout is slightly different, we can't use it here.
+		 */
+		if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+		{
+			change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+			tuple = change->newtuple;
+			/* not a disk based tuple */
+			ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+			xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
+			data = ((char *) xlhdr) + SizeOfMultiInsertTuple;
+			datalen = xlhdr->datalen;
+
+			/* we can only figure this out after reassembling the transactions */
+			tuple->tuple.t_tableOid = InvalidOid;
+			tuple->tuple.t_data = &tuple->header;
+			tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+			memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+			memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+				   (char *) data,
+				   datalen);
+			data += datalen;
+
+			tuple->header.t_infomask = xlhdr->t_infomask;
+			tuple->header.t_infomask2 = xlhdr->t_infomask2;
+			tuple->header.t_hoff = xlhdr->t_hoff;
+		}
+
+		ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+	}
+}
+
+/*
+ * Read a tuple of size 'len' from 'data' into 'tuple'.
+ */
+static void
+DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
+{
+	xl_heap_header xlhdr;
+	int			datalen = len - SizeOfHeapHeader;
+
+	Assert(datalen >= 0);
+	Assert(datalen <= MaxHeapTupleSize);
+
+	tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+	/* not a disk based tuple */
+	ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+	/* we can only figure this out after reassembling the transactions */
+	tuple->tuple.t_tableOid = InvalidOid;
+	tuple->tuple.t_data = &tuple->header;
+
+	/* data is not stored aligned */
+	memcpy((char *) &xlhdr,
+		   data,
+		   SizeOfHeapHeader);
+
+	memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+	memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+		   data + SizeOfHeapHeader,
+		   datalen);
+
+	tuple->header.t_infomask = xlhdr.t_infomask;
+	tuple->header.t_infomask2 = xlhdr.t_infomask2;
+	tuple->header.t_hoff = xlhdr.t_hoff;
+}
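
As an aside on DecodeXLogTuple above: the tuple header inside the WAL record
is not stored aligned, so the function copies it into a local struct with
memcpy instead of casting the data pointer. Below is a minimal standalone
sketch of that pattern. DemoWireHeader, DemoTuple, demo_encode_tuple and
demo_decode_tuple are hypothetical names made up for this illustration; they
are not part of the patch or of PostgreSQL, and the layout is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xl_heap_header: a small fixed-size header. */
typedef struct DemoWireHeader
{
	uint16_t	infomask2;
	uint16_t	infomask;
	uint8_t		hoff;
} DemoWireHeader;

/* Bytes the header occupies in the stream; excludes trailing padding. */
#define DEMO_WIRE_HEADER_SIZE (offsetof(DemoWireHeader, hoff) + sizeof(uint8_t))
#define DEMO_MAX_PAYLOAD 64

/* Hypothetical stand-in for ReorderBufferTupleBuf. */
typedef struct DemoTuple
{
	DemoWireHeader header;
	size_t		payload_len;
	unsigned char payload[DEMO_MAX_PAYLOAD];
} DemoTuple;

/* Write header + payload back to back at 'buf'; returns bytes written. */
static size_t
demo_encode_tuple(const DemoWireHeader *hdr, const unsigned char *payload,
				  size_t payload_len, unsigned char *buf)
{
	memcpy(buf, hdr, DEMO_WIRE_HEADER_SIZE);
	memcpy(buf + DEMO_WIRE_HEADER_SIZE, payload, payload_len);
	return DEMO_WIRE_HEADER_SIZE + payload_len;
}

/*
 * Decode 'len' bytes at 'data' (header followed by payload) into 'out'.
 * Returns 0 on success, -1 if the input is too short or too long.
 */
static int
demo_decode_tuple(const unsigned char *data, size_t len, DemoTuple *out)
{
	DemoWireHeader hdr;
	size_t		payload_len;

	assert(data != NULL && out != NULL);

	if (len < DEMO_WIRE_HEADER_SIZE)
		return -1;
	payload_len = len - DEMO_WIRE_HEADER_SIZE;
	if (payload_len > DEMO_MAX_PAYLOAD)
		return -1;

	/* 'data' may be unaligned, so copy the header instead of casting. */
	memset(&hdr, 0, sizeof(hdr));
	memcpy(&hdr, data, DEMO_WIRE_HEADER_SIZE);

	/* Zero the target first, then fill in header fields and payload. */
	memset(out, 0, sizeof(*out));
	out->header = hdr;
	out->payload_len = payload_len;
	memcpy(out->payload, data + DEMO_WIRE_HEADER_SIZE, payload_len);
	return 0;
}
```

Encoding a tuple at an odd offset in a byte buffer and decoding it again
should give back the same header fields and payload. The explicit length
checks stand in for the Assert() calls in the patch.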
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
new file mode 100644
index 0000000..2fe009b
--- /dev/null
+++ b/src/backend/replication/logical/logical.c
@@ -0,0 +1,1047 @@
+/*-------------------------------------------------------------------------
+ *
+ * logical.c
+ *
+ *	   Logical decoding shared memory management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/logical.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "access/transam.h"
+
+#include "fmgr.h"
+#include "miscadmin.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/fd.h"
+#include "storage/copydir.h"
+
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/*
+ * logical replication on-disk data structures.
+ */
+typedef struct LogicalDecodingSlotOnDisk
+{
+	uint32		magic;
+	LogicalDecodingSlot slot;
+} LogicalDecodingSlotOnDisk;
+
+#define LOGICAL_MAGIC	0x1051CA1		/* format identifier */
+
+/* Control array for logical decoding */
+LogicalDecodingCtlData *LogicalDecodingCtl = NULL;
+
+/* My slot for logical rep in the shared memory array */
+LogicalDecodingSlot *MyLogicalDecodingSlot = NULL;
+
+/* user settable parameters */
+int			max_logical_slots = 0;		/* the maximum number of logical slots */
+
+static void LogicalSlotKill(int code, Datum arg);
+
+/* persistency functions */
+static void RestoreLogicalSlot(const char *name);
+static void CreateLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *path);
+static void DeleteLogicalSlot(LogicalDecodingSlot *slot);
+
+
+/* Report shared-memory space needed by LogicalDecodingShmemInit */
+Size
+LogicalDecodingShmemSize(void)
+{
+	Size		size = 0;
+
+	if (max_logical_slots == 0)
+		return size;
+
+	size = offsetof(LogicalDecodingCtlData, logical_slots);
+	size = add_size(size,
+					mul_size(max_logical_slots, sizeof(LogicalDecodingSlot)));
+
+	return size;
+}
+
+/* Allocate and initialize logical decoding-related shared memory */
+void
+LogicalDecodingShmemInit(void)
+{
+	bool		found;
+
+	if (max_logical_slots == 0)
+		return;
+
+	LogicalDecodingCtl = (LogicalDecodingCtlData *)
+		ShmemInitStruct("Logical Decoding Ctl", LogicalDecodingShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		int			i;
+
+		/* First time through, so initialize */
+		MemSet(LogicalDecodingCtl, 0, LogicalDecodingShmemSize());
+
+		LogicalDecodingCtl->xmin = InvalidTransactionId;
+
+		for (i = 0; i < max_logical_slots; i++)
+		{
+			LogicalDecodingSlot *slot =
+			&LogicalDecodingCtl->logical_slots[i];
+
+			slot->xmin = InvalidTransactionId;
+			slot->effective_xmin = InvalidTransactionId;
+			SpinLockInit(&slot->mutex);
+		}
+	}
+}
+
+static void
+LogicalSlotKill(int code, Datum arg)
+{
+	/* LOCK? */
+	if (MyLogicalDecodingSlot && MyLogicalDecodingSlot->active)
+	{
+		MyLogicalDecodingSlot->active = false;
+	}
+	MyLogicalDecodingSlot = NULL;
+}
+
+/*
+ * Set the xmin required for catalog timetravel for the specific decoding slot.
+ */
+void
+IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin)
+{
+	Assert(MyLogicalDecodingSlot != NULL);
+
+	SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+	/*
+	 * Only increase if the previous values have been applied, otherwise we
+	 * might never end up updating if the receiver acks too slowly.
+	 */
+	if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+		(lsn == MyLogicalDecodingSlot->candidate_lsn &&
+		 !TransactionIdIsValid(MyLogicalDecodingSlot->candidate_xmin)))
+	{
+		MyLogicalDecodingSlot->candidate_lsn = lsn;
+		MyLogicalDecodingSlot->candidate_xmin = xmin;
+		elog(DEBUG1, "got new xmin %u at %X/%X", xmin,
+			 (uint32) (lsn >> 32), (uint32) lsn);
+	}
+	SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn)
+{
+	Assert(MyLogicalDecodingSlot != NULL);
+	Assert(restart_lsn != InvalidXLogRecPtr);
+	Assert(current_lsn != InvalidXLogRecPtr);
+
+	SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+	/*
+	 * Only increase if the previous values have been applied, otherwise we
+	 * might never end up updating if the receiver acks too slowly. A missed
+	 * value here will just cause some extra effort after reconnecting.
+	 */
+	if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+		(current_lsn == MyLogicalDecodingSlot->candidate_lsn &&
+	 MyLogicalDecodingSlot->candidate_restart_decoding == InvalidXLogRecPtr))
+	{
+		MyLogicalDecodingSlot->candidate_lsn = current_lsn;
+		MyLogicalDecodingSlot->candidate_restart_decoding = restart_lsn;
+
+		elog(DEBUG1, "got new restart lsn %X/%X at %X/%X",
+			 (uint32) (restart_lsn >> 32), (uint32) restart_lsn,
+			 (uint32) (current_lsn >> 32), (uint32) current_lsn);
+
+	}
+	SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+LogicalConfirmReceivedLocation(XLogRecPtr lsn)
+{
+	Assert(lsn != InvalidXLogRecPtr);
+
+	/* Do an unlocked check for candidate_lsn first. */
+	if (MyLogicalDecodingSlot->candidate_lsn != InvalidXLogRecPtr)
+	{
+		bool		updated_xmin = false;
+		bool		updated_restart = false;
+
+		/* use volatile pointer to prevent code rearrangement */
+		volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+		SpinLockAcquire(&slot->mutex);
+
+		slot->confirmed_flush = lsn;
+
+		/* if we're past the location required for bumping xmin, do so */
+		if (slot->candidate_lsn != InvalidXLogRecPtr &&
+			slot->candidate_lsn < lsn)
+		{
+			/*
+			 * We have to write the changed xmin to disk *before* we change
+			 * the in-memory value, otherwise after a crash we wouldn't know
+			 * that some catalog tuples might have been removed already.
+			 *
+			 * Ensure that by first writing to ->xmin and only update
+			 * ->effective_xmin once the new state is fsynced to disk. After a
+			 * crash ->effective_xmin is set to ->xmin.
+			 */
+			if (TransactionIdIsValid(slot->candidate_xmin) &&
+				slot->xmin != slot->candidate_xmin)
+			{
+				slot->xmin = slot->candidate_xmin;
+				updated_xmin = true;
+			}
+
+			if (slot->candidate_restart_decoding != InvalidXLogRecPtr &&
+				slot->restart_decoding != slot->candidate_restart_decoding)
+			{
+				slot->restart_decoding = slot->candidate_restart_decoding;
+				updated_restart = true;
+			}
+
+			slot->candidate_lsn = InvalidXLogRecPtr;
+			slot->candidate_xmin = InvalidTransactionId;
+			slot->candidate_restart_decoding = InvalidXLogRecPtr;
+		}
+
+		SpinLockRelease(&slot->mutex);
+
+		/* first write new xmin to disk, so we know what's up after a crash */
+		if (updated_xmin || updated_restart)
+			/* cast away volatile, that's OK. */
+			SaveLogicalSlot((LogicalDecodingSlot *) slot);
+
+		/*
+		 * now that the new xmin is safely on disk, we can let the global
+		 * value advance
+		 */
+		if (updated_xmin)
+		{
+			SpinLockAcquire(&slot->mutex);
+			slot->effective_xmin = slot->xmin;
+			SpinLockRelease(&slot->mutex);
+
+			ComputeLogicalXmin();
+		}
+	}
+	else
+	{
+		volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+		SpinLockAcquire(&slot->mutex);
+		slot->confirmed_flush = lsn;
+		SpinLockRelease(&slot->mutex);
+	}
+}
+
+/*
+ * Compute the oldest effective xmin across all in-use decoding slots and
+ * store it in LogicalDecodingCtl->xmin.
+ */
+void
+ComputeLogicalXmin(void)
+{
+	int			i;
+	TransactionId xmin = InvalidTransactionId;
+	LogicalDecodingSlot *slot;
+
+	Assert(LogicalDecodingCtl);
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		slot = &LogicalDecodingCtl->logical_slots[i];
+
+		SpinLockAcquire(&slot->mutex);
+		if (slot->in_use &&
+			TransactionIdIsValid(slot->effective_xmin) && (
+											   !TransactionIdIsValid(xmin) ||
+						   TransactionIdPrecedes(slot->effective_xmin, xmin))
+			)
+		{
+			xmin = slot->effective_xmin;
+		}
+		SpinLockRelease(&slot->mutex);
+	}
+	LogicalDecodingCtl->xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+
+	elog(DEBUG1, "computed new global xmin for decoding: %u", xmin);
+}
+
+/*
+ * Make sure the current settings & environment are capable of doing logical
+ * replication.
+ */
+void
+CheckLogicalReplicationRequirements(void)
+{
+	if (wal_level < WAL_LEVEL_LOGICAL)
+		ereport(ERROR,
+		/* XXX invent class 51 for code 51028? */
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("logical replication requires wal_level=logical")));
+
+	if (MyDatabaseId == InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("logical replication requires a database connection")));
+
+	if (max_logical_slots == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 (errmsg("logical replication requires max_logical_slots > 0"))));
+}
+
+/*
+ * Search for a free slot, mark it as used and acquire a valid xmin horizon
+ * value.
+ */
+void
+LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin)
+{
+	LogicalDecodingSlot *slot;
+	bool		name_in_use;
+	int			i;
+
+	Assert(!MyLogicalDecodingSlot);
+
+	CheckLogicalReplicationRequirements();
+
+	LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+
+	/* First, make sure the requested name is not in use. */
+
+	name_in_use = false;
+	for (i = 0; i < max_logical_slots && !name_in_use; i++)
+	{
+		LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+		SpinLockAcquire(&s->mutex);
+		if (s->in_use && strcmp(name, NameStr(s->name)) == 0)
+			name_in_use = true;
+		SpinLockRelease(&s->mutex);
+	}
+
+	if (name_in_use)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+			  errmsg("a logical slot named \"%s\" already exists", name)));
+
+	/* Find the first available (not in_use (=> not active)) slot. */
+
+	slot = NULL;
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+		SpinLockAcquire(&s->mutex);
+		if (!s->in_use)
+		{
+			Assert(!s->active);
+			/* NOT releasing the lock yet */
+			slot = s;
+			break;
+		}
+		SpinLockRelease(&s->mutex);
+	}
+
+	LWLockRelease(LogicalReplicationCtlLock);
+
+	if (!slot)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("could not find a free logical slot, free one or increase max_logical_slots")));
+
+	MyLogicalDecodingSlot = slot;
+
+	/* Let's start with enough information if we can */
+	if (!RecoveryInProgress())
+		slot->restart_decoding = LogStandbySnapshot();
+	else
+		slot->restart_decoding = GetRedoRecPtr();
+
+	slot->in_use = true;
+	slot->active = true;
+	slot->database = MyDatabaseId;
+	/* XXX: do we want to use truncate_identifier() instead? */
+	strncpy(NameStr(slot->plugin), plugin, NAMEDATALEN);
+	NameStr(slot->plugin)[NAMEDATALEN - 1] = '\0';
+	strncpy(NameStr(slot->name), name, NAMEDATALEN);
+	NameStr(slot->name)[NAMEDATALEN - 1] = '\0';
+
+	/* Arrange to clean up at exit/error */
+	on_shmem_exit(LogicalSlotKill, 0);
+
+	/* release slot so it can be examined by others */
+	SpinLockRelease(&slot->mutex);
+
+	/* XXX: verify that the specified plugin is valid */
+
+	/*
+	 * Acquire the current global xmin value and directly set the logical xmin
+	 * before releasing the lock if necessary. We do this so WAL decoding is
+	 * guaranteed to have all catalog rows produced by xacts with an xid >
+	 * slot->xmin available.
+	 *
+	 * We can't use ComputeLogicalXmin here as that acquires ProcArrayLock
+	 * separately which would open a short window for the global xmin to
+	 * advance above slot->xmin.
+	 */
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	slot->effective_xmin = GetOldestXmin(true, true, true, true);
+	slot->xmin = slot->effective_xmin;
+
+	if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) ||
+		NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin))
+		LogicalDecodingCtl->xmin = slot->effective_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	Assert(TransactionIdPrecedesOrEquals(slot->effective_xmin, GetOldestXmin(true, true, true, false)));
+
+	LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+	CreateLogicalSlot(slot);
+	LWLockRelease(LogicalReplicationCtlLock);
+}
+
+/*
+ * Find a previously initialized slot and mark it as used again.
+ */
+void
+LogicalDecodingReAcquireSlot(const char *name)
+{
+	LogicalDecodingSlot *slot;
+	int			i;
+
+	CheckLogicalReplicationRequirements();
+
+	Assert(!MyLogicalDecodingSlot);
+
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		slot = &LogicalDecodingCtl->logical_slots[i];
+
+		SpinLockAcquire(&slot->mutex);
+		if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+		{
+			MyLogicalDecodingSlot = slot;
+			/* NOT releasing the lock yet */
+			break;
+		}
+		SpinLockRelease(&slot->mutex);
+	}
+
+	if (!MyLogicalDecodingSlot)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("couldn't find logical slot \"%s\"", name)));
+
+	slot = MyLogicalDecodingSlot;
+
+	if (slot->active)
+	{
+		SpinLockRelease(&slot->mutex);
+		MyLogicalDecodingSlot = NULL;
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+				 errmsg("slot already active")));
+	}
+
+	slot->active = true;
+	/* now that we've marked it as active, we release our lock */
+	SpinLockRelease(&slot->mutex);
+
+	/* Don't let the user switch the database... */
+	if (slot->database != MyDatabaseId)
+	{
+		SpinLockAcquire(&slot->mutex);
+		slot->active = false;
+		SpinLockRelease(&slot->mutex);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 (errmsg("START_LOGICAL_REPLICATION needs to be run in the same database as INIT_LOGICAL_REPLICATION"))));
+	}
+
+	/* Arrange to clean up at exit */
+	on_shmem_exit(LogicalSlotKill, 0);
+
+	SaveLogicalSlot(slot);
+}
+
+/*
+ * Temporarily release a logical decoding slot; this or another backend can
+ * reacquire it later.
+ */
+void
+LogicalDecodingReleaseSlot(void)
+{
+	LogicalDecodingSlot *slot;
+
+	CheckLogicalReplicationRequirements();
+
+	slot = MyLogicalDecodingSlot;
+
+	Assert(slot != NULL && slot->active);
+
+	SpinLockAcquire(&slot->mutex);
+	slot->active = false;
+	SpinLockRelease(&slot->mutex);
+
+	MyLogicalDecodingSlot = NULL;
+
+	SaveLogicalSlot(slot);
+
+	cancel_shmem_exit(LogicalSlotKill, 0);
+}
+
+/*
+ * Permanently remove a logical decoding slot.
+ */
+void
+LogicalDecodingFreeSlot(const char *name)
+{
+	LogicalDecodingSlot *slot = NULL;
+	int			i;
+
+	CheckLogicalReplicationRequirements();
+
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		slot = &LogicalDecodingCtl->logical_slots[i];
+
+		SpinLockAcquire(&slot->mutex);
+		if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+		{
+			/* NOT releasing the lock yet */
+			break;
+		}
+		SpinLockRelease(&slot->mutex);
+		slot = NULL;
+	}
+
+	if (!slot)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("couldn't find logical slot \"%s\"", name)));
+
+	if (slot->active)
+	{
+		SpinLockRelease(&slot->mutex);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+				 errmsg("cannot free active logical slot \"%s\"", name)));
+	}
+
+	/*
+	 * Mark it as active, so nobody can claim this slot while we are
+	 * working on it. We don't want to hold the spinlock while doing stuff
+	 * like fsyncing the state file to disk.
+	 */
+	slot->active = true;
+
+	SpinLockRelease(&slot->mutex);
+
+	/*
+	 * Start critical section, we can't be interrupted while on-disk/memory
+	 * state aren't coherent.
+	 */
+	START_CRIT_SECTION();
+
+	DeleteLogicalSlot(slot);
+
+	/* ok, everything gone, after a crash we now wouldn't restore this slot */
+	SpinLockAcquire(&slot->mutex);
+	slot->active = false;
+	slot->in_use = false;
+	SpinLockRelease(&slot->mutex);
+
+	END_CRIT_SECTION();
+
+	/* slot is dead and doesn't nail the xmin anymore */
+	ComputeLogicalXmin();
+}
+
+/*
+ * Load replication state from disk into memory at server startup.
+ */
+void
+StartupLogicalReplication(XLogRecPtr checkPointRedo)
+{
+	DIR		   *logical_dir;
+	struct dirent *logical_de;
+
+	ereport(DEBUG1,
+			(errmsg("starting up logical decoding from %X/%X",
+					(uint32) (checkPointRedo >> 32), (uint32) checkPointRedo)));
+
+	/* restore all slots */
+	logical_dir = AllocateDir("pg_llog");
+	while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+	{
+		if (strcmp(logical_de->d_name, ".") == 0 ||
+			strcmp(logical_de->d_name, "..") == 0)
+			continue;
+
+		/* one of our own directories */
+		if (strcmp(logical_de->d_name, "snapshots") == 0)
+			continue;
+
+		/* we crashed while a slot was being set up or deleted, clean up */
+		if (strcmp(logical_de->d_name, "new") == 0 ||
+			strcmp(logical_de->d_name, "old") == 0)
+		{
+			char		path[MAXPGPATH];
+
+			sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+			if (!rmtree(path, true))
+			{
+				FreeDir(logical_dir);
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not remove directory \"%s\": %m",
+								path)));
+			}
+			continue;
+		}
+
+		RestoreLogicalSlot(logical_de->d_name);
+	}
+	FreeDir(logical_dir);
+
+	if (max_logical_slots <= 0)
+		return;
+
+	/* Now that we have recovered all the data, compute logical xmin */
+	ComputeLogicalXmin();
+
+	ReorderBufferStartup();
+}
+
+/* ----
+ * Manipulation of ondisk state of logical slots
+ * ----
+ */
+static void
+CreateLogicalSlot(LogicalDecodingSlot *slot)
+{
+	char		tmppath[MAXPGPATH];
+	char		path[MAXPGPATH];
+
+	START_CRIT_SECTION();
+
+	sprintf(tmppath, "pg_llog/new");
+	sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+	if (mkdir(tmppath, S_IRWXU) < 0)
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not create directory \"%s\": %m",
+						tmppath)));
+
+	fsync_fname(tmppath, true);
+
+	SaveLogicalSlotInternal(slot, tmppath);
+
+	if (rename(tmppath, path) != 0)
+	{
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+						tmppath, path)));
+	}
+
+	fsync_fname(path, true);
+
+	END_CRIT_SECTION();
+}
+
+static void
+SaveLogicalSlot(LogicalDecodingSlot *slot)
+{
+	char		path[MAXPGPATH];
+
+	sprintf(path, "pg_llog/%s", NameStr(slot->name));
+	SaveLogicalSlotInternal(slot, path);
+}
+
+/*
+ * Shared functionality between saving and creating a logical slot.
+ */
+static void
+SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *dir)
+{
+	char		tmppath[MAXPGPATH];
+	char		path[MAXPGPATH];
+	int			fd;
+	LogicalDecodingSlotOnDisk cp;
+
+	/* silence valgrind :( */
+	memset(&cp, 0, sizeof(LogicalDecodingSlotOnDisk));
+
+	sprintf(tmppath, "%s/state.tmp", dir);
+	sprintf(path, "%s/state", dir);
+
+	START_CRIT_SECTION();
+
+	fd = OpenTransientFile(tmppath,
+						   O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+						   S_IRUSR | S_IWUSR);
+	if (fd < 0)
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not create logical checkpoint file \"%s\": %m",
+						tmppath)));
+
+	cp.magic = LOGICAL_MAGIC;
+
+	SpinLockAcquire(&slot->mutex);
+
+	cp.slot.xmin = slot->xmin;
+	cp.slot.effective_xmin = slot->effective_xmin;
+
+	strcpy(NameStr(cp.slot.name), NameStr(slot->name));
+	strcpy(NameStr(cp.slot.plugin), NameStr(slot->plugin));
+
+	cp.slot.database = slot->database;
+	cp.slot.confirmed_flush = slot->confirmed_flush;
+	cp.slot.restart_decoding = slot->restart_decoding;
+	cp.slot.candidate_lsn = InvalidXLogRecPtr;
+	cp.slot.candidate_xmin = InvalidTransactionId;
+	cp.slot.candidate_restart_decoding = InvalidXLogRecPtr;
+	cp.slot.in_use = slot->in_use;
+	cp.slot.active = false;
+
+	SpinLockRelease(&slot->mutex);
+
+	if ((write(fd, &cp, sizeof(cp))) != sizeof(cp))
+	{
+		CloseTransientFile(fd);
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not write logical checkpoint file \"%s\": %m",
+						tmppath)));
+	}
+
+	/* fsync the file */
+	if (pg_fsync(fd) != 0)
+	{
+		CloseTransientFile(fd);
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync logical checkpoint \"%s\": %m",
+						tmppath)));
+	}
+
+	CloseTransientFile(fd);
+
+	/* rename to permanent file, fsync file and directory */
+	if (rename(tmppath, path) != 0)
+	{
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+						tmppath, path)));
+	}
+
+	fsync_fname((char *) dir, true);
+	fsync_fname(path, false);
+
+	END_CRIT_SECTION();
+}
+
+
+static void
+DeleteLogicalSlot(LogicalDecodingSlot *slot)
+{
+	char		path[MAXPGPATH];
+	char		tmppath[] = "pg_llog/old";
+
+	START_CRIT_SECTION();
+
+	sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+	if (rename(path, tmppath) != 0)
+	{
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+						path, tmppath)));
+	}
+
+	/* make sure no partial state is visible after a crash */
+	fsync_fname(tmppath, true);
+	fsync_fname("pg_llog", true);
+
+	if (!rmtree(tmppath, true))
+	{
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not remove directory \"%s\": %m",
+						tmppath)));
+	}
+
+	END_CRIT_SECTION();
+}
+
+/*
+ * Load a single ondisk slot into memory.
+ */
+static void
+RestoreLogicalSlot(const char *name)
+{
+	LogicalDecodingSlotOnDisk cp;
+	int			i;
+	char		path[MAXPGPATH];
+	int			fd;
+	bool		restored = false;
+	int			readBytes;
+
+	START_CRIT_SECTION();
+
+	/* delete temp file if it exists */
+	sprintf(path, "pg_llog/%s/state.tmp", name);
+	if (unlink(path) < 0 && errno != ENOENT)
+		ereport(PANIC, (errcode_for_file_access(), errmsg("could not unlink file \"%s\": %m", path)));
+
+	sprintf(path, "pg_llog/%s/state", name);
+
+	elog(DEBUG1, "restoring logical slot from %s", path);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+	/*
+	 * A missing state file needs no special handling: the slot directory is
+	 * only rename()d into place after the state file has been fsync()ed.
+	 */
+	if (fd < 0)
+		ereport(PANIC, (errcode_for_file_access(), errmsg("could not open state file \"%s\": %m", path)));
+
+	readBytes = read(fd, &cp, sizeof(cp));
+	if (readBytes != sizeof(cp))
+	{
+		int			saved_errno = errno;
+
+		CloseTransientFile(fd);
+		errno = saved_errno;
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not read logical checkpoint file \"%s\": %m, read %d of %zu",
+						path, readBytes, sizeof(cp))));
+	}
+
+	CloseTransientFile(fd);
+
+	if (cp.magic != LOGICAL_MAGIC)
+		ereport(PANIC, (errmsg("Logical checkpoint has wrong magic %u instead of %u",
+							   cp.magic, LOGICAL_MAGIC)));
+
+	/* nothing can be active yet, don't lock anything */
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		LogicalDecodingSlot *slot;
+
+		slot = &LogicalDecodingCtl->logical_slots[i];
+
+		if (slot->in_use)
+			continue;
+
+		slot->xmin = cp.slot.xmin;
+		/* XXX: after a crash, always use xmin, not effective_xmin */
+		slot->effective_xmin = cp.slot.xmin;
+		strcpy(NameStr(slot->name), NameStr(cp.slot.name));
+		strcpy(NameStr(slot->plugin), NameStr(cp.slot.plugin));
+		slot->database = cp.slot.database;
+		slot->restart_decoding = cp.slot.restart_decoding;
+		slot->confirmed_flush = cp.slot.confirmed_flush;
+		slot->candidate_lsn = InvalidXLogRecPtr;
+		slot->candidate_xmin = InvalidTransactionId;
+		slot->candidate_restart_decoding = InvalidXLogRecPtr;
+		slot->in_use = true;
+		slot->active = false;
+		restored = true;
+
+		/*
+		 * FIXME: Do some validation here.
+		 */
+		break;
+	}
+
+	if (!restored)
+		ereport(PANIC,
+				(errmsg("too many logical slots active before shutdown, increase max_logical_slots and try again")));
+
+	END_CRIT_SECTION();
+}
+
+
+static void
+LoadOutputPlugin(OutputPluginCallbacks *callbacks, char *plugin)
+{
+	/* lookup symbols in the shared library */
+
+	/* optional */
+	callbacks->init_cb = (LogicalDecodeInitCB)
+		load_external_function(plugin, "pg_decode_init", false, NULL);
+
+	/* required */
+	callbacks->begin_cb = (LogicalDecodeBeginCB)
+		load_external_function(plugin, "pg_decode_begin_txn", true, NULL);
+
+	/* required */
+	callbacks->change_cb = (LogicalDecodeChangeCB)
+		load_external_function(plugin, "pg_decode_change", true, NULL);
+
+	/* required */
+	callbacks->commit_cb = (LogicalDecodeCommitCB)
+		load_external_function(plugin, "pg_decode_commit_txn", true, NULL);
+
+	/* optional */
+	callbacks->cleanup_cb = (LogicalDecodeCleanupCB)
+		load_external_function(plugin, "pg_decode_clean", false, NULL);
+}
+
+/*
+ * Context management functions that coordinate the different logical
+ * decoding pieces.
+ */
+
+/*
+ * Callbacks for ReorderBuffer which add in some more information and then call
+ * output_plugin.h plugins.
+ */
+static void
+begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
+{
+	LogicalDecodingContext *ctx = cache->private_data;
+
+	ctx->callbacks.begin_cb(ctx, txn);
+}
+
+static void
+commit_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr commit_lsn)
+{
+	LogicalDecodingContext *ctx = cache->private_data;
+
+	ctx->callbacks.commit_cb(ctx, txn, commit_lsn);
+}
+
+static void
+change_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+			   Relation relation, ReorderBufferChange *change)
+{
+	LogicalDecodingContext *ctx = cache->private_data;
+
+	ctx->callbacks.change_cb(ctx, txn, relation, change);
+}
+
+LogicalDecodingContext *
+CreateLogicalDecodingContext(LogicalDecodingSlot *slot,
+							 bool is_init,
+							 XLogRecPtr	start_lsn,
+							 List *output_plugin_options,
+							 XLogPageReadCB read_page,
+						 LogicalOutputPluginWriterPrepareWrite prepare_write,
+							 LogicalOutputPluginWriterWrite do_write)
+{
+	MemoryContext context;
+	MemoryContext old_context;
+	TransactionId xmin_horizon;
+	LogicalDecodingContext *ctx;
+
+	context = AllocSetContextCreate(TopMemoryContext,
+									"ReorderBuffer",
+									ALLOCSET_DEFAULT_MINSIZE,
+									ALLOCSET_DEFAULT_INITSIZE,
+									ALLOCSET_DEFAULT_MAXSIZE);
+	old_context = MemoryContextSwitchTo(context);
+	ctx = palloc0(sizeof(LogicalDecodingContext));
+
+
+	/* load output plugins first, so we detect a wrong output plugin early */
+	LoadOutputPlugin(&ctx->callbacks, NameStr(slot->plugin));
+
+	if (is_init && start_lsn != InvalidXLogRecPtr)
+		elog(ERROR, "cannot initially start at a specified lsn");
+
+	if (is_init)
+		xmin_horizon = slot->xmin;
+	else
+		xmin_horizon = InvalidTransactionId;
+
+	ctx->slot = slot;
+
+	ctx->reader = XLogReaderAllocate(read_page, ctx);
+	ctx->reader->private_data = ctx;
+
+	ctx->reorder = ReorderBufferAllocate();
+	ctx->snapshot_builder =
+		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
+
+	ctx->reorder->private_data = ctx;
+
+	ctx->reorder->begin = begin_txn_wrapper;
+	ctx->reorder->apply_change = change_wrapper;
+	ctx->reorder->commit = commit_txn_wrapper;
+
+	ctx->out = makeStringInfo();
+	ctx->prepare_write = prepare_write;
+	ctx->write = do_write;
+
+	ctx->output_plugin_options = output_plugin_options;
+
+	if (is_init)
+		ctx->stop_after_consistent = true;
+	else
+		ctx->stop_after_consistent = false;
+
+	/* call output plugin initialization callback */
+	if (ctx->callbacks.init_cb != NULL)
+		ctx->callbacks.init_cb(ctx, is_init);
+
+	MemoryContextSwitchTo(old_context);
+
+	return ctx;
+}
+
+void
+FreeLogicalDecodingContext(LogicalDecodingContext *ctx)
+{
+	if (ctx->callbacks.cleanup_cb != NULL)
+		ctx->callbacks.cleanup_cb(ctx);
+}
+
+
+/* has the initial snapshot found a consistent state? */
+bool
+LogicalDecodingContextReady(LogicalDecodingContext *ctx)
+{
+	return SnapBuildCurrentState(ctx->snapshot_builder) == SNAPBUILD_CONSISTENT;
+}
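For readers following the state-file handling in SaveLogicalSlotInternal() above: it uses the usual sequence of writing a temporary file, fsyncing it, renaming it over the final name and then fsyncing the containing directory. What follows is only a minimal standalone sketch of that sequence in plain POSIX C, not code from the patch. The names durable_write() and fsync_dir() are hypothetical, and errors are returned to the caller instead of raising PANIC as the patch does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fsync a directory so a rename inside it is durable. */
static int
fsync_dir(const char *dir)
{
	int			fd = open(dir, O_RDONLY);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Hypothetical helper: write "len" bytes to dir/name so that after a crash
 * either the old or the new contents are visible, never a partial file.
 * Returns 0 on success, -1 on failure.
 */
static int
durable_write(const char *dir, const char *name, const void *data, size_t len)
{
	char		tmppath[1024];
	char		path[1024];
	int			fd;
	int			n;

	assert(dir != NULL && name != NULL && data != NULL);

	n = snprintf(tmppath, sizeof(tmppath), "%s/%s.tmp", dir, name);
	if (n < 0 || (size_t) n >= sizeof(tmppath))
		return -1;
	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t) n >= sizeof(path))
		return -1;

	/* O_EXCL: a leftover temp file means an earlier attempt was interrupted */
	fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1;

	/* the contents must be on disk before the rename makes them visible */
	if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		unlink(tmppath);
		return -1;
	}
	if (close(fd) != 0)
		return -1;

	/* atomically replace the final name, then persist the directory entry */
	if (rename(tmppath, path) != 0)
		return -1;
	return fsync_dir(dir);
}
```

A caller would use it as durable_write("pg_llog/some_slot", "state", &cp, sizeof(cp)), where the directory name is again only an illustration. A leftover "state.tmp" after a crash is simply unlinked at startup, which is what RestoreLogicalSlot() does.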
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
new file mode 100644
index 0000000..9837a95
--- /dev/null
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -0,0 +1,361 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalfuncs.c
+ *
+ *	   Support functions for using xlog decoding
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/logicalfuncs.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "storage/fd.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+
+Datum		init_logical_replication(PG_FUNCTION_ARGS);
+Datum		stop_logical_replication(PG_FUNCTION_ARGS);
+Datum		pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+/* FIXME: duplicate code with pg_xlogdump, similar to walsender.c */
+static void
+XLogRead(char *buf, XLogRecPtr startptr, Size count)
+{
+	char	   *p;
+	XLogRecPtr	recptr;
+	Size		nbytes;
+
+	static int	sendFile = -1;
+	static XLogSegNo sendSegNo = 0;
+	static uint32 sendOff = 0;
+
+	p = buf;
+	recptr = startptr;
+	nbytes = count;
+
+	while (nbytes > 0)
+	{
+		uint32		startoff;
+		int			segbytes;
+		int			readbytes;
+
+		startoff = recptr % XLogSegSize;
+
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		{
+			char		path[MAXPGPATH];
+
+			/* Switch to another logfile segment */
+			if (sendFile >= 0)
+				close(sendFile);
+
+			XLByteToSeg(recptr, sendSegNo);
+
+			XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+			sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
+
+			if (sendFile < 0)
+			{
+				if (errno == ENOENT)
+					ereport(ERROR,
+							(errcode_for_file_access(),
+							 errmsg("requested WAL segment %s has already been removed",
+									path)));
+				else
+					ereport(ERROR,
+							(errcode_for_file_access(),
+							 errmsg("could not open file \"%s\": %m",
+									path)));
+			}
+			sendOff = 0;
+		}
+
+		/* Need to seek in the file? */
+		if (sendOff != startoff)
+		{
+			if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+			{
+				char		path[MAXPGPATH];
+
+				XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+				ereport(ERROR,
+						(errcode_for_file_access(),
+				  errmsg("could not seek in log segment %s to offset %u: %m",
+						 path, startoff)));
+			}
+			sendOff = startoff;
+		}
+
+		/* How many bytes are within this segment? */
+		if (nbytes > (XLogSegSize - startoff))
+			segbytes = XLogSegSize - startoff;
+		else
+			segbytes = nbytes;
+
+		readbytes = read(sendFile, p, segbytes);
+		if (readbytes <= 0)
+		{
+			char		path[MAXPGPATH];
+
+			XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read from log segment %s, offset %u, length %lu: %m",
+							path, sendOff, (unsigned long) segbytes)));
+		}
+
+		/* Update state for read */
+		recptr += readbytes;
+
+		sendOff += readbytes;
+		nbytes -= readbytes;
+		p += readbytes;
+	}
+}
+
+int
+logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
+	int reqLen, XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+	XLogRecPtr	flushptr,
+				loc;
+	int			count;
+
+	loc = targetPagePtr + reqLen;
+	while (1)
+	{
+		flushptr = GetFlushRecPtr();
+		if (loc <= flushptr)
+			break;
+		pg_usleep(1000L);
+	}
+
+	/* a full block is available */
+	if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+		count = XLOG_BLCKSZ;
+	/* not enough data there */
+	else if (targetPagePtr + reqLen > flushptr)
+		return -1;
+	/* part of the page available */
+	else
+		count = flushptr - targetPagePtr;
+
+	/* FIXME: more sensible/efficient implementation */
+	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+	return count;
+}
+
+static void
+DummyWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+	elog(ERROR, "init_logical_replication shouldn't be writing anything");
+}
+
+Datum
+init_logical_replication(PG_FUNCTION_ARGS)
+{
+	Name		name = PG_GETARG_NAME(0);
+	Name		plugin = PG_GETARG_NAME(1);
+
+	char		xpos[MAXFNAMELEN];
+
+	TupleDesc	tupdesc;
+	HeapTuple	tuple;
+	Datum		result;
+	Datum		values[2];
+	bool		nulls[2];
+	LogicalDecodingContext *ctx = NULL;
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	/* Acquire a logical replication slot */
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingAcquireFreeSlot(NameStr(*name), NameStr(*plugin));
+
+	/* make sure we don't end up with an unreleased slot */
+	PG_TRY();
+	{
+		XLogRecPtr	startptr;
+
+		/*
+		 * Use the same initial_snapshot_reader, but with our own read_page
+		 * callback that does not depend on walsender.
+		 */
+		ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true,
+										   InvalidXLogRecPtr, NIL,
+										   logical_read_local_xlog_page,
+										   DummyWrite, DummyWrite);
+
+		/* setup from where to read xlog */
+		startptr = ctx->slot->restart_decoding;
+
+		/* Wait for a consistent starting point */
+		for (;;)
+		{
+			XLogRecord *record;
+			XLogRecordBuffer buf;
+			char	   *err = NULL;
+
+			/* the read_page callback waits for new WAL */
+			record = XLogReadRecord(ctx->reader, startptr, &err);
+			if (err)
+				elog(ERROR, "%s", err);
+
+			Assert(record);
+
+			startptr = InvalidXLogRecPtr;
+
+			buf.origptr = ctx->reader->ReadRecPtr;
+			buf.record = *record;
+			buf.record_data = XLogRecGetData(record);
+			DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+			/* only continue until we have found a consistent spot */
+			if (LogicalDecodingContextReady(ctx))
+				break;
+		}
+
+		/* Extract the values we want */
+		MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+		snprintf(xpos, sizeof(xpos), "%X/%X",
+				 (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+				 (uint32) MyLogicalDecodingSlot->confirmed_flush);
+	}
+	PG_CATCH();
+	{
+		LogicalDecodingReleaseSlot();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	values[0] = CStringGetTextDatum(NameStr(MyLogicalDecodingSlot->name));
+	values[1] = CStringGetTextDatum(xpos);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	LogicalDecodingReleaseSlot();
+
+	PG_RETURN_DATUM(result);
+}
+
+Datum
+stop_logical_replication(PG_FUNCTION_ARGS)
+{
+	Name		name = PG_GETARG_NAME(0);
+
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingFreeSlot(NameStr(*name));
+
+	PG_RETURN_INT32(0);
+}
+
+/*
+ * Return one row for each logical replication slot currently in use.
+ */
+
+Datum
+pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS 6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int			i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_logical_slots; i++)
+	{
+		LogicalDecodingSlot *slot = &LogicalDecodingCtl->logical_slots[i];
+		Datum		values[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+		bool		nulls[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+		char		location[MAXFNAMELEN];
+		const char *slot_name;
+		const char *plugin;
+		TransactionId xmin;
+		XLogRecPtr	last_req;
+		bool		active;
+		Oid			database;
+
+		SpinLockAcquire(&slot->mutex);
+		if (!slot->in_use)
+		{
+			SpinLockRelease(&slot->mutex);
+			continue;
+		}
+		else
+		{
+			xmin = slot->xmin;
+			active = slot->active;
+			database = slot->database;
+			last_req = slot->restart_decoding;
+			slot_name = pstrdup(NameStr(slot->name));
+			plugin = pstrdup(NameStr(slot->plugin));
+		}
+		SpinLockRelease(&slot->mutex);
+
+		memset(nulls, 0, sizeof(nulls));
+
+		snprintf(location, sizeof(location), "%X/%X",
+				 (uint32) (last_req >> 32), (uint32) last_req);
+
+		values[0] = CStringGetTextDatum(slot_name);
+		values[1] = CStringGetTextDatum(plugin);
+		values[2] = ObjectIdGetDatum(database);
+		values[3] = BoolGetDatum(active);
+		values[4] = TransactionIdGetDatum(xmin);
+		values[5] = CStringGetTextDatum(location);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
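The SQL functions above report WAL positions as text in the "%X/%X" form, i.e. the high and low 32 bits of the 64-bit position printed in hexadecimal. A small standalone sketch of that conversion and its inverse, for anyone consuming the output from client code. This is not part of the patch; the Lsn typedef and the names format_lsn() and parse_lsn() are made up for the illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t Lsn;

/* Format an LSN as "hi/lo" in hex, the same layout the functions above use. */
static void
format_lsn(Lsn lsn, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
			 (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Parse "hi/lo" back into a 64-bit LSN; returns 0 on success, -1 otherwise. */
static int
parse_lsn(const char *text, Lsn *out)
{
	uint32_t	hi;
	uint32_t	lo;

	assert(text != NULL && out != NULL);
	if (sscanf(text, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
		return -1;
	*out = ((Lsn) hi << 32) | lo;
	return 0;
}
```

With this, the position 0x17D5908 formats as 0/17D5908, which is the form shown in the init_logical_replication() demo output at the top of this mail.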
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index 0000000..6d2866d
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -0,0 +1,2449 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer.c
+ *
+ * PostgreSQL logical replay "cache" management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/reorderbuffer.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_control.h"
+
+#include "common/relpath.h"
+
+#include "lib/binaryheap.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h" /* just for SnapBuildSnapDecRefcount */
+#include "replication/logical.h"
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/sinval.h"
+
+#include "utils/builtins.h"
+#include "utils/combocid.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tqual.h"
+#include "utils/syscache.h"
+
+/*
+ * For efficiency and simplicity reasons we want to keep Snapshots, CommandIds
+ * and ComboCids in the same list with the user visible INSERT/UPDATE/DELETE
+ * changes. We don't want to leak those internal values to external users
+ * though (they would just use switch()...default:) because that would make it
+ * harder to add new user visible values.
+ *
+ * This needs to be synchronized with ReorderBufferChangeType! Adjust the
+ * StaticAssertExpr's in ReorderBufferAllocate if you add anything!
+ */
+typedef enum
+{
+	REORDER_BUFFER_CHANGE_INTERNAL_INSERT,
+	REORDER_BUFFER_CHANGE_INTERNAL_UPDATE,
+	REORDER_BUFFER_CHANGE_INTERNAL_DELETE,
+	REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT,
+	REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
+	REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID
+} ReorderBufferChangeTypeInternal;
+
+
+/* entry for a hash table we use to map from xid to our transaction state */
+typedef struct ReorderBufferTXNByIdEnt
+{
+	TransactionId xid;
+	ReorderBufferTXN *txn;
+} ReorderBufferTXNByIdEnt;
+
+
+/* data structures for (relfilenode, ctid) => (cmin, cmax) mapping */
+typedef struct ReorderBufferTupleCidKey
+{
+	RelFileNode relnode;
+	ItemPointerData tid;
+} ReorderBufferTupleCidKey;
+
+typedef struct ReorderBufferTupleCidEnt
+{
+	ReorderBufferTupleCidKey key;
+	CommandId	cmin;
+	CommandId	cmax;
+	CommandId	combocid;		/* just for debugging */
+} ReorderBufferTupleCidEnt;
+
+
+/* k-way in-order change iteration support structures */
+typedef struct ReorderBufferIterTXNEntry
+{
+	XLogRecPtr	lsn;
+	ReorderBufferChange *change;
+	ReorderBufferTXN *txn;
+	int			fd;
+	XLogSegNo	segno;
+} ReorderBufferIterTXNEntry;
+
+typedef struct ReorderBufferIterTXNState
+{
+	binaryheap *heap;
+	Size		nr_txns;
+	dlist_head	old_change;
+	ReorderBufferIterTXNEntry entries[FLEXIBLE_ARRAY_MEMBER];
+} ReorderBufferIterTXNState;
+
+
+/* toast data structures */
+typedef struct ReorderBufferToastEnt
+{
+	Oid			chunk_id;		/* toast_table.chunk_id */
+	int32		last_chunk_seq; /* toast_table.chunk_seq of the last chunk we
+								 * have seen */
+	Size		num_chunks;		/* number of chunks we've already seen */
+	Size		size;			/* combined size of chunks seen */
+	dlist_head	chunks;			/* linked list of chunks */
+	struct varlena *reconstructed;		/* reconstructed varlena now pointed
+										 * to in main tup */
+} ReorderBufferToastEnt;
+
+
+/* number of changes kept in memory, per transaction */
+const Size	max_memtries = 4096;
+
+/* Size of the slab caches used for frequently allocated objects */
+const Size	max_cached_changes = 4096 * 2;
+const Size	max_cached_tuplebufs = 1024;		/* ~8MB */
+const Size	max_cached_transactions = 512;
+
+
+/* ---------------------------------------
+ * primary reorderbuffer support routines
+ * ---------------------------------------
+ */
+static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *buffer);
+static void ReorderBufferReturnTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *buffer,
+					  TransactionId xid, bool create, bool *is_new,
+					  XLogRecPtr lsn, bool create_as_top);
+
+static void AssertTXNLsnOrder(ReorderBuffer *buffer);
+
+/* ---------------------------------------
+ * support functions for lsn-order iterating over the ->changes of a
+ * transaction and its subtransactions
+ *
+ * used for iteration over the k-way heap merge of a transaction and its
+ * subtransactions
+ * ---------------------------------------
+ */
+static ReorderBufferIterTXNState *ReorderBufferIterTXNInit(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static ReorderBufferChange *
+			ReorderBufferIterTXNNext(ReorderBuffer *buffer, ReorderBufferIterTXNState *state);
+static void ReorderBufferIterTXNFinish(ReorderBuffer *buffer,
+						   ReorderBufferIterTXNState *state);
+static void ReorderBufferExecuteInvalidations(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+
+/*
+ * ---------------------------------------
+ * Disk serialization support functions
+ * ---------------------------------------
+ */
+static void ReorderBufferCheckSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							 int fd, ReorderBufferChange *change);
+static Size ReorderBufferRestoreChanges(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							int *fd, XLogSegNo *segno);
+static void ReorderBufferRestoreChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+						   char *change);
+static void ReorderBufferRestoreCleanup(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+
+static void ReorderBufferFreeSnap(ReorderBuffer *buffer, Snapshot snap);
+static Snapshot ReorderBufferCopySnap(ReorderBuffer *buffer, Snapshot orig_snap,
+					  ReorderBufferTXN *txn, CommandId cid);
+
+/* ---------------------------------------
+ * toast reassembly support
+ * ---------------------------------------
+ */
+/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
+#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
+
+/* Size of an indirect datum that contains a standard TOAST pointer */
+#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect))
+
+static void ReorderBufferToastInitHash(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferToastReset(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferToastReplace(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+						  Relation relation, ReorderBufferChange *change);
+static void ReorderBufferToastAppendChunk(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							  Relation relation, ReorderBufferChange *change);
+
+
+/*
+ * Allocate a new ReorderBuffer
+ */
+ReorderBuffer *
+ReorderBufferAllocate(void)
+{
+	ReorderBuffer *buffer;
+	HASHCTL		hash_ctl;
+	MemoryContext new_ctx;
+
+	StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_INSERT == (int) REORDER_BUFFER_CHANGE_INSERT, "out of sync enums");
+	StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_UPDATE == (int) REORDER_BUFFER_CHANGE_UPDATE, "out of sync enums");
+	StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_DELETE == (int) REORDER_BUFFER_CHANGE_DELETE, "out of sync enums");
+
+	new_ctx = AllocSetContextCreate(TopMemoryContext,
+									"ReorderBuffer",
+									ALLOCSET_DEFAULT_MINSIZE,
+									ALLOCSET_DEFAULT_INITSIZE,
+									ALLOCSET_DEFAULT_MAXSIZE);
+
+	buffer = (ReorderBuffer *) MemoryContextAlloc(new_ctx, sizeof(ReorderBuffer));
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+	buffer->context = new_ctx;
+
+	hash_ctl.keysize = sizeof(TransactionId);
+	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
+	hash_ctl.hash = tag_hash;
+	hash_ctl.hcxt = buffer->context;
+
+	buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
+								 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+	buffer->by_txn_last_xid = InvalidTransactionId;
+	buffer->by_txn_last_txn = NULL;
+
+	buffer->nr_cached_transactions = 0;
+	buffer->nr_cached_changes = 0;
+	buffer->nr_cached_tuplebufs = 0;
+
+	buffer->outbuf = NULL;
+	buffer->outbufsize = 0;
+
+	dlist_init(&buffer->toplevel_by_lsn);
+	dlist_init(&buffer->cached_transactions);
+	dlist_init(&buffer->cached_changes);
+	slist_init(&buffer->cached_tuplebufs);
+
+	return buffer;
+}
+
+/*
+ * Free a ReorderBuffer
+ */
+void
+ReorderBufferFree(ReorderBuffer *buffer)
+{
+	/* FIXME: check for in-progress transactions */
+	/* FIXME: clean up cached transaction */
+	/* FIXME: clean up cached changes */
+	/* FIXME: clean up cached tuplebufs */
+	if (buffer->outbufsize > 0)
+		pfree(buffer->outbuf);
+
+	hash_destroy(buffer->by_txn);
+	pfree(buffer);
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTXN.
+ */
+static ReorderBufferTXN *
+ReorderBufferGetTXN(ReorderBuffer *buffer)
+{
+	ReorderBufferTXN *txn;
+
+	if (buffer->nr_cached_transactions > 0)
+	{
+		buffer->nr_cached_transactions--;
+		txn = (ReorderBufferTXN *)
+			dlist_container(ReorderBufferTXN, node,
+						  dlist_pop_head_node(&buffer->cached_transactions));
+	}
+	else
+	{
+		txn = (ReorderBufferTXN *)
+			MemoryContextAlloc(buffer->context, sizeof(ReorderBufferTXN));
+	}
+
+	memset(txn, 0, sizeof(ReorderBufferTXN));
+
+	dlist_init(&txn->changes);
+	dlist_init(&txn->tuplecids);
+	dlist_init(&txn->subtxns);
+
+	return txn;
+}
+
+/*
+ * Free a ReorderBufferTXN. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+static void
+ReorderBufferReturnTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	/* clean the lookup cache if we were cached (quite likely) */
+	if (buffer->by_txn_last_xid == txn->xid)
+	{
+		buffer->by_txn_last_xid = InvalidTransactionId;
+		buffer->by_txn_last_txn = NULL;
+	}
+
+	if (txn->tuplecid_hash != NULL)
+	{
+		hash_destroy(txn->tuplecid_hash);
+		txn->tuplecid_hash = NULL;
+	}
+
+	if (txn->invalidations)
+	{
+		pfree(txn->invalidations);
+		txn->invalidations = NULL;
+	}
+
+	if (buffer->nr_cached_transactions < max_cached_transactions)
+	{
+		buffer->nr_cached_transactions++;
+		dlist_push_head(&buffer->cached_transactions, &txn->node);
+	}
+	else
+	{
+		pfree(txn);
+	}
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferChange.
+ */
+ReorderBufferChange *
+ReorderBufferGetChange(ReorderBuffer *buffer)
+{
+	ReorderBufferChange *change;
+
+	if (buffer->nr_cached_changes)
+	{
+		buffer->nr_cached_changes--;
+		change = (ReorderBufferChange *)
+			dlist_container(ReorderBufferChange, node,
+							dlist_pop_head_node(&buffer->cached_changes));
+	}
+	else
+	{
+		change = (ReorderBufferChange *)
+			MemoryContextAlloc(buffer->context, sizeof(ReorderBufferChange));
+	}
+
+	memset(change, 0, sizeof(ReorderBufferChange));
+	return change;
+}
+
+/*
+ * Free a ReorderBufferChange. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnChange(ReorderBuffer *buffer, ReorderBufferChange *change)
+{
+	switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+	{
+		case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+		case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+		case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+			if (change->newtuple)
+			{
+				ReorderBufferReturnTupleBuf(buffer, change->newtuple);
+				change->newtuple = NULL;
+			}
+
+			if (change->oldtuple)
+			{
+				ReorderBufferReturnTupleBuf(buffer, change->oldtuple);
+				change->oldtuple = NULL;
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+			if (change->snapshot)
+			{
+				ReorderBufferFreeSnap(buffer, change->snapshot);
+				change->snapshot = NULL;
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+			break;
+		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+			break;
+	}
+
+	if (buffer->nr_cached_changes < max_cached_changes)
+	{
+		buffer->nr_cached_changes++;
+		dlist_push_head(&buffer->cached_changes, &change->node);
+	}
+	else
+	{
+		pfree(change);
+	}
+}
+
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTupleBuf
+ */
+ReorderBufferTupleBuf *
+ReorderBufferGetTupleBuf(ReorderBuffer *buffer)
+{
+	ReorderBufferTupleBuf *tuple;
+
+	if (buffer->nr_cached_tuplebufs)
+	{
+		buffer->nr_cached_tuplebufs--;
+		tuple = slist_container(ReorderBufferTupleBuf, node,
+							 slist_pop_head_node(&buffer->cached_tuplebufs));
+#ifdef USE_ASSERT_CHECKING
+		memset(tuple, 0xef, sizeof(ReorderBufferTupleBuf));
+#endif
+	}
+	else
+	{
+		tuple = (ReorderBufferTupleBuf *)
+			MemoryContextAlloc(buffer->context, sizeof(ReorderBufferTupleBuf));
+	}
+
+	return tuple;
+}
+
+/*
+ * Free a ReorderBufferTupleBuf. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnTupleBuf(ReorderBuffer *buffer, ReorderBufferTupleBuf *tuple)
+{
+	if (buffer->nr_cached_tuplebufs < max_cached_tuplebufs)
+	{
+		buffer->nr_cached_tuplebufs++;
+		slist_push_head(&buffer->cached_tuplebufs, &tuple->node);
+	}
+	else
+	{
+		pfree(tuple);
+	}
+}
+
+/*
+ * Return the ReorderBufferTXN from the given buffer, specified by Xid.
+ * If create is true, and a transaction doesn't already exist, create it
+ * (with the given LSN, and as top transaction if that's specified);
+ * when this happens, is_new is set to true.
+ */
+static ReorderBufferTXN *
+ReorderBufferTXNByXid(ReorderBuffer *buffer, TransactionId xid, bool create,
+					  bool *is_new, XLogRecPtr lsn, bool create_as_top)
+{
+	ReorderBufferTXN *txn;
+	ReorderBufferTXNByIdEnt *ent;
+	bool		found;
+
+	Assert(!create || lsn != InvalidXLogRecPtr);
+
+	/*
+	 * Check the one-entry lookup cache first
+	 */
+	if (TransactionIdIsValid(buffer->by_txn_last_xid) &&
+		buffer->by_txn_last_xid == xid)
+	{
+		txn = buffer->by_txn_last_txn;
+
+		if (txn != NULL)
+		{
+			/* found it, and it's valid */
+			if (is_new)
+				*is_new = false;
+			return txn;
+		}
+
+		/*
+		 * cached as nonexistent, and asked not to create? Then nothing else
+		 * to do.
+		 */
+		if (!create)
+			return NULL;
+		/* otherwise fall through to create it */
+	}
+
+	/*
+	 * Either the cache wasn't hit, or it yielded a "does-not-exist" and we
+	 * want to create an entry.
+	 */
+
+	/* search the lookup table */
+	ent = (ReorderBufferTXNByIdEnt *)
+		hash_search(buffer->by_txn,
+					(void *) &xid,
+					create ? HASH_ENTER : HASH_FIND,
+					&found);
+	if (found)
+		txn = ent->txn;
+	else if (create)
+	{
+		/* initialize the new entry, if creation was requested */
+		Assert(ent != NULL);
+
+		ent->txn = ReorderBufferGetTXN(buffer);
+		ent->txn->xid = xid;
+		txn = ent->txn;
+		txn->lsn = lsn;
+		txn->restart_decoding_lsn = buffer->current_restart_decoding_lsn;
+
+		if (create_as_top)
+		{
+			dlist_push_tail(&buffer->toplevel_by_lsn, &txn->node);
+			AssertTXNLsnOrder(buffer);
+		}
+	}
+	else
+		txn = NULL;				/* not found and not asked to create */
+
+	/* update cache */
+	buffer->by_txn_last_xid = xid;
+	buffer->by_txn_last_txn = txn;
+
+	if (is_new)
+		*is_new = !found;
+
+	Assert(!create || !!txn);
+	return txn;
+}
+
+/*
+ * Queue a change into a transaction so it can be replayed upon commit.
+ */
+void
+ReorderBufferAddChange(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn,
+					   ReorderBufferChange *change)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+	change->lsn = lsn;
+	Assert(InvalidXLogRecPtr != lsn);
+	dlist_push_tail(&txn->changes, &change->node);
+	txn->nentries++;
+	txn->nentries_mem++;
+
+	ReorderBufferCheckSerializeTXN(buffer, txn);
+}
+
+static void
+AssertTXNLsnOrder(ReorderBuffer *buffer)
+{
+#ifdef USE_ASSERT_CHECKING
+	dlist_iter	iter;
+	XLogRecPtr	last_lsn = InvalidXLogRecPtr;
+
+	dlist_foreach(iter, &buffer->toplevel_by_lsn)
+	{
+		ReorderBufferTXN *cur_txn;
+
+		cur_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+		Assert(cur_txn->lsn != InvalidXLogRecPtr);
+
+		if (cur_txn->last_lsn != InvalidXLogRecPtr)
+			Assert(cur_txn->lsn <= cur_txn->last_lsn);
+
+		if (last_lsn != InvalidXLogRecPtr)
+			Assert(last_lsn < cur_txn->lsn);
+
+		Assert(!cur_txn->is_known_as_subxact);
+		last_lsn = cur_txn->lsn;
+	}
+#endif
+}
+
+ReorderBufferTXN *
+ReorderBufferGetOldestTXN(ReorderBuffer *buffer)
+{
+	ReorderBufferTXN *txn;
+
+	if (dlist_is_empty(&buffer->toplevel_by_lsn))
+		return NULL;
+
+	AssertTXNLsnOrder(buffer);
+
+	txn = dlist_head_element(ReorderBufferTXN, node, &buffer->toplevel_by_lsn);
+
+	Assert(!txn->is_known_as_subxact);
+	Assert(txn->lsn != InvalidXLogRecPtr);
+	return txn;
+}
+
+void
+ReorderBufferSetRestartPoint(ReorderBuffer *buffer, XLogRecPtr ptr)
+{
+	buffer->current_restart_decoding_lsn = ptr;
+}
+
+void
+ReorderBufferAssignChild(ReorderBuffer *buffer, TransactionId xid,
+						 TransactionId subxid, XLogRecPtr lsn)
+{
+	ReorderBufferTXN *txn;
+	ReorderBufferTXN *subtxn;
+	bool		new_sub;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+	subtxn = ReorderBufferTXNByXid(buffer, subxid, true, &new_sub, lsn, false);
+
+	if (new_sub)
+	{
+		/*
+		 * We assign subtransactions to their toplevel transaction even if we
+		 * don't have data for them yet, because assignment records frequently
+		 * reference xids that have not yet produced any records. Knowing those
+		 * aren't toplevel xids allows us to make processing cheaper in places.
+		 */
+		dlist_push_tail(&txn->subtxns, &subtxn->node);
+		txn->nsubtxns++;
+	}
+	else if (!subtxn->is_known_as_subxact)
+	{
+		subtxn->is_known_as_subxact = true;
+
+		/* remove from lsn order list of top-level transactions */
+		dlist_delete(&subtxn->node);
+
+		/* add to toplevel transaction */
+		dlist_push_tail(&txn->subtxns, &subtxn->node);
+		txn->nsubtxns++;
+	}
+}
+
+/*
+ * Associate a subtransaction with its toplevel transaction at commit
+ * time. There may be no further changes added after this.
+ */
+void
+ReorderBufferCommitChild(ReorderBuffer *buffer, TransactionId xid,
+						 TransactionId subxid, XLogRecPtr lsn)
+{
+	ReorderBufferTXN *txn;
+	ReorderBufferTXN *subtxn;
+	bool		top_is_new;
+
+	subtxn = ReorderBufferTXNByXid(buffer, subxid, false, NULL,
+								   InvalidXLogRecPtr, false);
+
+	/*
+	 * No need to do anything if that subtxn didn't contain any changes
+	 */
+	if (!subtxn)
+		return;
+
+	/*
+	 * FIXME: Using the subtxn lsn as top lsn isn't great (if we're creating)!
+	 */
+	txn = ReorderBufferTXNByXid(buffer, xid, true, &top_is_new, lsn, true);
+
+	subtxn->last_lsn = lsn;
+
+	Assert(!top_is_new || !subtxn->is_known_as_subxact);
+
+	if (!subtxn->is_known_as_subxact)
+	{
+		subtxn->is_known_as_subxact = true;
+
+		/* remove from lsn order list of top-level transactions */
+		dlist_delete(&subtxn->node);
+
+		/* add to subtransaction list */
+		dlist_push_tail(&txn->subtxns, &subtxn->node);
+		txn->nsubtxns++;
+	}
+}
+
+
+/*
+ * Support for efficiently iterating over a transaction's and its
+ * subtransactions' changes.
+ *
+ * We do this with a k-way merge between transactions/subtransactions. For that
+ * we model the current heads of the different transactions as a binary heap so
+ * we easily know which (sub-)transaction has the change with the smallest lsn
+ * next.
+ *
+ * We assume the changes in individual transactions are already sorted by LSN.
+ */
+
+/*
+ * Binary heap comparison function, inverted so the smallest lsn sorts first.
+ */
+static int
+ReorderBufferIterCompare(Datum a, Datum b, void *arg)
+{
+	ReorderBufferIterTXNState *state = (ReorderBufferIterTXNState *) arg;
+	XLogRecPtr	pos_a = state->entries[DatumGetInt32(a)].lsn;
+	XLogRecPtr	pos_b = state->entries[DatumGetInt32(b)].lsn;
+
+	if (pos_a < pos_b)
+		return 1;
+	else if (pos_a == pos_b)
+		return 0;
+	return -1;
+}
+
+/*
+ * Allocate & initialize an iterator which iterates in lsn order over a
+ * transaction and all its subtransactions.
+ */
+static ReorderBufferIterTXNState *
+ReorderBufferIterTXNInit(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	Size		nr_txns = 0;
+	ReorderBufferIterTXNState *state;
+	dlist_iter	cur_txn_i;
+	int32		off;
+
+	/*
+	 * Calculate the size of our heap: one element for every transaction that
+	 * contains changes.  (Besides the subtransactions, we count the
+	 * transaction we were directly passed.)
+	 */
+	if (txn->nentries > 0)
+		nr_txns++;
+
+	dlist_foreach(cur_txn_i, &txn->subtxns)
+	{
+		ReorderBufferTXN *cur_txn;
+
+		cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+		if (cur_txn->nentries > 0)
+			nr_txns++;
+	}
+
+	/*
+	 * XXX: Add fastpath for the rather common nr_txns=1 case, no need to
+	 * allocate/build a heap in that case.
+	 */
+
+	/* allocate iteration state */
+	state = (ReorderBufferIterTXNState *)
+		MemoryContextAllocZero(buffer->context,
+							   sizeof(ReorderBufferIterTXNState) +
+							   sizeof(ReorderBufferIterTXNEntry) * nr_txns);
+
+	state->nr_txns = nr_txns;
+	dlist_init(&state->old_change);
+
+	for (off = 0; off < state->nr_txns; off++)
+	{
+		state->entries[off].fd = -1;
+		state->entries[off].segno = 0;
+	}
+
+	/* allocate heap */
+	state->heap = binaryheap_allocate(state->nr_txns, ReorderBufferIterCompare,
+									  state);
+
+	/*
+	 * Now insert items into the binary heap, unordered.  (We will run a heap
+	 * assembly step at the end; this is more efficient.)
+	 */
+
+	off = 0;
+
+	/* add toplevel transaction if it contains changes */
+	if (txn->nentries > 0)
+	{
+		ReorderBufferChange *cur_change;
+
+		if (txn->nentries != txn->nentries_mem)
+			ReorderBufferRestoreChanges(buffer, txn, &state->entries[off].fd,
+										&state->entries[off].segno);
+
+		cur_change = dlist_head_element(ReorderBufferChange, node,
+										&txn->changes);
+
+		state->entries[off].lsn = cur_change->lsn;
+		state->entries[off].change = cur_change;
+		state->entries[off].txn = txn;
+
+		binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+	}
+
+	/* add subtransactions if they contain changes */
+	dlist_foreach(cur_txn_i, &txn->subtxns)
+	{
+		ReorderBufferTXN *cur_txn;
+
+		cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+		if (cur_txn->nentries > 0)
+		{
+			ReorderBufferChange *cur_change;
+
+			if (cur_txn->nentries != cur_txn->nentries_mem)
+				ReorderBufferRestoreChanges(buffer, cur_txn,
+											&state->entries[off].fd,
+											&state->entries[off].segno);
+
+			cur_change = dlist_head_element(ReorderBufferChange, node,
+											&cur_txn->changes);
+
+			state->entries[off].lsn = cur_change->lsn;
+			state->entries[off].change = cur_change;
+			state->entries[off].txn = cur_txn;
+
+			binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+		}
+	}
+
+	/* assemble a valid binary heap */
+	binaryheap_build(state->heap);
+
+	return state;
+}
+
+static void
+ReorderBufferRestoreCleanup(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	XLogSegNo	first;
+	XLogSegNo	cur;
+	XLogSegNo	last;
+
+	XLByteToSeg(txn->lsn, first);
+	XLByteToSeg(txn->last_lsn, last);
+
+	for (cur = first; cur <= last; cur++)
+	{
+		char		path[MAXPGPATH];
+		XLogRecPtr	recptr;
+
+		XLogSegNoOffsetToRecPtr(cur, 0, recptr);
+
+		sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+				NameStr(MyLogicalDecodingSlot->name), txn->xid,
+				(uint32) (recptr >> 32), (uint32) recptr);
+		if (unlink(path) != 0 && errno != ENOENT)
+			elog(FATAL, "could not unlink file \"%s\": %m", path);
+	}
+}
+
+/*
+ * Return the next change when iterating over a transaction and its
+ * subtransactions.
+ *
+ * Returns NULL when no further changes exist.
+ */
+static ReorderBufferChange *
+ReorderBufferIterTXNNext(ReorderBuffer *buffer, ReorderBufferIterTXNState *state)
+{
+	ReorderBufferChange *change;
+	ReorderBufferIterTXNEntry *entry;
+	int32		off;
+
+	/* nothing there anymore */
+	if (state->heap->bh_size == 0)
+		return NULL;
+
+	off = DatumGetInt32(binaryheap_first(state->heap));
+	entry = &state->entries[off];
+
+	if (!dlist_is_empty(&entry->txn->subtxns))
+		elog(LOG, "tx with subtxn %u", entry->txn->xid);
+
+	/* free memory we might have "leaked" in the previous *Next call */
+	if (!dlist_is_empty(&state->old_change))
+	{
+		change = dlist_container(ReorderBufferChange, node,
+								 dlist_pop_head_node(&state->old_change));
+		ReorderBufferReturnChange(buffer, change);
+		Assert(dlist_is_empty(&state->old_change));
+	}
+
+	change = entry->change;
+
+	/*
+	 * update heap with information about which transaction has the next
+	 * relevant change in LSN order
+	 */
+
+	/* there are in-memory changes */
+	if (dlist_has_next(&entry->txn->changes, &entry->change->node))
+	{
+		dlist_node *next = dlist_next_node(&entry->txn->changes, &change->node);
+		ReorderBufferChange *next_change =
+		dlist_container(ReorderBufferChange, node, next);
+
+		/* txn stays the same */
+		state->entries[off].lsn = next_change->lsn;
+		state->entries[off].change = next_change;
+
+		binaryheap_replace_first(state->heap, Int32GetDatum(off));
+		return change;
+	}
+
+	/* try to load changes from disk */
+	if (entry->txn->nentries != entry->txn->nentries_mem)
+	{
+		/*
+		 * Ugly: restoring changes will reuse *Change records, thus delete the
+		 * current one from the per-tx list and only free in the next call.
+		 */
+		dlist_delete(&change->node);
+		dlist_push_tail(&state->old_change, &change->node);
+
+		if (ReorderBufferRestoreChanges(buffer, entry->txn, &entry->fd,
+										&state->entries[off].segno))
+		{
+			/* successfully restored changes from disk */
+			ReorderBufferChange *next_change =
+			dlist_head_element(ReorderBufferChange, node,
+							   &entry->txn->changes);
+
+			elog(DEBUG2, "restored %zu/%zu changes from disk",
+				 entry->txn->nentries_mem, entry->txn->nentries);
+			Assert(entry->txn->nentries_mem);
+			/* txn stays the same */
+			state->entries[off].lsn = next_change->lsn;
+			state->entries[off].change = next_change;
+			binaryheap_replace_first(state->heap, Int32GetDatum(off));
+
+			return change;
+		}
+	}
+
+	/* ok, no changes there anymore, remove */
+	binaryheap_remove_first(state->heap);
+
+	return change;
+}
+
+/*
+ * Deallocate the iterator
+ */
+static void
+ReorderBufferIterTXNFinish(ReorderBuffer *buffer,
+						   ReorderBufferIterTXNState *state)
+{
+	int32		off;
+
+	for (off = 0; off < state->nr_txns; off++)
+	{
+		if (state->entries[off].fd != -1)
+			CloseTransientFile(state->entries[off].fd);
+	}
+
+	/* free memory we might have "leaked" in the last *Next call */
+	if (!dlist_is_empty(&state->old_change))
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node,
+								 dlist_pop_head_node(&state->old_change));
+		ReorderBufferReturnChange(buffer, change);
+		Assert(dlist_is_empty(&state->old_change));
+	}
+
+	binaryheap_free(state->heap);
+	pfree(state);
+}
+
+/*
+ * Clean up the contents of a transaction, usually after the transaction
+ * committed or aborted.
+ */
+static void
+ReorderBufferCleanupTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	bool		found;
+	dlist_mutable_iter iter;
+
+	/* cleanup subtransactions & their changes */
+	dlist_foreach_modify(iter, &txn->subtxns)
+	{
+		ReorderBufferTXN *subtxn;
+
+		subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+		Assert(subtxn->is_known_as_subxact);
+
+		/*
+		 * subtransactions are always associated to the toplevel TXN, even if
+		 * they originally were happening inside another subtxn, so we won't
+		 * ever recurse more than one level here.
+		 */
+		ReorderBufferCleanupTXN(buffer, subtxn);
+	}
+
+	/* cleanup changes in the toplevel txn */
+	dlist_foreach_modify(iter, &txn->changes)
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+		ReorderBufferReturnChange(buffer, change);
+	}
+
+	/*
+	 * clean up the tuplecids we stored for timetravel access. They are always
+	 * stored in the toplevel transaction.
+	 */
+	dlist_foreach_modify(iter, &txn->tuplecids)
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, iter.cur);
+		Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+		ReorderBufferReturnChange(buffer, change);
+	}
+
+	if (txn->base_snapshot != NULL)
+	{
+		SnapBuildSnapDecRefcount(txn->base_snapshot);
+		txn->base_snapshot = NULL;
+	}
+
+	/* delete from LSN ordered list of toplevel TXNs */
+	if (!txn->is_known_as_subxact)
+		dlist_delete(&txn->node);
+
+	/* now remove reference from buffer */
+	hash_search(buffer->by_txn,
+				(void *) &txn->xid,
+				HASH_REMOVE,
+				&found);
+	Assert(found);
+
+	/* remove entries spilled to disk */
+	if (txn->nentries != txn->nentries_mem)
+		ReorderBufferRestoreCleanup(buffer, txn);
+
+	/* deallocate */
+	ReorderBufferReturnTXN(buffer, txn);
+}
+
+/*
+ * Build a hash with a (relfilenode, ctid) -> (cmin, cmax) mapping for use by
+ * tqual.c's HeapTupleSatisfiesMVCCDuringDecoding.
+ */
+static void
+ReorderBufferBuildTupleCidHash(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	dlist_iter	iter;
+	HASHCTL		hash_ctl;
+
+	if (!txn->does_timetravel || dlist_is_empty(&txn->tuplecids))
+		return;
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+	hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
+	hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
+	hash_ctl.hash = tag_hash;
+	hash_ctl.hcxt = buffer->context;
+
+	/*
+	 * create the hash with the exact number of to-be-stored tuplecids from
+	 * the start
+	 */
+	txn->tuplecid_hash =
+		hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
+					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+	dlist_foreach(iter, &txn->tuplecids)
+	{
+		ReorderBufferTupleCidKey key;
+		ReorderBufferTupleCidEnt *ent;
+		bool		found;
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+		Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+		/* be careful about padding */
+		memset(&key, 0, sizeof(ReorderBufferTupleCidKey));
+
+		key.relnode = change->tuplecid.node;
+
+		ItemPointerCopy(&change->tuplecid.tid,
+						&key.tid);
+
+		ent = (ReorderBufferTupleCidEnt *)
+			hash_search(txn->tuplecid_hash,
+						(void *) &key,
+						HASH_ENTER,
+						&found);
+		if (!found)
+		{
+			ent->cmin = change->tuplecid.cmin;
+			ent->cmax = change->tuplecid.cmax;
+			ent->combocid = change->tuplecid.combocid;
+		}
+		else
+		{
+			Assert(ent->cmin == change->tuplecid.cmin);
+			Assert(ent->cmax == InvalidCommandId ||
+				   ent->cmax == change->tuplecid.cmax);
+
+			/*
+			 * if the tuple became valid in this transaction and has now been
+			 * deleted, we already have a valid cmin stored. The stored cmax
+			 * will be InvalidCommandId though.
+			 */
+			ent->cmax = change->tuplecid.cmax;
+		}
+	}
+}
+
+/*
+ * Copy a provided snapshot so we can modify it privately. This is needed so
+ * that catalog modifying transactions can look into intermediate catalog
+ * states.
+ */
+static Snapshot
+ReorderBufferCopySnap(ReorderBuffer *buffer, Snapshot orig_snap,
+					  ReorderBufferTXN *txn, CommandId cid)
+{
+	Snapshot	snap;
+	dlist_iter	iter;
+	int			i = 0;
+	Size		size;
+
+	size = sizeof(SnapshotData) +
+		sizeof(TransactionId) * orig_snap->xcnt +
+		sizeof(TransactionId) * (txn->nsubtxns + 1);
+
+	elog(DEBUG1, "copying a non-transaction-specific snapshot into timetravel tx %u", txn->xid);
+
+	snap = MemoryContextAllocZero(buffer->context, size);
+	memcpy(snap, orig_snap, sizeof(SnapshotData));
+
+	snap->copied = true;
+	snap->active_count = 0;
+	snap->regd_count = 0;
+	snap->xip = (TransactionId *) (snap + 1);
+
+	memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+
+	/*
+	 * ->subxip contains all txids that belong to our transaction which we
+	 * need to check via cmin/cmax. That's why we store the toplevel
+	 * transaction in there as well.
+	 */
+	snap->subxip = snap->xip + snap->xcnt;
+	snap->subxip[i++] = txn->xid;
+	snap->subxcnt = txn->nsubtxns + 1;
+
+	dlist_foreach(iter, &txn->subtxns)
+	{
+		ReorderBufferTXN *sub_txn;
+
+		sub_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+		snap->subxip[i++] = sub_txn->xid;
+	}
+
+	/* sort so we can bsearch() later */
+	qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);
+
+	/* store the specified current CommandId */
+	snap->curcid = cid;
+
+	return snap;
+}
+
+/*
+ * Free a previously ReorderBufferCopySnap'ed snapshot
+ */
+static void
+ReorderBufferFreeSnap(ReorderBuffer *buffer, Snapshot snap)
+{
+	if (snap->copied)
+		pfree(snap);
+	else
+		SnapBuildSnapDecRefcount(snap);
+}
+
+/*
+ * Commit a transaction and replay all actions that previously have been
+ * ReorderBufferAddChange'd in the toplevel TX or any of the subtransactions
+ * assigned via ReorderBufferCommitChild.
+ */
+void
+ReorderBufferCommit(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn)
+{
+	ReorderBufferTXN *txn;
+	ReorderBufferIterTXNState *iterstate = NULL;
+	ReorderBufferChange *change;
+	CommandId	command_id = FirstCommandId;
+	Snapshot	snapshot_now;
+	Relation	relation = NULL;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+
+	/* empty transaction */
+	if (!txn)
+		return;
+
+	txn->last_lsn = lsn;
+
+	/* serialize the last bunch of changes if we need to start earlier anyway */
+	if (txn->nentries_mem != txn->nentries)
+		ReorderBufferSerializeTXN(buffer, txn);
+
+	/*
+	 * If this transaction didn't have any real changes in our database, it's
+	 * OK not to have a snapshot.
+	 */
+	if (txn->base_snapshot == NULL)
+		return;
+
+	snapshot_now = txn->base_snapshot;
+
+	ReorderBufferBuildTupleCidHash(buffer, txn);
+
+	/* setup initial snapshot */
+	SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+
+	PG_TRY();
+	{
+		buffer->begin(buffer, txn);
+
+		iterstate = ReorderBufferIterTXNInit(buffer, txn);
+		while ((change = ReorderBufferIterTXNNext(buffer, iterstate)))
+		{
+			switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+			{
+				case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+				case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+				case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+					Assert(snapshot_now);
+
+					relation = LookupRelationByRelFileNode(&change->relnode);
+
+					/*
+					 * Catalog tuple without data, emitted while the
+					 * catalog was being rewritten.
+					 */
+					if (relation == NULL &&
+						change->newtuple == NULL && change->oldtuple == NULL)
+					{
+						continue;
+					}
+					else if (relation == NULL)
+					{
+						elog(ERROR, "could not look up relation %s",
+							 relpathperm(change->relnode, MAIN_FORKNUM));
+					}
+
+					if (RelationIsLogicallyLogged(relation))
+					{
+						/* user-triggered change */
+						if (relation->rd_rel->relkind == RELKIND_SEQUENCE)
+						{
+						}
+						else if (!IsToastRelation(relation))
+						{
+							ReorderBufferToastReplace(buffer, txn, relation, change);
+							buffer->apply_change(buffer, txn, relation, change);
+							ReorderBufferToastReset(buffer, txn);
+						}
+						/* we're not interested in toast deletions */
+						else if (change->action == REORDER_BUFFER_CHANGE_INSERT)
+						{
+							/*
+							 * need to reassemble change in memory, ensure it
+							 * doesn't get reused till we're done.
+							 */
+							dlist_delete(&change->node);
+							ReorderBufferToastAppendChunk(buffer, txn, relation,
+														  change);
+						}
+
+					}
+					RelationClose(relation);
+					break;
+				case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+					/* XXX: we could skip snapshots in non toplevel txns */
+
+					/* get rid of the old */
+					RevertFromDecodingSnapshots();
+
+					if (snapshot_now->copied)
+					{
+						ReorderBufferFreeSnap(buffer, snapshot_now);
+						snapshot_now =
+							ReorderBufferCopySnap(buffer, change->snapshot,
+												  txn, command_id);
+					}
+
+					/*
+					 * restored from disk, we need to be careful not to double
+					 * free. We could introduce refcounting for that, but for
+					 * now this seems infrequent enough not to care.
+					 */
+					else if (change->snapshot->copied)
+					{
+						snapshot_now =
+							ReorderBufferCopySnap(buffer, change->snapshot,
+												  txn, command_id);
+					}
+					else
+					{
+						snapshot_now = change->snapshot;
+					}
+
+
+					/* and start with the new one */
+					SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+					break;
+
+				case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+					if (!snapshot_now->copied)
+					{
+						/* we don't use the global one anymore */
+						snapshot_now = ReorderBufferCopySnap(buffer, snapshot_now,
+															 txn, command_id);
+					}
+
+					command_id = Max(command_id, change->command_id);
+
+					if (command_id != InvalidCommandId)
+					{
+						snapshot_now->curcid = command_id;
+
+						RevertFromDecodingSnapshots();
+						SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+					}
+
+					/*
+					 * every time the CommandId is incremented, we could see
+					 * new catalog contents
+					 */
+					ReorderBufferExecuteInvalidations(buffer, txn);
+
+					break;
+
+				case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+					elog(ERROR, "tuplecid value in normal queue");
+					break;
+			}
+		}
+
+		ReorderBufferIterTXNFinish(buffer, iterstate);
+
+		/* call commit callback */
+		buffer->commit(buffer, txn, lsn);
+
+
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 true, true);
+
+		AtEOXact_RelationCache(true);
+
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_LOCKS,
+							 true, true);
+
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_AFTER_LOCKS,
+							 true, true);
+
+		/* cleanup */
+		RevertFromDecodingSnapshots();
+
+		ReorderBufferExecuteInvalidations(buffer, txn);
+
+		if (snapshot_now->copied)
+			ReorderBufferFreeSnap(buffer, snapshot_now);
+
+		ReorderBufferCleanupTXN(buffer, txn);
+	}
+	PG_CATCH();
+	{
+		if (iterstate)
+			ReorderBufferIterTXNFinish(buffer, iterstate);
+
+		RevertFromDecodingSnapshots();
+
+		/* XXX: more cleanup needed */
+
+		if (snapshot_now->copied)
+			ReorderBufferFreeSnap(buffer, snapshot_now);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Abort a transaction that possibly has previous changes. Needs to be done
+ * independently for toplevel and subtransactions.
+ */
+void
+ReorderBufferAbort(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+
+	/* no changes in this transaction */
+	if (!txn)
+		return;
+
+	txn->last_lsn = lsn;
+
+	ReorderBufferCleanupTXN(buffer, txn);
+}
+
+/*
+ * Check whether a transaction is already known in this module
+ */
+bool
+ReorderBufferIsXidKnown(ReorderBuffer *buffer, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+	return txn != NULL;
+}
+
+/*
+ * Add a new snapshot to this transaction that is only used after lsn 'lsn'.
+ */
+void
+ReorderBufferAddSnapshot(ReorderBuffer *buffer, TransactionId xid,
+						 XLogRecPtr lsn, Snapshot snap)
+{
+	ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+
+	change->snapshot = snap;
+	change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT;
+
+	ReorderBufferAddChange(buffer, xid, lsn, change);
+}
+
+/*
+ * Setup the base snapshot of a transaction. That is the snapshot that is used
+ * to decode all changes until either this transaction modifies the catalog or
+ * another catalog modifying transaction commits.
+ */
+void
+ReorderBufferSetBaseSnapshot(ReorderBuffer *buffer, TransactionId xid,
+							 XLogRecPtr lsn, Snapshot snap)
+{
+	ReorderBufferTXN *txn;
+	bool		is_new;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, &is_new, lsn, true);
+	Assert(txn->base_snapshot == NULL);
+
+	txn->base_snapshot = snap;
+}
+
+/*
+ * Access the catalog with this CommandId at this point in the changestream.
+ *
+ * May only be called for command ids > 1
+ */
+void
+ReorderBufferAddNewCommandId(ReorderBuffer *buffer, TransactionId xid,
+							 XLogRecPtr lsn, CommandId cid)
+{
+	ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+
+	change->command_id = cid;
+	change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID;
+
+	ReorderBufferAddChange(buffer, xid, lsn, change);
+}
+
+
+/*
+ * Add new (relfilenode, tid) -> (cmin, cmax) mappings.
+ */
+void
+ReorderBufferAddNewTupleCids(ReorderBuffer *buffer, TransactionId xid,
+							 XLogRecPtr lsn, RelFileNode node,
+							 ItemPointerData tid, CommandId cmin,
+							 CommandId cmax, CommandId combocid)
+{
+	ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+	change->tuplecid.node = node;
+	change->tuplecid.tid = tid;
+	change->tuplecid.cmin = cmin;
+	change->tuplecid.cmax = cmax;
+	change->tuplecid.combocid = combocid;
+	change->lsn = lsn;
+	change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
+
+	dlist_push_tail(&txn->tuplecids, &change->node);
+	txn->ntuplecids++;
+}
+
+/*
+ * Setup the invalidation of the toplevel transaction.
+ *
+ * This needs to be done before ReorderBufferCommit is called!
+ */
+void
+ReorderBufferAddInvalidations(ReorderBuffer *buffer, TransactionId xid,
+							  XLogRecPtr lsn, Size nmsgs,
+							  SharedInvalidationMessage *msgs)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+	if (txn->ninvalidations != 0)
+		elog(ERROR, "only ever add one set of invalidations");
+
+	txn->ninvalidations = nmsgs;
+	txn->invalidations = (SharedInvalidationMessage *)
+		MemoryContextAlloc(buffer->context,
+						   sizeof(SharedInvalidationMessage) * nmsgs);
+	memcpy(txn->invalidations, msgs, sizeof(SharedInvalidationMessage) * nmsgs);
+}
+
+/*
+ * Apply all invalidations we know. Possibly we only need parts at this point
+ * in the changestream but we don't know which those are.
+ */
+static void
+ReorderBufferExecuteInvalidations(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	int			i;
+
+	for (i = 0; i < txn->ninvalidations; i++)
+		LocalExecuteInvalidationMessage(&txn->invalidations[i]);
+}
+
+/*
+ * Mark a transaction as doing timetravel.
+ */
+void
+ReorderBufferXidSetTimetravel(ReorderBuffer *buffer, TransactionId xid,
+							  XLogRecPtr lsn)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+	txn->does_timetravel = true;
+}
+
+/*
+ * Query whether a transaction is already *known* to be doing timetravel. This
+ * can be wrong until directly before the commit!
+ */
+bool
+ReorderBufferXidDoesTimetravel(ReorderBuffer *buffer, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+	if (!txn)
+		return false;
+
+	return txn->does_timetravel;
+}
+
+/*
+ * Have we already added the first snapshot?
+ */
+bool
+ReorderBufferXidHasBaseSnapshot(ReorderBuffer *buffer, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+
+	if (!txn)
+		return false;
+	return txn->base_snapshot != NULL;
+}
+
+static void
+ReorderBufferSerializeReserve(ReorderBuffer *buffer, Size sz)
+{
+	if (!buffer->outbufsize)
+	{
+		buffer->outbuf = MemoryContextAlloc(buffer->context, sz);
+		buffer->outbufsize = sz;
+	}
+	else if (buffer->outbufsize < sz)
+	{
+		buffer->outbuf = repalloc(buffer->outbuf, sz);
+		buffer->outbufsize = sz;
+	}
+}
+
+typedef struct ReorderBufferDiskChange
+{
+	Size		size;
+	ReorderBufferChange change;
+	/* data follows */
+} ReorderBufferDiskChange;
+
+/*
+ * Persistency support
+ */
+static void
+ReorderBufferSerializeChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							 int fd, ReorderBufferChange *change)
+{
+	ReorderBufferDiskChange *ondisk;
+	Size		sz = sizeof(ReorderBufferDiskChange);
+
+	ReorderBufferSerializeReserve(buffer, sz);
+
+	ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+	memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+
+	switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+	{
+		case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+			/* fall through */
+		case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+			/* fall through */
+		case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+			{
+				char	   *data;
+				Size		oldlen = 0;
+				Size		newlen = 0;
+
+				if (change->oldtuple)
+					oldlen = offsetof(ReorderBufferTupleBuf, data)
+						+change->oldtuple->tuple.t_len
+						- offsetof(HeapTupleHeaderData, t_bits);
+
+				if (change->newtuple)
+					newlen = offsetof(ReorderBufferTupleBuf, data)
+						+change->newtuple->tuple.t_len
+						- offsetof(HeapTupleHeaderData, t_bits);
+
+				sz += oldlen;
+				sz += newlen;
+
+				/* make sure we have enough space */
+				ReorderBufferSerializeReserve(buffer, sz);
+
+				data = ((char *) buffer->outbuf) + sizeof(ReorderBufferDiskChange);
+				/* might have been reallocated above */
+				ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+				if (oldlen)
+				{
+					memcpy(data, change->oldtuple, oldlen);
+					data += oldlen;
+					Assert(&change->oldtuple->header == change->oldtuple->tuple.t_data);
+				}
+
+				if (newlen)
+				{
+					memcpy(data, change->newtuple, newlen);
+					data += newlen;
+					Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+				}
+				break;
+			}
+		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+			{
+				char	   *data;
+
+				sz += sizeof(SnapshotData) +
+					sizeof(TransactionId) * change->snapshot->xcnt +
+					sizeof(TransactionId) * change->snapshot->subxcnt
+					;
+
+				/* make sure we have enough space */
+				ReorderBufferSerializeReserve(buffer, sz);
+				data = ((char *) buffer->outbuf) + sizeof(ReorderBufferDiskChange);
+				/* might have been reallocated above */
+				ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+				memcpy(data, change->snapshot, sizeof(SnapshotData));
+				data += sizeof(SnapshotData);
+
+				if (change->snapshot->xcnt)
+				{
+					memcpy(data, change->snapshot->xip,
+						   sizeof(TransactionId) * change->snapshot->xcnt);
+					data += sizeof(TransactionId) * change->snapshot->xcnt;
+				}
+
+				if (change->snapshot->subxcnt)
+				{
+					memcpy(data, change->snapshot->subxip,
+						   sizeof(TransactionId) * change->snapshot->subxcnt);
+					data += sizeof(TransactionId) * change->snapshot->subxcnt;
+				}
+				break;
+			}
+		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+			/* ReorderBufferChange contains everything important */
+			break;
+		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+			/* ReorderBufferChange contains everything important */
+			break;
+	}
+
+	ondisk->size = sz;
+
+	if (write(fd, buffer->outbuf, ondisk->size) != ondisk->size)
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to xid data file \"%u\": %m",
+						txn->xid)));
+	}
+
+	Assert(ondisk->change.action_internal == change->action_internal);
+}
+
+static void
+ReorderBufferCheckSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	/* FIXME subtxn handling? */
+	if (txn->nentries_mem >= max_memtries)
+	{
+		ReorderBufferSerializeTXN(buffer, txn);
+		Assert(txn->nentries_mem == 0);
+	}
+}
+
+static void
+ReorderBufferSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	dlist_iter	subtxn_i;
+	dlist_mutable_iter change_i;
+	int			fd = -1;
+	XLogSegNo	curOpenSegNo = 0;
+	Size		spilled = 0;
+	char		path[MAXPGPATH];
+
+	elog(DEBUG2, "spill %zu changes in tx %u to disk",
+		 txn->nentries_mem, txn->xid);
+
+	/* do the same to all child TXs */
+	dlist_foreach(subtxn_i, &txn->subtxns)
+	{
+		ReorderBufferTXN *subtxn;
+
+		subtxn = dlist_container(ReorderBufferTXN, node, subtxn_i.cur);
+		ReorderBufferSerializeTXN(buffer, subtxn);
+	}
+
+	/* serialize changestream */
+	dlist_foreach_modify(change_i, &txn->changes)
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, change_i.cur);
+
+		/*
+		 * store in segment in which it belongs by start lsn, don't split over
+		 * multiple segments though
+		 */
+		if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))
+		{
+			XLogRecPtr	recptr;
+
+			if (fd != -1)
+				CloseTransientFile(fd);
+
+			XLByteToSeg(change->lsn, curOpenSegNo);
+			XLogSegNoOffsetToRecPtr(curOpenSegNo, 0, recptr);
+
+			sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+					NameStr(MyLogicalDecodingSlot->name), txn->xid,
+					(uint32) (recptr >> 32), (uint32) recptr);
+
+			/* open segment, create it if necessary */
+			fd = OpenTransientFile(path,
+								   O_CREAT | O_WRONLY | O_APPEND | PG_BINARY,
+								   S_IRUSR | S_IWUSR);
+
+			if (fd < 0)
+				ereport(ERROR, (errmsg("could not open reorderbuffer file %s for writing: %m", path)));
+		}
+
+		ReorderBufferSerializeChange(buffer, txn, fd, change);
+		dlist_delete(&change->node);
+		ReorderBufferReturnChange(buffer, change);
+
+		spilled++;
+	}
+
+	Assert(spilled == txn->nentries_mem);
+	Assert(dlist_is_empty(&txn->changes));
+	txn->nentries_mem = 0;
+
+	if (fd != -1)
+		CloseTransientFile(fd);
+
+	/* issue write barrier */
+	/* serialize main transaction state */
+}
+
+static Size
+ReorderBufferRestoreChanges(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							int *fd, XLogSegNo *segno)
+{
+	Size		restored = 0;
+	XLogSegNo	last_segno;
+	dlist_mutable_iter cleanup_iter;
+
+	Assert(txn->lsn != InvalidXLogRecPtr);
+	Assert(txn->last_lsn != InvalidXLogRecPtr);
+
+	/* free current entries, so we have memory for more */
+	dlist_foreach_modify(cleanup_iter, &txn->changes)
+	{
+		ReorderBufferChange *cleanup =
+		dlist_container(ReorderBufferChange, node, cleanup_iter.cur);
+
+		dlist_delete(&cleanup->node);
+		ReorderBufferReturnChange(buffer, cleanup);
+	}
+	txn->nentries_mem = 0;
+	Assert(dlist_is_empty(&txn->changes));
+
+	XLByteToSeg(txn->last_lsn, last_segno);
+
+	while (restored < max_memtries && *segno <= last_segno)
+	{
+		int			readBytes;
+		ReorderBufferDiskChange *ondisk;
+
+		if (*fd == -1)
+		{
+			XLogRecPtr	recptr;
+			char		path[MAXPGPATH];
+
+			/* first time in */
+			if (*segno == 0)
+			{
+				XLByteToSeg(txn->lsn, *segno);
+				elog(LOG, "initial restoring from %zu to %zu",
+					 *segno, last_segno);
+			}
+
+			Assert(*segno != 0 || dlist_is_empty(&txn->changes));
+			XLogSegNoOffsetToRecPtr(*segno, 0, recptr);
+
+			sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+					NameStr(MyLogicalDecodingSlot->name), txn->xid,
+					(uint32) (recptr >> 32), (uint32) recptr);
+
+			elog(LOG, "opening file %s", path);
+
+			*fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+			if (*fd < 0 && errno == ENOENT)
+			{
+				*fd = -1;
+				(*segno)++;
+				continue;
+			}
+			else if (*fd < 0)
+				ereport(ERROR, (errmsg("could not open reorderbuffer file %s for reading: %m", path)));
+
+		}
+
+		ReorderBufferSerializeReserve(buffer, sizeof(ReorderBufferDiskChange));
+
+
+		/*
+		 * read the statically sized part of a change which has information
+		 * about the total size. If we couldn't read a record, we're at the
+		 * end of this file.
+		 */
+
+		readBytes = read(*fd, buffer->outbuf, sizeof(ReorderBufferDiskChange));
+
+		/* eof */
+		if (readBytes == 0)
+		{
+			CloseTransientFile(*fd);
+			*fd = -1;
+			(*segno)++;
+			continue;
+		}
+		else if (readBytes < 0)
+			elog(ERROR, "read failed: %m");
+		else if (readBytes != sizeof(ReorderBufferDiskChange))
+			elog(ERROR, "incomplete read, read %d instead of %zu",
+				 readBytes, sizeof(ReorderBufferDiskChange));
+
+		ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+		ReorderBufferSerializeReserve(buffer, sizeof(ReorderBufferDiskChange) + ondisk->size);
+		ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+		readBytes = read(*fd, buffer->outbuf + sizeof(ReorderBufferDiskChange),
+						 ondisk->size - sizeof(ReorderBufferDiskChange));
+
+		if (readBytes < 0)
+			elog(ERROR, "read2 failed: %m");
+		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
+			elog(ERROR, "incomplete read2, read %d instead of %zu",
+				 readBytes, ondisk->size - sizeof(ReorderBufferDiskChange));
+
+		/*
+		 * ok, read a full change from disk, now restore it into proper
+		 * in-memory format
+		 */
+		ReorderBufferRestoreChange(buffer, txn, buffer->outbuf);
+		restored++;
+	}
+
+	return restored;
+}
+
+/*
+ * Convert change from its on-disk format to in-memory format and queue it onto
+ * the TXN's ->changes list.
+ */
+static void
+ReorderBufferRestoreChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+						   char *data)
+{
+	ReorderBufferDiskChange *ondisk;
+	ReorderBufferChange *change;
+
+	ondisk = (ReorderBufferDiskChange *) data;
+
+	change = ReorderBufferGetChange(buffer);
+
+	/* copy static part */
+	memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+
+	data += sizeof(ReorderBufferDiskChange);
+
+	/* restore individual stuff */
+	switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+	{
+		case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+			/* fall through */
+		case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+			/* fall through */
+		case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+			if (change->oldtuple)
+			{
+				Size		len = offsetof(ReorderBufferTupleBuf, data)
+				+((ReorderBufferTupleBuf *) data)->tuple.t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
+
+				change->oldtuple = ReorderBufferGetTupleBuf(buffer);
+				memcpy(change->oldtuple, data, len);
+				change->oldtuple->tuple.t_data = &change->oldtuple->header;
+
+				data += len;
+			}
+
+			if (change->newtuple)
+			{
+				Size		len = offsetof(ReorderBufferTupleBuf, data)
+				+((ReorderBufferTupleBuf *) data)->tuple.t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
+
+				change->newtuple = ReorderBufferGetTupleBuf(buffer);
+				memcpy(change->newtuple, data, len);
+				change->newtuple->tuple.t_data = &change->newtuple->header;
+				data += len;
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+			{
+				Snapshot	oldsnap = (Snapshot) data;
+				Size		size = sizeof(SnapshotData) +
+				sizeof(TransactionId) * oldsnap->xcnt +
+				sizeof(TransactionId) * (oldsnap->subxcnt + 0)
+						   ;
+
+				Assert(change->snapshot != NULL);
+
+				change->snapshot = MemoryContextAllocZero(buffer->context, size);
+
+				memcpy(change->snapshot, data, size);
+				change->snapshot->xip = (TransactionId *)
+					(((char *) change->snapshot) + sizeof(SnapshotData));
+				change->snapshot->subxip =
+					change->snapshot->xip + change->snapshot->xcnt + 0;
+				change->snapshot->copied = true;
+				break;
+			}
+			/* nothing needs to be done */
+		case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+		case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+			break;
+	}
+
+	dlist_push_tail(&txn->changes, &change->node);
+	txn->nentries_mem++;
+}
+
+/*
+ * Delete all data spilled to disk.
+ */
+void
+ReorderBufferStartup(void)
+{
+	DIR		   *logical_dir;
+	struct dirent *logical_de;
+
+	DIR		   *spill_dir;
+	struct dirent *spill_de;
+
+	logical_dir = AllocateDir("pg_llog");
+	while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+	{
+		char		path[MAXPGPATH];
+
+		if (strcmp(logical_de->d_name, ".") == 0 ||
+			strcmp(logical_de->d_name, "..") == 0)
+			continue;
+
+		/* one of our own directories */
+		if (strcmp(logical_de->d_name, "snapshots") == 0)
+			continue;
+
+		/*
+		 * ok, has to be a surviving logical slot, iterate and delete
+		 * everything starting with xid-*
+		 */
+		sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+		spill_dir = AllocateDir(path);
+		while ((spill_de = ReadDir(spill_dir, "pg_llog")) != NULL)
+		{
+			if (strcmp(spill_de->d_name, ".") == 0 ||
+				strcmp(spill_de->d_name, "..") == 0)
+				continue;
+
+			if (strncmp(spill_de->d_name, "xid", 3) == 0)
+			{
+				sprintf(path, "pg_llog/%s/%s", logical_de->d_name,
+						spill_de->d_name);
+
+				if (unlink(path) != 0)
+					ereport(PANIC,
+							(errcode_for_file_access(),
+						  errmsg("could not remove xid data file \"%s\": %m",
+								 path)));
+			}
+			/* XXX: WARN? */
+		}
+		FreeDir(spill_dir);
+	}
+	FreeDir(logical_dir);
+}
+
+/*
+ * toast support
+ */
+
+/*
+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?
+ */
+#define VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr)	\
+do { \
+	varattrib_1b_e *attre = (varattrib_1b_e *) (attr); \
+	Assert(VARATT_IS_EXTERNAL(attre)); \
+	Assert(VARSIZE_EXTERNAL(attre) == sizeof(toast_pointer) + VARHDRSZ_EXTERNAL); \
+	memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
+} while (0)
+
+#define VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) \
+	((toast_pointer).va_extsize < (toast_pointer).va_rawsize - VARHDRSZ)
+
+/*
+ * Initialize per tuple toast reconstruction support.
+ */
+static void
+ReorderBufferToastInitHash(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	HASHCTL		hash_ctl;
+
+	Assert(txn->toast_hash == NULL);
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
+	hash_ctl.hash = tag_hash;
+	hash_ctl.hcxt = buffer->context;
+	txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
+								  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+}
+
+/*
+ * Per toast-chunk handling for toast reconstruction
+ *
+ * Appends a toast chunk so we can reconstruct it when the tuple "owning" the
+ * toasted Datum comes along.
+ */
+static void
+ReorderBufferToastAppendChunk(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+							  Relation relation, ReorderBufferChange *change)
+{
+	ReorderBufferToastEnt *ent;
+	bool		found;
+	int32		chunksize;
+	bool		isnull;
+	Pointer		chunk;
+	TupleDesc	desc = RelationGetDescr(relation);
+	Oid			chunk_id;
+	int32		chunk_seq;
+
+	if (txn->toast_hash == NULL)
+		ReorderBufferToastInitHash(buffer, txn);
+
+	Assert(IsToastRelation(relation));
+
+	chunk_id = DatumGetObjectId(fastgetattr(&change->newtuple->tuple, 1, desc, &isnull));
+	Assert(!isnull);
+	chunk_seq = DatumGetInt32(fastgetattr(&change->newtuple->tuple, 2, desc, &isnull));
+	Assert(!isnull);
+
+	ent = (ReorderBufferToastEnt *)
+		hash_search(txn->toast_hash,
+					(void *) &chunk_id,
+					HASH_ENTER,
+					&found);
+
+	if (!found)
+	{
+		Assert(ent->chunk_id == chunk_id);
+		ent->num_chunks = 0;
+		ent->last_chunk_seq = 0;
+		ent->size = 0;
+		ent->reconstructed = NULL;
+		dlist_init(&ent->chunks);
+
+		if (chunk_seq != 0)
+			elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq 0",
+				 chunk_seq, chunk_id);
+	}
+	else if (found && chunk_seq != ent->last_chunk_seq + 1)
+		elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq %d",
+			 chunk_seq, chunk_id, ent->last_chunk_seq + 1);
+
+	chunk = DatumGetPointer(fastgetattr(&change->newtuple->tuple, 3, desc, &isnull));
+	Assert(!isnull);
+
+	/* calculate size so we can allocate the right size at once later */
+	if (!VARATT_IS_EXTENDED(chunk))
+		chunksize = VARSIZE(chunk) - VARHDRSZ;
+	else if (VARATT_IS_SHORT(chunk))
+		/* could happen due to heap_form_tuple doing its thing */
+		chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+	else
+		elog(ERROR, "unexpected type of toast chunk");
+
+	ent->size += chunksize;
+	ent->last_chunk_seq = chunk_seq;
+	ent->num_chunks++;
+	dlist_push_tail(&ent->chunks, &change->node);
+}
+
+/*
+ * Rejigger change->newtuple to point to in-memory toast tuples instead of
+ * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM).
+ *
+ * We cannot replace unchanged toast tuples though, so those will still point
+ * to on-disk toast data.
+ */
+static void
+ReorderBufferToastReplace(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+						  Relation relation, ReorderBufferChange *change)
+{
+	TupleDesc	desc;
+	int			natt;
+	Datum	   *attrs;
+	bool	   *isnull;
+	bool	   *free;
+	HeapTuple	newtup;
+	Relation	toast_rel;
+	TupleDesc	toast_desc;
+
+	/* no toast tuples changed */
+	if (txn->toast_hash == NULL)
+		return;
+
+	/* we should only have toast tuples in an INSERT or UPDATE */
+	Assert(change->newtuple);
+
+	desc = RelationGetDescr(relation);
+
+	toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+	toast_desc = RelationGetDescr(toast_rel);
+
+	/* should we allocate from stack instead? */
+	attrs = palloc0(sizeof(Datum) * desc->natts);
+	isnull = palloc0(sizeof(bool) * desc->natts);
+	free = palloc0(sizeof(bool) * desc->natts);
+
+	heap_deform_tuple(&change->newtuple->tuple, desc,
+					  attrs, isnull);
+
+	for (natt = 0; natt < desc->natts; natt++)
+	{
+		Form_pg_attribute attr = desc->attrs[natt];
+		ReorderBufferToastEnt *ent;
+		struct varlena *varlena;
+
+		/* va_rawsize is the size of the original datum -- including header */
+		struct varatt_external toast_pointer;
+		struct varatt_indirect redirect_pointer;
+		struct varlena *new_datum = NULL;
+		struct varlena *reconstructed;
+		dlist_iter	it;
+		Size		data_done = 0;
+
+		/* system columns aren't toasted */
+		if (attr->attnum < 0)
+			continue;
+
+		if (attr->attisdropped)
+			continue;
+
+		/* not a varlena datatype */
+		if (attr->attlen != -1)
+			continue;
+
+		/* no data */
+		if (isnull[natt])
+			continue;
+
+		/* ok, we know we have a toast datum */
+		varlena = (struct varlena *) DatumGetPointer(attrs[natt]);
+
+		/* no need to do anything if the tuple isn't external */
+		if (!VARATT_IS_EXTERNAL(varlena))
+			continue;
+
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, varlena);
+
+		/*
+		 * check whether the toast tuple changed, replace if so.
+		 */
+		ent = (ReorderBufferToastEnt *)
+			hash_search(txn->toast_hash,
+						(void *) &toast_pointer.va_valueid,
+						HASH_FIND,
+						NULL);
+		if (ent == NULL)
+			continue;
+
+		new_datum =
+			(struct varlena *) palloc0(INDIRECT_POINTER_SIZE);
+
+		free[natt] = true;
+
+		reconstructed = palloc0(toast_pointer.va_rawsize);
+
+		ent->reconstructed = reconstructed;
+
+		/* stitch toast tuple back together from its parts */
+		dlist_foreach(it, &ent->chunks)
+		{
+			bool		isnull;
+			ReorderBufferTupleBuf *tup =
+			dlist_container(ReorderBufferChange, node, it.cur)->newtuple;
+			Pointer		chunk =
+			DatumGetPointer(fastgetattr(&tup->tuple, 3, toast_desc, &isnull));
+
+			Assert(!isnull);
+			Assert(!VARATT_IS_EXTERNAL(chunk));
+			Assert(!VARATT_IS_SHORT(chunk));
+
+			memcpy(VARDATA(reconstructed) + data_done,
+				   VARDATA(chunk),
+				   VARSIZE(chunk) - VARHDRSZ);
+			data_done += VARSIZE(chunk) - VARHDRSZ;
+		}
+		Assert(data_done == toast_pointer.va_extsize);
+
+		/* make sure it's marked as compressed or not */
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+			SET_VARSIZE_COMPRESSED(reconstructed, data_done + VARHDRSZ);
+		else
+			SET_VARSIZE(reconstructed, data_done + VARHDRSZ);
+
+		memset(&redirect_pointer, 0, sizeof(redirect_pointer));
+		redirect_pointer.pointer = reconstructed;
+
+		SET_VARTAG_EXTERNAL(new_datum, VARTAG_INDIRECT);
+		memcpy(VARDATA_EXTERNAL(new_datum), &redirect_pointer,
+			   sizeof(redirect_pointer));
+
+		attrs[natt] = PointerGetDatum(new_datum);
+	}
+
+	/*
+	 * Build tuple in separate memory & copy tuple back into the tuplebuf
+	 * passed to the output plugin. We can't directly heap_fill_tuple() into
+	 * the tuplebuf because attrs[] will point back into the current content.
+	 */
+	newtup = heap_form_tuple(desc, attrs, isnull);
+	Assert(change->newtuple->tuple.t_len <= MaxHeapTupleSize);
+	Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+
+	memcpy(change->newtuple->tuple.t_data,
+		   newtup->t_data,
+		   newtup->t_len);
+	change->newtuple->tuple.t_len = newtup->t_len;
+
+	/*
+	 * free resources we won't further need, more persistent stuff will be
+	 * free'd in ReorderBufferToastReset().
+	 */
+	RelationClose(toast_rel);
+	pfree(newtup);
+	for (natt = 0; natt < desc->natts; natt++)
+	{
+		if (free[natt])
+			pfree(DatumGetPointer(attrs[natt]));
+	}
+	pfree(attrs);
+	pfree(free);
+	pfree(isnull);
+
+}
+
+/*
+ * Free all resources allocated for toast reconstruction.
+ */
+static void
+ReorderBufferToastReset(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+	HASH_SEQ_STATUS hstat;
+	ReorderBufferToastEnt *ent;
+
+	if (txn->toast_hash == NULL)
+		return;
+
+	/* sequentially walk over the hash and free everything */
+	hash_seq_init(&hstat, txn->toast_hash);
+	while ((ent = (ReorderBufferToastEnt *) hash_seq_search(&hstat)) != NULL)
+	{
+		dlist_mutable_iter it;
+
+		if (ent->reconstructed != NULL)
+			pfree(ent->reconstructed);
+
+		dlist_foreach_modify(it, &ent->chunks)
+		{
+			ReorderBufferChange *change =
+			dlist_container(ReorderBufferChange, node, it.cur);
+
+			dlist_delete(&change->node);
+			ReorderBufferReturnChange(buffer, change);
+		}
+	}
+
+	hash_destroy(txn->toast_hash);
+	txn->toast_hash = NULL;
+}
+
+
+/*
+ * Visibility support routines
+ */
+
+/*-------------------------------------------------------------------------
+ * Lookup actual cmin/cmax values during timetravel access. We can't always
+ * rely on stored cmin/cmax values because of two scenarios:
+ *
+ * * A tuple got changed multiple times during a single transaction and thus
+ *	 has got a combocid. Combocids are only valid for the duration of a single
+ *	 transaction.
+ * * A tuple with a cmin but no cmax (and thus no combocid) got deleted/updated
+ *	 in another transaction than the one which created it which we are looking
+ *	 at right now. As only one of cmin, cmax or combocid is actually stored in
+ *	 the heap we don't have access to the value we need anymore.
+ *
+ * To resolve those problems we have a per-transaction hash of (cmin, cmax)
+ * tuples keyed by (relfilenode, ctid) which contains the actual (cmin, cmax)
+ * values. That also takes care of combocids by simply not caring about them at
+ * all. As we have the real cmin/cmax values that's enough.
+ *
+ * As we only care about catalog tuples here the overhead of this hashtable
+ * should be acceptable.
+ * -------------------------------------------------------------------------
+ */
+extern bool
+ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
+							  HeapTuple htup, Buffer buffer,
+							  CommandId *cmin, CommandId *cmax)
+{
+	ReorderBufferTupleCidKey key;
+	ReorderBufferTupleCidEnt *ent;
+	ForkNumber	forkno;
+	BlockNumber blockno;
+
+	/* be careful about padding */
+	memset(&key, 0, sizeof(key));
+
+	Assert(!BufferIsLocal(buffer));
+
+	/*
+	 * get relfilenode from the buffer, no convenient way to access it other
+	 * than that.
+	 */
+	BufferGetTag(buffer, &key.relnode, &forkno, &blockno);
+
+	/* tuples can only be in the main fork */
+	Assert(forkno == MAIN_FORKNUM);
+	Assert(blockno == ItemPointerGetBlockNumber(&htup->t_self));
+
+	ItemPointerCopy(&htup->t_self,
+					&key.tid);
+
+	ent = (ReorderBufferTupleCidEnt *)
+		hash_search(tuplecid_data,
+					(void *) &key,
+					HASH_FIND,
+					NULL);
+
+	if (ent == NULL)
+		return false;
+
+	if (cmin)
+		*cmin = ent->cmin;
+	if (cmax)
+		*cmax = ent->cmax;
+	return true;
+}
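
The lookup in ResolveCminCmaxDuringDecoding() above can be illustrated
outside the server with a minimal standalone sketch. Everything in it is
hypothetical: ToyKey, ToyEnt, toy_make_key() and toy_lookup() are not
PostgreSQL APIs, and a linear scan with memcmp() merely stands in for a
hash table that treats the key as raw bytes. It is only meant to show why
the key is zeroed before its fields are filled in (padding bytes take part
in a byte-wise comparison) and how the stored (cmin, cmax) pair is handed
back through optional out parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the (relfilenode, ctid) key; not the real layout. */
typedef struct ToyKey
{
	uint32_t	relnode;		/* which relation file */
	uint16_t	offset;			/* item offset; padding typically follows */
	uint32_t	block;			/* block number */
} ToyKey;

/* Hypothetical stand-in for an entry carrying the real cmin/cmax. */
typedef struct ToyEnt
{
	ToyKey		key;
	uint32_t	cmin;
	uint32_t	cmax;
} ToyEnt;

/* Build a key; zero it first so padding bytes compare equal. */
static void
toy_make_key(ToyKey *key, uint32_t relnode, uint32_t block, uint16_t offset)
{
	assert(key != NULL);
	memset(key, 0, sizeof(*key));
	key->relnode = relnode;
	key->block = block;
	key->offset = offset;
}

/*
 * Byte-wise lookup, as a raw-bytes hash table would compare keys.
 * Returns false if no entry exists; cmin/cmax may be NULL if unwanted.
 */
static bool
toy_lookup(const ToyEnt *ents, size_t nents, const ToyKey *key,
		   uint32_t *cmin, uint32_t *cmax)
{
	size_t		i;

	for (i = 0; i < nents; i++)
	{
		if (memcmp(&ents[i].key, key, sizeof(ToyKey)) == 0)
		{
			if (cmin)
				*cmin = ents[i].cmin;
			if (cmax)
				*cmax = ents[i].cmax;
			return true;
		}
	}
	return false;
}
```

Without the memset() in toy_make_key(), two keys with identical fields could
still differ in their padding bytes and the byte-wise comparison might fail.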
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
new file mode 100644
index 0000000..9edd7ff
--- /dev/null
+++ b/src/backend/replication/logical/snapbuild.c
@@ -0,0 +1,1930 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.c
+ *
+ *	  Support for building timetravel snapshots based on the contents of the
+ *	  wal
+ *
+ * NOTES:
+ *
+ * We build snapshots which can *only* be used to read catalog contents by
+ * reading the wal stream. The aim is to provide mvcc and SnapshotNow snapshots
+ * that behave the same as their respective counterparts would have at the time
+ * the XLogRecord was generated. This is done to provide a reliable environment
+ * for decoding those records into every format that pleases the author of an
+ * output plugin.
+ *
+ * To build the snapshots we reuse the infrastructure built for hot
+ * standby. The snapshots we build look different than HS' because we have
+ * different needs. To successfully decode data from the WAL we only need to
+ * access catalogs/(sys|rel|cat)cache, not the actual user tables since the
+ * data we decode is contained in the wal records. Also, our snapshots need to
+ * be different because in contrast to normal snapshots we can't fully rely on
+ * the clog for information about committed transactions because they might
+ * commit in the future from the POV of the wal entry we're currently decoding.
+ *
+ * As the percentage of transactions modifying the catalog normally is fairly
+ * small we keep track of the committed catalog modifying ones inside (xmin,
+ * xmax) instead of keeping track of all running transactions like it's done in
+ * a normal snapshot. That is, we keep a list of transactions between
+ * snapshot->(xmin, xmax) that we consider committed, everything else is
+ * considered aborted/in progress. That also allows us not to care about
+ * subtransactions before they have committed which means we don't have to deal
+ * with suboverflowed subtransactions and similar.
+ *
+ * Classic SnapshotNow behaviour - which is mainly used for efficiency, not for
+ * correctness - is not actually required by any of the routines that we need
+ * during decoding and is hard to emulate fully. Instead we build snapshots
+ * with MVCC behaviour that are updated whenever another transaction
+ * commits. That gives behaviour consistent with a SnapshotNow behaviour
+ * happening in exactly that instant without other transactions interfering.
+ *
+ * One additional complexity of doing this is that to e.g. handle mixed DDL/DML
+ * transactions we need Snapshots that see intermediate versions of the catalog
+ * in a transaction. During normal operation this is achieved by using
+ * CommandIds/cmin/cmax. The problem with this however is that for space
+ * efficiency reasons only one value of that is stored (c.f. combocid.c). Since
+ * Combocids are only available in memory we log additional information which
+ * allows us to get the original (cmin, cmax) pair during visibility checks.
+ *
+ * To facilitate all this we need our own visibility routine, as the normal
+ * ones are optimized for different usecases. We also need the code to use our
+ * special snapshots automatically whenever SnapshotNow behaviour is expected
+ * (specifying our snapshot everywhere would be far too invasive).
+ *
+ * To replace the normal SnapshotNow snapshots, use the SetupDecodingSnapshots
+ * and RevertFromDecodingSnapshots functions.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/snapbuild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogreader.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_control.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_tablespace.h"
+
+#include "miscadmin.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "replication/logical.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relmapper.h"
+#include "utils/snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+#include "storage/block.h"		/* debugging output */
+#include "storage/copydir.h"	/* fsync_fname */
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/standby.h"
+#include "storage/sinval.h"
+
+typedef struct SnapBuild
+{
+	/* how far are we along building our first full snapshot */
+	SnapBuildState state;
+
+	/* private memory context used to allocate memory for this module. */
+	MemoryContext context;
+
+	/* all transactions < this have committed/aborted */
+	TransactionId xmin;
+
+	/* all transactions >= this are uncommitted */
+	TransactionId xmax;
+
+	/*
+	 * Don't replay commits from an LSN <= this LSN. This can be set
+	 * externally but it will also be advanced (never retreat) from within
+	 * snapbuild.c.
+	 */
+	XLogRecPtr	transactions_after;
+
+	/*
+	 * Don't start decoding WAL until the "xl_running_xacts" information
+	 * indicates there are no running xids with a xid smaller than this.
+	 */
+	TransactionId initial_xmin_horizon;
+
+	/*
+	 * Snapshot that's valid to see all currently committed transactions that
+	 * see catalog modifications.
+	 */
+	Snapshot	snapshot;
+
+	/*
+	 * LSN of the last location we are sure a snapshot has been serialized to.
+	 */
+	XLogRecPtr	last_serialized_snapshot;
+
+	ReorderBuffer *reorder;
+
+	/* variable length data */
+
+	/*
+	 * Information about initially running transactions
+	 *
+	 * When we start building a snapshot there already may be transactions in
+	 * progress.  Those are stored in running.xip.  We don't have enough
+	 * information about those to decode their contents, so until they are
+	 * finished (xcnt=0) we cannot switch to a CONSISTENT state.
+	 */
+	struct
+	{
+		/*
+		 * As long as running.xcnt is nonzero, all XIDs between running.xmin
+		 * and running.xmax have to be checked whether they still are running.
+		 */
+		TransactionId xmin;
+		TransactionId xmax;
+
+		size_t		xcnt;		/* number of used xip entries */
+		size_t		xcnt_space; /* allocated size of xip */
+		TransactionId *xip;		/* running xacts array, xidComparator-sorted */
+	}			running;
+
+	/*
+	 * Array of transactions which could have catalog changes that committed
+	 * between xmin and xmax
+	 */
+	struct
+	{
+		/* number of committed transactions */
+		size_t		xcnt;
+
+		/* available space for committed transactions */
+		size_t		xcnt_space;
+
+		/*
+		 * Until we reach a CONSISTENT state, we record commits of all
+		 * transactions, not just the catalog changing ones. Record when that
+		 * changes so we know we cannot export a snapshot safely anymore.
+		 */
+		bool		includes_all_transactions;
+
+		/*
+		 * Array of committed transactions that have modified the catalog.
+		 *
+		 * As this array is frequently modified we do *not* keep it in
+		 * xidComparator order. Instead we sort the array when building &
+		 * distributing a snapshot.
+		 *
+		 * XXX: That doesn't seem to be good reasoning anymore. Every time we
+		 * add something here after becoming consistent will also require
+		 * distributing a snapshot. Storing them sorted would potentially make
+		 * it easier to purge as well (but more complicated wrt wraparound?).
+		 */
+		TransactionId *xip;
+	}			committed;
+
+} SnapBuild;
+
+/*
+ * Starting a transaction -- which we need to do while exporting a snapshot --
+ * removes knowledge about the previously used resowner, so we save it here.
+ */
+ResourceOwner SavedResourceOwnerDuringExport = NULL;
+
+/* transaction state manipulation functions */
+static void SnapBuildEndTxn(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid, int nsubxacts,
+				  TransactionId *subxacts);
+
+static void SnapBuildCommitTxn(SnapBuild *builder,
+				   XLogRecPtr lsn, TransactionId xid,
+				   int nsubxacts, TransactionId *subxacts);
+
+/* ->running manipulation */
+static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid);
+
+/* ->committed manipulation */
+static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
+
+/* snapshot building/manipulation/distribution functions */
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildFreeSnapshot(Snapshot snap);
+
+static void SnapBuildSnapIncRefcount(Snapshot snap);
+
+static void SnapBuildDistributeSnapshotNow(SnapBuild *builder, XLogRecPtr lsn);
+
+/* xlog reading helper functions for SnapBuildProcessRecord */
+static SnapBuildAction SnapBuildProcessFindSnapshot(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessHeap(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessHeap2(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessXlog(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessStandby(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessXact(SnapBuild *builder, XLogRecordBuffer *buf);
+
+
+/* on disk serialization & restore */
+static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
+static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
+
+/*
+ * Lookup a table via its current relfilenode.
+ *
+ * This requires a snapshot in which that relfilenode is actually visible to
+ * be set up.
+ *
+ * The returned relation, if any, needs to be closed with RelationClose().
+ */
+Relation
+LookupRelationByRelFileNode(RelFileNode *relfilenode)
+{
+	HeapTuple	tuple;
+	Oid			heaprel = InvalidOid;
+
+	/* shared relation */
+	if (relfilenode->spcNode == GLOBALTABLESPACE_OID)
+	{
+		heaprel = RelationMapFilenodeToOid(relfilenode->relNode, true);
+	}
+	else
+	{
+		Oid			lookup_tablespace;
+
+		/*
+		 * relations in the default tablespace are stored with InvalidOid as
+		 * pg_class."reltablespace".
+		 */
+		if (relfilenode->spcNode == DEFAULTTABLESPACE_OID)
+			lookup_tablespace = InvalidOid;
+		else
+			lookup_tablespace = relfilenode->spcNode;
+
+		tuple = SearchSysCache2(RELFILENODE,
+								lookup_tablespace,
+								relfilenode->relNode);
+
+		/* ok, found it */
+		if (HeapTupleIsValid(tuple))
+		{
+			heaprel = HeapTupleHeaderGetOid(tuple->t_data);
+			ReleaseSysCache(tuple);
+		}
+		/* has to be nonexistent or a nailed table */
+		else
+		{
+			heaprel = RelationMapFilenodeToOid(relfilenode->relNode, false);
+		}
+	}
+
+	/* shared or nailed table */
+	if (heaprel != InvalidOid)
+		return RelationIdGetRelation(heaprel);
+	return NULL;
+}
+
+
+/*
+ * Allocate a new snapshot builder.
+ */
+SnapBuild *
+AllocateSnapshotBuilder(ReorderBuffer *reorder,
+						TransactionId xmin_horizon,
+						XLogRecPtr start_lsn)
+{
+	MemoryContext context;
+	SnapBuild  *builder;
+
+	context = AllocSetContextCreate(TopMemoryContext,
+									"snapshot builder context",
+									ALLOCSET_DEFAULT_MINSIZE,
+									ALLOCSET_DEFAULT_INITSIZE,
+									ALLOCSET_DEFAULT_MAXSIZE);
+
+	builder = MemoryContextAllocZero(context, sizeof(SnapBuild));
+
+	builder->state = SNAPBUILD_START;
+	builder->context = context;
+	builder->reorder = reorder;
+	/* Other struct members, including ->running, are initialized by zeroing */
+
+	/*
+	 * Start with an arbitrary amount of room for committed xids; the array is
+	 * grown on demand by SnapBuildAddCommittedTxn().
+	 */
+	builder->committed.xcnt = 0;
+	builder->committed.xcnt_space = 128;		/* arbitrary number */
+	builder->committed.includes_all_transactions = true;
+	builder->committed.xip =
+		MemoryContextAlloc(context,
+						   builder->committed.xcnt_space *
+						   sizeof(TransactionId));
+
+	builder->initial_xmin_horizon = xmin_horizon;
+	builder->transactions_after = start_lsn;
+	return builder;
+}
+
+/*
+ * Free a snapshot builder.
+ */
+void
+FreeSnapshotBuilder(SnapBuild *builder)
+{
+	MemoryContext context = builder->context;
+
+	if (builder->snapshot)
+		SnapBuildSnapDecRefcount(builder->snapshot);
+
+	if (builder->running.xip)
+		pfree(builder->running.xip);
+
+	if (builder->committed.xip)
+		pfree(builder->committed.xip);
+
+	pfree(builder);
+
+	MemoryContextDelete(context);
+}
+
+/*
+ * Free an unreferenced snapshot that has previously been built by us.
+ */
+static void
+SnapBuildFreeSnapshot(Snapshot snap)
+{
+	/* make sure we don't get passed an external snapshot */
+	Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+	/* make sure nobody modified our snapshot */
+	Assert(snap->curcid == FirstCommandId);
+	Assert(!snap->suboverflowed);
+	Assert(!snap->takenDuringRecovery);
+	Assert(!snap->regd_count);
+
+	/* slightly more likely, so it's checked even without c-asserts */
+	if (snap->copied)
+		elog(ERROR, "can't free a copied snapshot");
+
+	if (snap->active_count)
+		elog(ERROR, "can't free an active snapshot");
+
+	pfree(snap);
+}
+
+/*
+ * In which state of snapshot building are we?
+ */
+SnapBuildState
+SnapBuildCurrentState(SnapBuild *builder)
+{
+	return builder->state;
+}
+
+/*
+ * Should the contents of the transaction ending at 'ptr' be skipped?
+ */
+bool
+SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
+{
+	return ptr <= builder->transactions_after;
+}
+
+/*
+ * Increase refcount of a snapshot.
+ *
+ * This is used when handing out a snapshot to some external resource or when
+ * adding a Snapshot as builder->snapshot.
+ */
+static void
+SnapBuildSnapIncRefcount(Snapshot snap)
+{
+	snap->active_count++;
+}
+
+/*
+ * Decrease refcount of a snapshot and free if the refcount reaches zero.
+ *
+ * Externally visible so external resources that have been handed an IncRef'ed
+ * Snapshot can free it easily.
+ */
+void
+SnapBuildSnapDecRefcount(Snapshot snap)
+{
+	/* make sure we don't get passed an external snapshot */
+	Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+	/* make sure nobody modified our snapshot */
+	Assert(snap->curcid == FirstCommandId);
+	Assert(!snap->suboverflowed);
+	Assert(!snap->takenDuringRecovery);
+	Assert(!snap->regd_count);
+
+	Assert(snap->active_count);
+
+	/* slightly more likely, so it's checked even without c-asserts */
+	if (snap->copied)
+		elog(ERROR, "can't free a copied snapshot");
+
+	snap->active_count--;
+	if (!snap->active_count)
+		SnapBuildFreeSnapshot(snap);
+}
+
+/*
+ * Build a new snapshot, based on currently committed catalog-modifying
+ * transactions.
+ *
+ * In-progress transactions with catalog access are *not* allowed to modify
+ * these snapshots; they have to copy them and fill in appropriate ->curcid and
+ * ->subxip/subxcnt values.
+ */
+static Snapshot
+SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+{
+	Snapshot	snapshot;
+	Size		ssize;
+
+	Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
+
+	ssize = sizeof(SnapshotData)
+		+ sizeof(TransactionId) * builder->committed.xcnt
+		+ sizeof(TransactionId) * 1 /* toplevel xid */ ;
+
+	snapshot = MemoryContextAllocZero(builder->context, ssize);
+
+	snapshot->satisfies = HeapTupleSatisfiesMVCCDuringDecoding;
+
+	/*
+	 * We misuse the original meaning of SnapshotData's xip and subxip fields
+	 * to make them more fitting for our needs.
+	 *
+	 * In the 'xip' array we store transactions that have to be treated as
+	 * committed. Since we will only ever look at tuples from transactions
+	 * that have modified the catalog it's more efficient to store those few
+	 * that exist between xmin and xmax (frequently there are none).
+	 *
+	 * Snapshots that are used in transactions that have modified the catalog
+	 * also use the 'subxip' array to store their toplevel xid and all the
+	 * subtransaction xids so we can recognize when we need to treat rows as
+	 * visible that are not in xip but still need to be visible. Subxip only
+	 * gets filled when the snapshot is copied into the context of a
+	 * catalog modifying transaction since we otherwise share a snapshot
+	 * between transactions. As long as a txn hasn't modified the catalog it
+	 * doesn't need to treat any uncommitted rows as visible, so there is no
+	 * need for those xids.
+	 *
+	 * Both arrays are qsort'ed so that we can use bsearch() on them.
+	 *
+	 * XXX: Do we want extra fields instead of misusing existing ones?
+	 */
+	Assert(TransactionIdIsNormal(builder->xmin));
+	Assert(TransactionIdIsNormal(builder->xmax));
+
+	snapshot->xmin = builder->xmin;
+	snapshot->xmax = builder->xmax;
+
+	/* store all transactions to be treated as committed by this snapshot */
+	snapshot->xip =
+		(TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+	snapshot->xcnt = builder->committed.xcnt;
+	memcpy(snapshot->xip, builder->committed.xip,
+		   builder->committed.xcnt * sizeof(TransactionId));
+
+	/* sort so we can bsearch() */
+	qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+
+	/*
+	 * Initially, subxip is empty, i.e. it's a snapshot to be used by
+	 * transactions that don't modify the catalog.  Might be changed later.
+	 * XXX how and by whom?
+	 */
+	snapshot->subxcnt = 0;
+	snapshot->subxip = NULL;
+
+	snapshot->suboverflowed = false;
+	snapshot->takenDuringRecovery = false;
+	snapshot->copied = false;
+	snapshot->curcid = FirstCommandId;
+	snapshot->active_count = 0;
+	snapshot->regd_count = 0;
+
+	return snapshot;
+}
+
+/*
+ * Export a snapshot so it can be set in another session with SET TRANSACTION
+ * SNAPSHOT.
+ *
+ * For that we need to start a transaction in the current backend as the
+ * importing side checks whether the source transaction is still open to make
+ * sure the xmin horizon hasn't advanced since then.
+ *
+ * After that we convert a locally built snapshot into the normal variant
+ * understood by HeapTupleSatisfiesMVCC et al.
+ */
+const char *
+SnapBuildExportSnapshot(SnapBuild *builder)
+{
+	Snapshot	snap;
+	char	   *snapname;
+	TransactionId xid;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+
+	elog(LOG, "building snapshot");
+
+	if (builder->state != SNAPBUILD_CONSISTENT)
+		elog(ERROR, "cannot export a snapshot before reaching a consistent state");
+
+	if (!builder->committed.includes_all_transactions)
+		elog(ERROR, "cannot export a snapshot, not all transactions are monitored anymore");
+
+	/* so we don't overwrite the existing value */
+	if (TransactionIdIsValid(MyPgXact->xmin))
+		elog(ERROR, "cannot export a snapshot when MyPgXact->xmin already is valid");
+
+	if (SavedResourceOwnerDuringExport)
+		elog(ERROR, "can only export one snapshot at a time");
+
+	SavedResourceOwnerDuringExport = CurrentResourceOwner;
+
+	StartTransactionCommand();
+
+	Assert(!FirstSnapshotSet);
+
+	/* There doesn't seem to be a nice API to set these */
+	XactIsoLevel = XACT_REPEATABLE_READ;
+	XactReadOnly = true;
+
+	snap = SnapBuildBuildSnapshot(builder,
+								  GetTopTransactionId());
+
+	/*
+	 * We know that snap->xmin is alive, enforced by the logical xmin
+	 * mechanism. Due to that we can do this without locks, we're only
+	 * changing our own value.
+	 */
+	MyPgXact->xmin = snap->xmin;
+
+	/* allocate in transaction context */
+	newxip = (TransactionId *)
+		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+
+	/*
+	 * snapbuild.c builds snapshots in an "inverted" manner, which means it
+	 * stores committed transactions in ->xip, not ones in progress. Build a
+	 * classical snapshot by marking all non-committed transactions as
+	 * in-progress.
+	 */
+	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	{
+		void	   *test;
+
+		/*
+		 * check whether transaction committed using the timetravel meaning of
+		 * ->xip
+		 */
+		test = bsearch(&xid, snap->xip, snap->xcnt,
+					   sizeof(TransactionId), xidComparator);
+
+		elog(DEBUG2, "checking xid %u.. %d (xmin %u, xmax %u)",
+			 xid, test == NULL, snap->xmin, snap->xmax);
+
+		if (test == NULL)
+		{
+			if (newxcnt >= GetMaxSnapshotXidCount())
+				elog(ERROR, "snapshot too large");
+
+			newxip[newxcnt++] = xid;
+
+			elog(DEBUG2, "treat %u as in-progress", xid);
+		}
+
+		TransactionIdAdvance(xid);
+	}
+
+	snap->xcnt = newxcnt;
+	snap->xip = newxip;
+
+	snapname = ExportSnapshot(snap);
+
+	elog(LOG, "exported snapbuild snapshot: %s xcnt %u", snapname, snap->xcnt);
+
+	return snapname;
+}
+
+/*
+ * Reset a previously SnapBuildExportSnapshot'ed snapshot if there is
+ * any. Aborts the previously started transaction and resets the resource owner
+ * back to the previous value.
+ */
+void
+SnapBuildClearExportedSnapshot(void)
+{
+	/* nothing exported, that's the usual case */
+	if (SavedResourceOwnerDuringExport == NULL)
+		return;
+
+	/* make sure nothing could have ever happened */
+	AbortCurrentTransaction();
+
+	CurrentResourceOwner = SavedResourceOwnerDuringExport;
+	SavedResourceOwnerDuringExport = NULL;
+}
+
+/*
+ * Handle the effects of a single heap change, appropriate to the current state
+ * of the snapshot builder.
+ */
+static SnapBuildAction
+SnapBuildProcessChange(SnapBuild *builder, TransactionId xid,
+					   XLogRecordBuffer *buf, RelFileNode *relfilenode)
+{
+	SnapBuildAction ret = SNAPBUILD_SKIP;
+
+	/*
+	 * We can't handle data in transactions if we haven't built a snapshot
+	 * yet, so don't store them.
+	 */
+	if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+		;
+
+	/*
+	 * No point in keeping track of changes in transactions that we don't have
+	 * enough information about to decode.
+	 */
+	else if (builder->state < SNAPBUILD_CONSISTENT &&
+			 SnapBuildTxnIsRunning(builder, xid))
+		;
+	else
+	{
+		bool		old_tx = ReorderBufferIsXidKnown(builder->reorder, xid);
+
+		ret = SNAPBUILD_DECODE;
+
+		if (!old_tx || !ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
+		{
+			/* only build snapshot if we don't have a prebuilt one */
+			if (builder->snapshot == NULL)
+			{
+				builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+				/* refcount of the snapshot builder */
+				SnapBuildSnapIncRefcount(builder->snapshot);
+			}
+
+			/* refcount of the transaction */
+			SnapBuildSnapIncRefcount(builder->snapshot);
+			ReorderBufferSetBaseSnapshot(builder->reorder,
+										 xid, buf->origptr,
+										 builder->snapshot);
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Process a single xlog record.
+ */
+SnapBuildAction
+SnapBuildProcessRecord(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	SnapBuildAction ret = SNAPBUILD_SKIP;
+
+	/*
+	 * Only search for an initial starting point if we haven't reached a
+	 * consistent state yet.
+	 */
+	if (builder->state < SNAPBUILD_CONSISTENT)
+	{
+		ret = SnapBuildProcessFindSnapshot(builder, buf);
+		if (ret == SNAPBUILD_SKIP)
+			return ret;
+	}
+
+	/*
+	 * Don't have a starting point to decode from, no point in collecting any
+	 * information.
+	 */
+	if (builder->state == SNAPBUILD_START)
+		return SNAPBUILD_SKIP;
+
+	/*
+	 * Check whether individual records require changes to our snapshot and
+	 * whether their content should be decoded because it contains user
+	 * visible data.
+	 */
+	switch (buf->record.xl_rmid)
+	{
+		case RM_XLOG_ID:
+			ret = SnapBuildProcessXlog(builder, buf);
+			break;
+		case RM_STANDBY_ID:
+			ret = SnapBuildProcessStandby(builder, buf);
+			break;
+		case RM_XACT_ID:
+			ret = SnapBuildProcessXact(builder, buf);
+			break;
+		case RM_HEAP_ID:
+			ret = SnapBuildProcessHeap(builder, buf);
+			break;
+		case RM_HEAP2_ID:
+			ret = SnapBuildProcessHeap2(builder, buf);
+			break;
+	}
+
+	return ret;
+}
+
+
+/*
+ * Check whether `xid` is currently 'running'. Running transactions in our
+ * parlance are transactions which we didn't observe from the start so we can't
+ * properly decode them. They only exist after we freshly started from a
+ * < CONSISTENT snapshot.
+ */
+static bool
+SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
+{
+	Assert(builder->state < SNAPBUILD_CONSISTENT);
+	Assert(TransactionIdIsValid(builder->running.xmin));
+	Assert(TransactionIdIsValid(builder->running.xmax));
+
+	if (builder->running.xcnt &&
+		NormalTransactionIdFollows(xid, builder->running.xmin) &&
+		NormalTransactionIdPrecedes(xid, builder->running.xmax))
+	{
+		TransactionId *search =
+		bsearch(&xid, builder->running.xip, builder->running.xcnt_space,
+				sizeof(TransactionId), xidComparator);
+
+		if (search != NULL)
+		{
+			Assert(*search == xid);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * Add a new SnapshotNow to all transactions we're decoding that currently are
+ * in-progress so they can see new catalog contents made by the transaction
+ * that just committed.
+ */
+static void
+SnapBuildDistributeSnapshotNow(SnapBuild *builder, XLogRecPtr lsn)
+{
+	dlist_iter	txn_i;
+	ReorderBufferTXN *txn;
+
+	dlist_foreach(txn_i, &builder->reorder->toplevel_by_lsn)
+	{
+		txn = dlist_container(ReorderBufferTXN, node, txn_i.cur);
+
+		/*
+		 * XXX: we can ignore transactions that are known as subxacts here if
+		 * we make sure their parent transaction has a base snapshot if this
+		 * one has one.
+		 */
+
+		/*
+		 * If we don't have a base snapshot yet, there are no changes yet
+		 * which in turn implies we don't yet need a new snapshot.
+		 */
+		if (ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
+		{
+			elog(DEBUG2, "adding a new snapshot to %u at %X/%X",
+				 txn->xid, (uint32) (lsn >> 32), (uint32) lsn);
+			SnapBuildSnapIncRefcount(builder->snapshot);
+			ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn,
+									 builder->snapshot);
+		}
+	}
+}
+
+/*
+ * Keep track of a new catalog changing transaction that has committed.
+ */
+static void
+SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+{
+	Assert(TransactionIdIsValid(xid));
+
+	if (builder->committed.xcnt == builder->committed.xcnt_space)
+	{
+		builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
+		/* XXX: put in a limit here as a defense against bugs? */
+
+		elog(WARNING, "increasing space for committed transactions to %u",
+			 (uint32) builder->committed.xcnt_space);
+
+		builder->committed.xip = repalloc(builder->committed.xip,
+					builder->committed.xcnt_space * sizeof(TransactionId));
+	}
+
+	/*
+	 * XXX: It might make sense to keep the array sorted here instead of doing
+	 * it every time we build a new snapshot. On the other hand this gets called
+	 * repeatedly when a transaction with subtransactions commits.
+	 */
+	builder->committed.xip[builder->committed.xcnt++] = xid;
+}
+
+/*
+ * Remove all transactions we treat as committed that are smaller than
+ * ->xmin. Those won't ever get checked via the ->committed array but via the
+ * clog machinery, so we don't need to waste memory on them.
+ */
+static void
+SnapBuildPurgeCommittedTxn(SnapBuild *builder)
+{
+	int			off;
+	TransactionId *workspace;
+	int			surviving_xids = 0;
+
+	/* not ready yet */
+	if (!TransactionIdIsNormal(builder->xmin))
+		return;
+
+	/* XXX: Neater algorithm? */
+	workspace =
+		MemoryContextAlloc(builder->context,
+						   builder->committed.xcnt * sizeof(TransactionId));
+
+	/* copy xids that still are interesting to workspace */
+	for (off = 0; off < builder->committed.xcnt; off++)
+	{
+		if (NormalTransactionIdPrecedes(builder->committed.xip[off],
+										builder->xmin))
+			;					/* remove */
+		else
+			workspace[surviving_xids++] = builder->committed.xip[off];
+	}
+
+	/* copy workspace back to persistent state */
+	memcpy(builder->committed.xip, workspace,
+		   surviving_xids * sizeof(TransactionId));
+
+	elog(DEBUG1, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+		 (uint32) builder->committed.xcnt, (uint32) surviving_xids,
+		 builder->xmin, builder->xmax);
+	builder->committed.xcnt = surviving_xids;
+
+	pfree(workspace);
+}
+
+/*
+ * Common logic for SnapBuildAbortTxn and SnapBuildCommitTxn dealing with
+ * keeping track of the number of running transactions.
+ */
+static void
+SnapBuildEndTxn(SnapBuild *builder, TransactionId xid)
+{
+	if (builder->state == SNAPBUILD_CONSISTENT)
+		return;
+
+	if (SnapBuildTxnIsRunning(builder, xid))
+	{
+		if (!--builder->running.xcnt)
+		{
+			/*
+			 * None of the originally running transactions is running anymore,
+			 * so our incrementally built snapshot is now complete.
+			 */
+			elog(LOG, "found consistent point due to SnapBuildEndTxn + running: %u", xid);
+			builder->state = SNAPBUILD_CONSISTENT;
+		}
+	}
+}
+
+/*
+ * Abort a transaction, throw away all state we kept
+ */
+static void
+SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+				  int nsubxacts, TransactionId *subxacts)
+{
+	int			i;
+
+	for (i = 0; i < nsubxacts; i++)
+	{
+		TransactionId subxid = subxacts[i];
+
+		SnapBuildEndTxn(builder, subxid);
+	}
+
+	SnapBuildEndTxn(builder, xid);
+}
+
+/*
+ * Handle everything that needs to be done when a transaction commits
+ */
+static void
+SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
+				   int nsubxacts, TransactionId *subxacts)
+{
+	int			nxact;
+
+	bool		forced_timetravel = false;
+	bool		sub_does_timetravel = false;
+	bool		top_does_timetravel = false;
+
+	TransactionId xmax = xid;
+
+	/*
+	 * If we couldn't observe every change of a transaction because it was
+	 * already running at the point we started to observe we have to assume it
+	 * made catalog changes.
+	 *
+	 * This has the positive benefit that we afterwards have enough
+	 * information to build an exportable snapshot that's usable by pg_dump et
+	 * al.
+	 */
+	if (builder->state < SNAPBUILD_CONSISTENT)
+	{
+		/* ensure that only commits after this are getting replayed */
+		if (builder->transactions_after < lsn)
+			builder->transactions_after = lsn;
+
+		/*
+		 * we could avoid treating !SnapBuildTxnIsRunning transactions as
+		 * timetravel ones, but we want to be able to export a snapshot when
+		 * we reached consistency.
+		 */
+		forced_timetravel = true;
+		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+	}
+
+	for (nxact = 0; nxact < nsubxacts; nxact++)
+	{
+		TransactionId subxid = subxacts[nxact];
+
+		/*
+		 * make sure txn is not tracked in running txn's anymore, switch state
+		 */
+		SnapBuildEndTxn(builder, subxid);
+
+		/*
+		 * If we're forcing timetravel we also need accurate subtransaction
+		 * status.
+		 */
+		if (forced_timetravel)
+		{
+			SnapBuildAddCommittedTxn(builder, subxid);
+			if (NormalTransactionIdFollows(subxid, xmax))
+				xmax = subxid;
+		}
+
+		/*
+		 * add subtransaction to base snapshot; we don't distinguish it from
+		 * toplevel transactions there.
+		 */
+		else if (ReorderBufferXidDoesTimetravel(builder->reorder, subxid))
+		{
+			sub_does_timetravel = true;
+
+			elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
+				 xid, subxid);
+
+			SnapBuildAddCommittedTxn(builder, subxid);
+
+			if (NormalTransactionIdFollows(subxid, xmax))
+				xmax = subxid;
+		}
+	}
+
+	/*
+	 * make sure txn is not tracked in running txn's anymore, switch state
+	 */
+	SnapBuildEndTxn(builder, xid);
+
+	if (forced_timetravel)
+	{
+		elog(DEBUG1, "forced transaction %u to do timetravel.", xid);
+
+		SnapBuildAddCommittedTxn(builder, xid);
+	}
+	/* add toplevel transaction to base snapshot */
+	else if (ReorderBufferXidDoesTimetravel(builder->reorder, xid))
+	{
+		elog(DEBUG1, "found top level transaction %u, with catalog changes!",
+			 xid);
+
+		top_does_timetravel = true;
+		SnapBuildAddCommittedTxn(builder, xid);
+	}
+	else if (sub_does_timetravel)
+	{
+		/* mark toplevel txn as timetravel as well */
+		SnapBuildAddCommittedTxn(builder, xid);
+	}
+
+	if (forced_timetravel || top_does_timetravel || sub_does_timetravel)
+	{
+		if (!TransactionIdIsValid(builder->xmax) ||
+			TransactionIdFollowsOrEquals(xmax, builder->xmax))
+		{
+			builder->xmax = xmax;
+			TransactionIdAdvance(builder->xmax);
+		}
+
+		if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+			return;
+
+		/* refcount of the transaction */
+		if (builder->snapshot)
+			SnapBuildSnapDecRefcount(builder->snapshot);
+
+		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+		/* refcount of the snapshot builder */
+		SnapBuildSnapIncRefcount(builder->snapshot);
+
+		/* add a new SnapshotNow to all currently running transactions */
+		SnapBuildDistributeSnapshotNow(builder, lsn);
+	}
+	else
+	{
+		/* record that we cannot export a general snapshot anymore */
+		builder->committed.includes_all_transactions = false;
+	}
+}
+
+
+/* -----------------------------------
+ * Snapshot building functions dealing with xlog records
+ * -----------------------------------
+ */
+
+/*
+ * Build the start of a snapshot that's capable of decoding the catalog.
+ */
+static SnapBuildAction
+SnapBuildProcessFindSnapshot(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+	xl_running_xacts *running;
+
+	/* we need a RUNNING_XACTS record */
+	if (buf->record.xl_rmid != RM_STANDBY_ID || info != XLOG_RUNNING_XACTS)
+		return SNAPBUILD_DECODE;
+
+	/* ---
+	 * Build catalog decoding snapshot incrementally using information about
+	 * the currently running transactions. There are several ways to achieve that:
+	 * a) there were no running transactions at all
+	 * b) all transactions that were known to be running at a previous xl_running record
+	 *    now have finished (c.f. SnapBuildEndTxn).
+	 * c) This (in a previous run) or another decoding slot serialized a
+	 *    snapshot to disk that we can use to start us up.
+	 * ---
+	 */
+	running = (xl_running_xacts *) buf->record_data;
+
+	/*
+	 * xl_running_xact record is older than what we can use, we might not have
+	 * all necessary catalog rows anymore.
+	 */
+	if (TransactionIdIsNormal(builder->initial_xmin_horizon) &&
+		NormalTransactionIdPrecedes(running->oldestRunningXid,
+									builder->initial_xmin_horizon))
+	{
+		elog(LOG, "skipping snapshot at %X/%X due to initial xmin horizon of %u vs the snapshot's %u",
+			 (uint32) (buf->origptr >> 32), (uint32) buf->origptr,
+			 builder->initial_xmin_horizon, running->oldestRunningXid);
+	}
+
+	/*
+	 * a) No transactions were running, we can jump to consistent.
+	 *
+	 * NB: We might have already started to incrementally assemble a snapshot,
+	 * so we need to be careful to deal with that.
+	 */
+	else if (running->xcnt == 0)
+	{
+		if (builder->transactions_after == InvalidXLogRecPtr ||
+			builder->transactions_after < buf->origptr)
+			builder->transactions_after = buf->origptr;
+
+		builder->xmin = running->oldestRunningXid;
+		builder->xmax = running->latestCompletedXid;
+		TransactionIdAdvance(builder->xmax);
+
+		Assert(TransactionIdIsNormal(builder->xmin));
+		Assert(TransactionIdIsNormal(builder->xmax));
+
+		/* no transactions running now */
+		builder->running.xcnt = 0;
+		builder->running.xmin = InvalidTransactionId;
+		builder->running.xmax = InvalidTransactionId;
+
+		/*
+		 * FIXME: abort everything we have stored about running transactions,
+		 * relevant e.g. after a crash.
+		 */
+		builder->state = SNAPBUILD_CONSISTENT;
+
+		elog(LOG, "found initial snapshot (xmin %u) due to running xacts with xcnt == 0",
+			 builder->xmin);
+		return SNAPBUILD_SKIP;
+	}
+	/* c) valid on disk state */
+	else if (SnapBuildRestore(builder, buf->origptr))
+	{
+		Assert(builder->state == SNAPBUILD_CONSISTENT);
+		elog(LOG, "recovered initial snapshot (xmin %u) from disk",
+			 builder->xmin);
+		return SNAPBUILD_SKIP;
+	}
+
+	/*
+	 * b) first encounter of a useable xl_running_xacts record. If we had found
+	 * one earlier we would either track running transactions or be
+	 * consistent.
+	 */
+	else if (!builder->running.xcnt)
+	{
+		/*
+		 * We only care about toplevel xids as those are the ones we
+		 * definitely see in the wal stream. As snapbuild.c tracks committed
+		 * instead of running transactions we don't need to know anything
+		 * about uncommitted subtransactions.
+		 */
+		builder->xmin = running->oldestRunningXid;
+		builder->xmax = running->latestCompletedXid;
+		TransactionIdAdvance(builder->xmax);
+
+		/* so we can safely use the faster comparisons */
+		Assert(TransactionIdIsNormal(builder->xmin));
+		Assert(TransactionIdIsNormal(builder->xmax));
+
+		builder->running.xcnt = running->xcnt;
+		builder->running.xcnt_space = running->xcnt;
+		builder->running.xip =
+			MemoryContextAlloc(builder->context,
+							builder->running.xcnt * sizeof(TransactionId));
+		memcpy(builder->running.xip, running->xids,
+			   builder->running.xcnt * sizeof(TransactionId));
+
+		/* sort so we can do a binary search */
+		qsort(builder->running.xip, builder->running.xcnt,
+			  sizeof(TransactionId), xidComparator);
+
+		builder->running.xmin = builder->running.xip[0];
+		builder->running.xmax = builder->running.xip[running->xcnt - 1];
+
+		/* makes comparisons cheaper later */
+		TransactionIdRetreat(builder->running.xmin);
+		TransactionIdAdvance(builder->running.xmax);
+
+		builder->state = SNAPBUILD_FULL_SNAPSHOT;
+
+		elog(LOG, "found initial snapshot (xmin %u) due to running xacts, %u xacts need to finish",
+			 builder->xmin, (uint32) builder->running.xcnt);
+
+		return SNAPBUILD_SKIP;
+	}
+
+	/*
+	 * We already started to track running xacts and need to wait for all
+	 * in-progress ones to finish. We fall through to the normal processing of
+	 * records so incremental cleanup can be performed.
+	 */
+	return SNAPBUILD_DECODE;
+}
+
+/*
+ * Process RM_HEAP_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessHeap(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+	SnapBuildAction ret = SNAPBUILD_SKIP;
+	TransactionId xid = buf->record.xl_xid;
+
+	switch (info & XLOG_HEAP_OPMASK)
+	{
+		case XLOG_HEAP_INPLACE:
+			{
+				xl_heap_inplace *xlrec;
+
+				xlrec = (xl_heap_inplace *) buf->record_data;
+
+				ret = SnapBuildProcessChange(builder, xid, buf,
+											 &xlrec->target.node);
+
+				/* heap_inplace is only done in catalog modifying txns */
+				ReorderBufferXidSetTimetravel(builder->reorder, xid, buf->origptr);
+
+				break;
+			}
+
+		case XLOG_HEAP_LOCK:
+
+			/*
+			 * We only ever read changes, so row level locks aren't
+			 * interesting.
+			 */
+			break;
+
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *xlrec = (xl_heap_insert *) buf->record_data;
+
+				ret = SnapBuildProcessChange(builder, xid, buf,
+											 &xlrec->target.node);
+				break;
+			}
+			/* HEAP(_HOT)?_UPDATE use the same data layout */
+		case XLOG_HEAP_UPDATE:
+		case XLOG_HEAP_HOT_UPDATE:
+			{
+				xl_heap_update *xlrec = (xl_heap_update *) buf->record_data;
+
+				ret = SnapBuildProcessChange(builder, xid, buf,
+											 &xlrec->target.node);
+				break;
+			}
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *xlrec = (xl_heap_delete *) buf->record_data;
+
+				ret = SnapBuildProcessChange(builder, xid, buf,
+											 &xlrec->target.node);
+				break;
+			}
+		default:
+			break;
+	}
+	return ret;
+}
+
+/*
+ * Process RM_HEAP2_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessHeap2(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+	SnapBuildAction ret = SNAPBUILD_SKIP;
+	TransactionId xid = buf->record.xl_xid;
+
+	switch (info)
+	{
+		case XLOG_HEAP2_MULTI_INSERT:
+			{
+				xl_heap_multi_insert *xlrec;
+
+				xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+				ret = SnapBuildProcessChange(builder, xid, buf,
+											 &xlrec->node);
+				break;
+			}
+		case XLOG_HEAP2_NEW_CID:
+			{
+				xl_heap_new_cid *xlrec;
+				CommandId	cid;
+
+				xlrec = (xl_heap_new_cid *) buf->record_data;
+
+				/*
+				 * we only log new_cid's if a catalog tuple was modified, so
+				 * set transaction to timetravelling.
+				 */
+				ReorderBufferXidSetTimetravel(builder->reorder, xid,
+											  buf->origptr);
+
+				ReorderBufferAddNewTupleCids(builder->reorder,
+											 xlrec->top_xid,
+											 buf->origptr,
+											 xlrec->target.node,
+											 xlrec->target.tid,
+											 xlrec->cmin, xlrec->cmax,
+											 xlrec->combocid);
+
+				/* figure out new command id */
+				if (xlrec->cmin != InvalidCommandId &&
+					xlrec->cmax != InvalidCommandId)
+					cid = Max(xlrec->cmin, xlrec->cmax);
+				else if (xlrec->cmax != InvalidCommandId)
+					cid = xlrec->cmax;
+				else if (xlrec->cmin != InvalidCommandId)
+					cid = xlrec->cmin;
+				else
+				{
+					cid = InvalidCommandId;		/* silence compiler */
+					elog(ERROR, "broken arrow, no cid?");
+				}
+
+				/*
+				 * FIXME: potential race condition here: if multiple snapshots
+				 * were running & generating changes in the same transaction
+				 * on the source side this could be problematic. But this
+				 * cannot happen for system catalogs, right?
+				 */
+				ReorderBufferAddNewCommandId(builder->reorder, xid,
+											 buf->origptr, cid + 1);
+				break;
+			}
+		default:
+			break;
+	}
+
+	return ret;
+}
+
+/*
+ * Process RM_XLOG_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessXlog(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+
+	switch (info)
+	{
+		case XLOG_CHECKPOINT_SHUTDOWN:
+
+			/*
+			 * FIXME: abort everything but prepared xacts, we don't track
+			 * prepared xacts though so far.  It might also be necessary to do
+			 * this to handle subtxn ids that haven't been assigned to a
+			 * toplevel xid after a crash.
+			 */
+			SnapBuildSerialize(builder, buf->origptr);
+			break;
+		case XLOG_CHECKPOINT_ONLINE:
+
+			/*
+			 * a RUNNING_XACTS record will have been logged around this, we
+			 * can restart from there.
+			 */
+			break;
+		default:
+			break;
+	}
+	return SNAPBUILD_SKIP;
+}
+
+/*
+ * Process RM_STANDBY_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessStandby(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+
+	switch (info)
+	{
+		case XLOG_RUNNING_XACTS:
+			{
+				xl_running_xacts *running;
+				ReorderBufferTXN *txn;
+
+				running = (xl_running_xacts *) buf->record_data;
+
+				SnapBuildSerialize(builder, buf->origptr);
+
+				/*
+				 * update range of interesting xids. We don't increase ->xmax
+				 * because once we are in a consistent state we can do that
+				 * ourselves and much more efficiently so because we only need
+				 * to do it for catalog transactions.
+				 */
+				builder->xmin = running->oldestRunningXid;
+
+
+				/*
+				 * xmax can be lower than xmin here because we only increase
+				 * xmax when we hit a transaction with catalog changes. While
+				 * odd looking, it's correct and actually more efficient this
+				 * way since we hit fast paths in tqual.c.
+				 */
+
+				/*
+				 * Remove transactions we don't need to keep track of
+				 * anymore.
+				 */
+				SnapBuildPurgeCommittedTxn(builder);
+
+				elog(DEBUG1, "xmin: %u, xmax: %u, oldestrunning: %u",
+					 builder->xmin, builder->xmax,
+					 running->oldestRunningXid);
+
+				/*
+				 * increase shared memory state, so vacuum can work on tuples
+				 * we prevent from being purged.
+				 */
+				IncreaseLogicalXminForSlot(buf->origptr,
+										   running->oldestRunningXid);
+
+				/*
+				 * Also tell the slot where we can restart decoding from. We
+				 * don't want to do that after every commit because changing
+				 * that implies an fsync...
+				 */
+				txn = ReorderBufferGetOldestTXN(builder->reorder);
+
+				/*
+				 * oldest ongoing txn might have started when we didn't yet
+				 * serialize anything because we haven't reached a consistent
+				 * state yet.
+				 */
+				if (txn != NULL &&
+					txn->restart_decoding_lsn != InvalidXLogRecPtr)
+				{
+					IncreaseRestartDecodingForSlot(buf->origptr,
+												   txn->restart_decoding_lsn);
+				}
+
+				/*
+				 * no ongoing transaction, can reuse the last serialized
+				 * snapshot if we have one.
+				 */
+				else if (txn == NULL &&
+				  builder->reorder->current_restart_decoding_lsn != InvalidXLogRecPtr &&
+					builder->last_serialized_snapshot != InvalidXLogRecPtr)
+				{
+					IncreaseRestartDecodingForSlot(buf->origptr,
+										builder->last_serialized_snapshot);
+				}
+
+				break;
+			}
+		case XLOG_STANDBY_LOCK:
+		default:
+			break;
+	}
+	return SNAPBUILD_SKIP;
+}
+
+/*
+ * Process RM_XACT_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessXact(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+	uint8		info = buf->record.xl_info & ~XLR_INFO_MASK;
+	SnapBuildAction ret = SNAPBUILD_SKIP;
+	TransactionId xid = buf->record.xl_xid;
+
+
+	switch (info)
+	{
+		case XLOG_XACT_COMMIT:
+			{
+				xl_xact_commit *xlrec = (xl_xact_commit *) buf->record_data;
+
+				/*
+				 * Queue cache invalidation messages.
+				 */
+				if (xlrec->nmsgs)
+				{
+					TransactionId *subxacts;
+					SharedInvalidationMessage *inval_msgs;
+
+					/* subxid array follows relfilenodes */
+					subxacts = (TransactionId *)
+						&(xlrec->xnodes[xlrec->nrels]);
+					/* invalidation messages follow subxids */
+					inval_msgs = (SharedInvalidationMessage *)
+						&(subxacts[xlrec->nsubxacts]);
+
+					/*
+					 * no need to check XactCompletionRelcacheInitFileInval,
+					 * we will process the sinval messages that the relmapper
+					 * change has generated.
+					 */
+					ReorderBufferAddInvalidations(builder->reorder, xid,
+												  buf->origptr,
+												  xlrec->nmsgs, inval_msgs);
+
+					/*
+					 * Let everyone know that this transaction modified the
+					 * catalog. We need this at commit time.
+					 */
+					ReorderBufferXidSetTimetravel(builder->reorder, xid,
+												  buf->origptr);
+
+				}
+
+				SnapBuildCommitTxn(builder, buf->origptr, xid,
+								   xlrec->nsubxacts,
+								   (TransactionId *) &(xlrec->xnodes[xlrec->nrels]));
+				ret = SNAPBUILD_DECODE;
+				break;
+			}
+		case XLOG_XACT_COMMIT_COMPACT:
+			{
+				xl_xact_commit_compact *xlrec;
+
+				xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+				SnapBuildCommitTxn(builder, buf->origptr, xid,
+								   xlrec->nsubxacts, xlrec->subxacts);
+
+				ret = SNAPBUILD_DECODE;
+				break;
+			}
+		case XLOG_XACT_COMMIT_PREPARED:
+			{
+				xl_xact_commit_prepared *xlrec;
+				TransactionId *subxacts;
+
+				xlrec = (xl_xact_commit_prepared *) buf->record_data;
+				subxacts = (TransactionId *) &(xlrec->crec.xnodes[xlrec->crec.nrels]);
+				/* FIXME: check for invalidation messages! */
+
+				SnapBuildCommitTxn(builder, buf->origptr,
+								   xlrec->xid,
+								   xlrec->crec.nsubxacts, subxacts);
+
+				ret = SNAPBUILD_DECODE;
+				break;
+			}
+		case XLOG_XACT_ABORT:
+			{
+				xl_xact_abort *xlrec;
+				TransactionId *subxacts;
+
+				xlrec = (xl_xact_abort *) buf->record_data;
+				subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+				SnapBuildAbortTxn(builder, xid, xlrec->nsubxacts, subxacts);
+
+				ret = SNAPBUILD_DECODE;
+				break;
+			}
+		case XLOG_XACT_ABORT_PREPARED:
+			{
+				xl_xact_abort_prepared *xlrec;
+				xl_xact_abort *arec;
+				TransactionId *subxacts;
+
+				xlrec = (xl_xact_abort_prepared *) buf->record_data;
+				arec = &xlrec->arec;
+				subxacts = (TransactionId *) &(arec->xnodes[arec->nrels]);
+
+				SnapBuildAbortTxn(builder, xlrec->xid, arec->nsubxacts,
+								  subxacts);
+
+				ret = SNAPBUILD_DECODE;
+				break;
+			}
+		case XLOG_XACT_ASSIGNMENT:
+			break;
+		case XLOG_XACT_PREPARE:
+
+			/*
+			 * XXX: We could take note of all in-progress prepared xacts so we
+			 * can use shutdown checkpoints to abort in-progress
+			 * transactions...
+			 */
+			break;
+		default:
+			break;
+	}
+	return ret;
+}
+
+/* -----------------------------------
+ * Snapshot serialization support
+ * -----------------------------------
+ */
+
+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuild;
+ * TransactionId * running.xcnt_space;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ *
+ */
+typedef struct SnapBuildOnDisk
+{
+	uint32		magic;
+	/* how large is the SnapBuildOnDisk including all data in state */
+	Size		size;
+	SnapBuild	builder;
+	/* variable amount of TransactionId's */
+} SnapBuildOnDisk;
+
+#define SNAPBUILD_MAGIC 0x51A1E001
+
+/*
+ * Serialize the snapshot 'builder' at the location 'lsn' if it hasn't already
+ * been done by another decoding process.
+ */
+static void
+SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
+{
+	Size		needed_size;
+	SnapBuildOnDisk *ondisk;
+	char	   *ondisk_c;
+	int			fd;
+	char		tmppath[MAXPGPATH];
+	char		path[MAXPGPATH];
+	int			ret;
+	struct stat stat_buf;
+
+	needed_size = sizeof(SnapBuildOnDisk) +
+		sizeof(TransactionId) * builder->running.xcnt_space +
+		sizeof(TransactionId) * builder->committed.xcnt;
+
+	Assert(lsn != InvalidXLogRecPtr);
+	Assert(builder->last_serialized_snapshot == InvalidXLogRecPtr ||
+		   builder->last_serialized_snapshot <= lsn);
+
+	/*
+	 * no point in serializing if we cannot continue to work immediately after
+	 * restoring the snapshot
+	 */
+	if (builder->state < SNAPBUILD_CONSISTENT)
+		return;
+
+	/*
+	 * FIXME: Timeline handling
+	 */
+
+	/*
+	 * first check whether some other backend already has written the snapshot
+	 * for this LSN
+	 */
+	sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+			(uint32) (lsn >> 32), (uint32) lsn);
+
+	ret = stat(path, &stat_buf);
+
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR, (errmsg("could not stat snapbuild state file %s: %m", path)));
+	else if (ret == 0)
+	{
+		/*
+		 * somebody else has already serialized to this point, don't overwrite
+		 * but remember location, so we don't need to read old data again.
+		 */
+		builder->last_serialized_snapshot = lsn;
+		goto out;
+	}
+
+	/*
+	 * there is an obvious race condition here between the time we stat(2) the
+	 * file and us writing the file. But we rename the file into place
+	 * atomically and all files created need to contain the same data anyway,
+	 * so this is perfectly fine, although a bit of a resource waste. Locking
+	 * seems like pointless complication.
+	 */
+	elog(LOG, "serializing snapshot to %s", path);
+
+	/* to make sure only we will write to this tempfile, include pid */
+	sprintf(tmppath, "pg_llog/snapshots/%X-%X.snap.%u.tmp",
+			(uint32) (lsn >> 32), (uint32) lsn, getpid());
+
+	/*
+	 * remove any temporary file left behind by an earlier crash/error
+	 */
+	if (unlink(tmppath) != 0 && errno != ENOENT)
+		ereport(ERROR, (errmsg("could not unlink old file %s: %m", tmppath)));
+
+	ondisk = MemoryContextAllocZero(builder->context, needed_size);
+	ondisk_c = ((char *) ondisk) + sizeof(SnapBuildOnDisk);
+	ondisk->magic = SNAPBUILD_MAGIC;
+	ondisk->size = needed_size;
+
+	/* copy state per struct assignment, lalala lazy. */
+	ondisk->builder = *builder;
+
+	/* NULL-ify memory-only data */
+	ondisk->builder.context = NULL;
+	ondisk->builder.snapshot = NULL;
+	ondisk->builder.reorder = NULL;
+
+	/* copy running xacts */
+	memcpy(ondisk_c, builder->running.xip,
+		   sizeof(TransactionId) * builder->running.xcnt_space);
+	ondisk_c += sizeof(TransactionId) * builder->running.xcnt_space;
+
+	/* copy committed xacts */
+	memcpy(ondisk_c, builder->committed.xip,
+		   sizeof(TransactionId) * builder->committed.xcnt);
+	ondisk_c += sizeof(TransactionId) * builder->committed.xcnt;
+
+	/* we have valid data now, open tempfile and write it there */
+	fd = OpenTransientFile(tmppath,
+						   O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+						   S_IRUSR | S_IWUSR);
+	if (fd < 0)
+		ereport(ERROR, (errmsg("could not open snapbuild state file %s for writing: %m", tmppath)));
+
+	if ((write(fd, ondisk, needed_size)) != needed_size)
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to snapbuild state file \"%s\": %m",
+						tmppath)));
+	}
+
+	/*
+	 * fsync the file before renaming so that even if we crash after this we
+	 * have either a fully valid file or nothing.
+	 *
+	 * XXX: Do the fsync() via checkpoints/restartpoints, doing it here has
+	 * some noticeable overhead?
+	 */
+	if (pg_fsync(fd) != 0)
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync snapbuild state file \"%s\": %m",
+						tmppath)));
+	}
+
+	CloseTransientFile(fd);
+
+	/*
+	 * We may overwrite the work from some other backend, but that's ok, our
+	 * snapshot is valid as well.
+	 */
+	if (rename(tmppath, path) != 0)
+	{
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not rename snapbuild state file from \"%s\" to \"%s\": %m",
+						tmppath, path)));
+	}
+
+	/* make sure we persist */
+	fsync_fname(path, false);
+	fsync_fname("pg_llog/snapshots", true);
+
+	/* remember serialization point */
+	builder->last_serialized_snapshot = lsn;
+
+out:
+	ReorderBufferSetRestartPoint(builder->reorder,
+								 builder->last_serialized_snapshot);
+}
+
+/*
+ * Restore a snapshot into 'builder' if previously one has been stored at the
+ * location indicated by 'lsn'. Returns true if successful, false otherwise.
+ */
+static bool
+SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
+{
+	SnapBuildOnDisk ondisk;
+	int			fd;
+	char		path[MAXPGPATH];
+	Size		sz;
+
+	sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+			(uint32) (lsn >> 32), (uint32) lsn);
+
+	elog(LOG, "restoring snapbuild state from %s", path);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+	if (fd < 0 && errno == ENOENT)
+		return false;
+	else if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open snapbuild state file %s: %m", path)));
+
+	elog(LOG, "really restoring from %s", path);
+
+	/* read statically sized portion of snapshot */
+	if (read(fd, &ondisk, sizeof(ondisk)) != sizeof(ondisk))
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read snapbuild file \"%s\": %m",
+						path)));
+	}
+
+	if (ondisk.magic != SNAPBUILD_MAGIC)
+		ereport(ERROR, (errmsg("snapbuild state file has wrong magic %u instead of %u",
+							   ondisk.magic, SNAPBUILD_MAGIC)));
+
+	/* restore running xact information */
+	sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
+	ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+	if (read(fd, ondisk.builder.running.xip, sz) != sz)
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read running xacts from snapbuild file \"%s\": %m",
+			   path)));
+	}
+
+	/* restore committed xact information */
+	sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
+	ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+	if (read(fd, ondisk.builder.committed.xip, sz) != sz)
+	{
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read committed xacts from snapbuild file \"%s\": %m",
+						path)));
+	}
+
+	CloseTransientFile(fd);
+
+	/*
+	 * ok, we now have a sensible snapshot here, figure out if it has more
+	 * information than we have.
+	 */
+
+	/*
+	 * We are only interested in consistent snapshots for now, comparing
+	 * whether one incomplete snapshot is more "advanced" seems to be
+	 * unnecessarily complex.
+	 */
+	if (ondisk.builder.state < SNAPBUILD_CONSISTENT)
+		goto snapshot_not_interesting;
+
+	/*
+	 * Don't use a snapshot that requires an xmin that we cannot guarantee to
+	 * be available.
+	 */
+	if (TransactionIdPrecedes(ondisk.builder.xmin, builder->initial_xmin_horizon))
+		goto snapshot_not_interesting;
+
+	/*
+	 * XXX: transactions_after needs to be updated differently, to be checked
+	 * here
+	 */
+
+	/* ok, we think the snapshot is sensible, copy over everything important */
+	builder->xmin = ondisk.builder.xmin;
+	builder->xmax = ondisk.builder.xmax;
+	builder->state = ondisk.builder.state;
+
+	builder->committed.xcnt = ondisk.builder.committed.xcnt;
+	/* We only allocated/stored xcnt, not xcnt_space xids ! */
+	/* don't overwrite preallocated xip, if we don't have anything here */
+	if (builder->committed.xcnt > 0)
+	{
+		pfree(builder->committed.xip);
+		builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
+		builder->committed.xip = ondisk.builder.committed.xip;
+	}
+	ondisk.builder.committed.xip = NULL;
+
+	builder->running.xcnt = ondisk.builder.running.xcnt;
+	if (builder->running.xip)
+		pfree(builder->running.xip);
+	builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
+	builder->running.xip = ondisk.builder.running.xip;
+
+	/* our snapshot is not interesting anymore, build a new one */
+	if (builder->snapshot != NULL)
+	{
+		SnapBuildSnapDecRefcount(builder->snapshot);
+	}
+	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+	SnapBuildSnapIncRefcount(builder->snapshot);
+
+	ReorderBufferSetRestartPoint(builder->reorder, lsn);
+
+	return true;
+
+snapshot_not_interesting:
+	if (ondisk.builder.running.xip != NULL)
+		pfree(ondisk.builder.running.xip);
+	if (ondisk.builder.committed.xip != NULL)
+		pfree(ondisk.builder.committed.xip);
+	return false;
+}
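
For readers skimming the patch: SnapBuildSerialize() above relies on the common write-to-temp-file, fsync, rename sequence, so that a concurrent or later reader sees either a complete snapshot file or none at all. Below is a minimal standalone sketch of that sequence using plain POSIX calls. sketch_write_file_atomically() and its parameters are hypothetical names chosen for illustration; the patch itself goes through OpenTransientFile(), pg_fsync() and fsync_fname() and reports failures via ereport().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: atomically create dir/name
 * with the given contents.  Returns 0 on success, -1 on failure.
 */
static int
sketch_write_file_atomically(const char *dir, const char *name,
                             const void *data, size_t len)
{
    char        path[1024];
    char        tmppath[1024];
    const char *p = data;
    int         fd;
    int         dirfd;
    int         n;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    /* include our pid so no other process writes to the same temp file */
    n = snprintf(tmppath, sizeof(tmppath), "%s.%ld.tmp", path, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* a leftover temp file can only stem from an earlier failed attempt */
    if (unlink(tmppath) != 0 && errno != ENOENT)
        return -1;

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    while (len > 0)
    {
        ssize_t     written = write(fd, p, len);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            close(fd);
            unlink(tmppath);
            return -1;
        }
        p += written;
        len -= (size_t) written;
    }

    /* flush the contents before the rename makes the file visible */
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* rename() atomically replaces any existing file of that name */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* fsync the directory so the new directory entry survives a crash */
    dirfd = open(dir, O_RDONLY);
    if (dirfd < 0)
        return -1;
    if (fsync(dirfd) != 0)
    {
        close(dirfd);
        return -1;
    }
    return close(dirfd);
}
```

As in the patch, two processes racing to write the same target is harmless under this scheme as long as both write identical contents: whichever rename() happens last simply wins.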
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index bce18b8..2de01f1 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -65,7 +65,7 @@ Node *replication_parse_result;
 }
 
 /* Non-keyword tokens */
-%token <str> SCONST
+%token <str> SCONST IDENT
 %token <intval> ICONST
 %token <recptr> RECPTR
 
@@ -73,6 +73,9 @@ Node *replication_parse_result;
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
 %token K_START_REPLICATION
+%token K_INIT_LOGICAL_REPLICATION
+%token K_START_LOGICAL_REPLICATION
+%token K_FREE_LOGICAL_REPLICATION
 %token K_TIMELINE_HISTORY
 %token K_LABEL
 %token K_PROGRESS
@@ -82,10 +85,13 @@ Node *replication_parse_result;
 %token K_TIMELINE
 
 %type <node>	command
-%type <node>	base_backup start_replication identify_system timeline_history
+%type <node>	base_backup start_replication start_logical_replication init_logical_replication free_logical_replication identify_system timeline_history
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <intval>	opt_timeline
+%type <list>	plugin_options plugin_opt_list
+%type <defelt>	plugin_opt_elem
+%type <node>	plugin_opt_arg
 %%
 
 firstcmd: command opt_semicolon
@@ -102,6 +108,9 @@ command:
 			identify_system
 			| base_backup
 			| start_replication
+			| init_logical_replication
+			| start_logical_replication
+			| free_logical_replication
 			| timeline_history
 			;
 
@@ -186,6 +195,67 @@ opt_timeline:
 				| /* nothing */			{ $$ = 0; }
 			;
 
+init_logical_replication:
+			K_INIT_LOGICAL_REPLICATION IDENT IDENT
+				{
+					InitLogicalReplicationCmd *cmd;
+					cmd = makeNode(InitLogicalReplicationCmd);
+					cmd->name = $2;
+					cmd->plugin = $3;
+					$$ = (Node *) cmd;
+				}
+			;
+
+start_logical_replication:
+			K_START_LOGICAL_REPLICATION IDENT RECPTR plugin_options
+				{
+					StartLogicalReplicationCmd *cmd;
+					cmd = makeNode(StartLogicalReplicationCmd);
+					cmd->name = $2;
+					cmd->startpoint = $3;
+					cmd->options = $4;
+					$$ = (Node *) cmd;
+				}
+			;
+
+plugin_options:
+			'(' plugin_opt_list ')'			{ $$ = $2; }
+			| /* EMPTY */					{ $$ = NIL; }
+		;
+
+plugin_opt_list:
+			plugin_opt_elem
+				{
+					$$ = list_make1($1);
+				}
+			| plugin_opt_list ',' plugin_opt_elem
+				{
+					$$ = lappend($1, $3);
+				}
+		;
+
+plugin_opt_elem:
+			IDENT plugin_opt_arg
+				{
+					$$ = makeDefElem($1, $2);
+				}
+		;
+
+plugin_opt_arg:
+			SCONST							{ $$ = (Node *) makeString($1); }
+			| /* EMPTY */					{ $$ = NULL; }
+		;
+
+free_logical_replication:
+			K_FREE_LOGICAL_REPLICATION IDENT
+				{
+					FreeLogicalReplicationCmd *cmd;
+					cmd = makeNode(FreeLogicalReplicationCmd);
+					cmd->name = $2;
+					$$ = (Node *) cmd;
+				}
+			;
+
 /*
  * TIMELINE_HISTORY %d
  */
@@ -205,6 +275,7 @@ timeline_history:
 					$$ = (Node *) cmd;
 				}
 			;
+
 %%
 
 #include "repl_scanner.c"
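
To make the new grammar rules more concrete, here is a small client-side sketch that formats a command string of the shape start_logical_replication accepts: slot name, an X/X start position, and an optional parenthesized list of plugin options, each with an optional single-quoted argument. SketchOpt, sketch_append() and sketch_build_start_cmd() are hypothetical helpers, and the option names used with them are made up; which options a given output plugin understands is not specified by the grammar.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option pair; value == NULL emits the bare option name. */
typedef struct SketchOpt
{
    const char *name;
    const char *value;
} SketchOpt;

/* Append n bytes of s to dst, keeping it NUL-terminated; -1 if it won't fit. */
static int
sketch_append(char *dst, size_t dstlen, size_t *used, const char *s, size_t n)
{
    if (*used + n + 1 > dstlen)
        return -1;
    memcpy(dst + *used, s, n);
    *used += n;
    dst[*used] = '\0';
    return 0;
}

/* Build "START_LOGICAL_REPLICATION slot X/X (opt 'arg', ...)" into dst. */
static int
sketch_build_start_cmd(char *dst, size_t dstlen, const char *slot,
                       unsigned int hi, unsigned int lo,
                       const SketchOpt *opts, int nopts)
{
    size_t      used;
    int         i;
    int         n;

    n = snprintf(dst, dstlen, "START_LOGICAL_REPLICATION %s %X/%X",
                 slot, hi, lo);
    if (n < 0 || (size_t) n >= dstlen)
        return -1;
    used = (size_t) n;

    /* plugin_options may be empty, in which case no parentheses are sent */
    if (nopts <= 0)
        return 0;

    if (sketch_append(dst, dstlen, &used, " (", 2) != 0)
        return -1;
    for (i = 0; i < nopts; i++)
    {
        const char *v = opts[i].value;

        if (i > 0 && sketch_append(dst, dstlen, &used, ", ", 2) != 0)
            return -1;
        if (sketch_append(dst, dstlen, &used,
                          opts[i].name, strlen(opts[i].name)) != 0)
            return -1;
        if (v == NULL)
            continue;           /* plugin_opt_arg is optional */
        if (sketch_append(dst, dstlen, &used, " '", 2) != 0)
            return -1;
        for (; *v != '\0'; v++)
        {
            /* the scanner's xqdouble rule reads '' as one embedded quote */
            if (*v == '\'' && sketch_append(dst, dstlen, &used, "'", 1) != 0)
                return -1;
            if (sketch_append(dst, dstlen, &used, v, 1) != 0)
                return -1;
        }
        if (sketch_append(dst, dstlen, &used, "'", 1) != 0)
            return -1;
    }
    return sketch_append(dst, dstlen, &used, ")", 1);
}
```

The sketch passes the slot and option names through unquoted, so the scanner would treat them as plain identifiers; names containing spaces or upper-case characters would need double quotes.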
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index b4743e6..1044bd0 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -16,6 +16,7 @@
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "parser/scansup.h"
 
 /* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
 #undef fprintf
@@ -48,7 +49,7 @@ static void addlitchar(unsigned char ychar);
 %option warn
 %option prefix="replication_yy"
 
-%x xq
+%x xq xd
 
 /* Extended quote
  * xqdouble implements embedded quote, ''''
@@ -57,12 +58,26 @@ xqstart			{quote}
 xqdouble		{quote}{quote}
 xqinside		[^']+
 
+/* Double quote
+ * Allows embedded spaces and other special characters in identifiers.
+ */
+dquote			\"
+xdstart			{dquote}
+xdstop			{dquote}
+xddouble		{dquote}{dquote}
+xdinside		[^"]+
+
 digit			[0-9]+
 hexdigit		[0-9A-Za-z]+
 
 quote			'
 quotestop		{quote}
 
+ident_start		[A-Za-z\200-\377_]
+ident_cont		[A-Za-z\200-\377_0-9\$]
+
+identifier		{ident_start}{ident_cont}*
+
 %%
 
 BASE_BACKUP			{ return K_BASE_BACKUP; }
@@ -74,9 +89,14 @@ PROGRESS			{ return K_PROGRESS; }
 WAL			{ return K_WAL; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
+INIT_LOGICAL_REPLICATION	{ return K_INIT_LOGICAL_REPLICATION; }
+START_LOGICAL_REPLICATION	{ return K_START_LOGICAL_REPLICATION; }
+FREE_LOGICAL_REPLICATION	{ return K_FREE_LOGICAL_REPLICATION; }
 TIMELINE_HISTORY	{ return K_TIMELINE_HISTORY; }
 ","				{ return ','; }
 ";"				{ return ';'; }
+"("				{ return '('; }
+")"				{ return ')'; }
 
 [\n]			;
 [\t]			;
@@ -100,20 +120,49 @@ TIMELINE_HISTORY	{ return K_TIMELINE_HISTORY; }
 					BEGIN(xq);
 					startlit();
 				}
+
 <xq>{quotestop}	{
 					yyless(1);
 					BEGIN(INITIAL);
 					yylval.str = litbufdup();
 					return SCONST;
 				}
-<xq>{xqdouble} {
+
+<xq>{xqdouble}	{
 					addlitchar('\'');
 				}
+
 <xq>{xqinside}  {
 					addlit(yytext, yyleng);
 				}
 
-<xq><<EOF>>		{ yyerror("unterminated quoted string"); }
+{xdstart}		{
+					BEGIN(xd);
+					startlit();
+				}
+
+<xd>{xdstop}	{
+					int len;
+					yyless(1);
+					BEGIN(INITIAL);
+					yylval.str = litbufdup();
+					len = strlen(yylval.str);
+					truncate_identifier(yylval.str, len, true);
+					return IDENT;
+				}
+
+<xd>{xdinside}  {
+					addlit(yytext, yyleng);
+				}
+
+{identifier}	{
+					int len = strlen(yytext);
+
+					yylval.str = downcase_truncate_identifier(yytext, len, true);
+					return IDENT;
+				}
+
+<xq,xd><<EOF>>	{ yyerror("unterminated quoted string"); }
 
 
 <<EOF>>			{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 4c74d1b..3cbad64 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(true, false, false);
+		xmin = GetOldestXmin(true, true, false, false);
 	else
 		xmin = InvalidTransactionId;
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a421ec5..723d5f8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -53,6 +53,10 @@
 #include "miscadmin.h"
 #include "nodes/replnodes.h"
 #include "replication/basebackup.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "replication/snapbuild.h"
 #include "replication/syncrep.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
@@ -153,6 +157,9 @@ static bool ping_sent = false;
 static bool streamingDoneSending;
 static bool streamingDoneReceiving;
 
+/* Has the sender caught up with all flushed WAL? */
+static bool		WalSndCaughtUp = false;
+
 /* Flags set by signal handlers for later service in main loop */
 static volatile sig_atomic_t got_SIGHUP = false;
 static volatile sig_atomic_t walsender_ready_to_stop = false;
@@ -165,24 +172,42 @@ static volatile sig_atomic_t walsender_ready_to_stop = false;
  */
 static volatile sig_atomic_t replication_active = false;
 
+/* XXX reader */
+static MemoryContext decoding_ctx = NULL;
+static MemoryContext old_decoding_ctx = NULL;
+
+static LogicalDecodingContext *logical_decoding_ctx = NULL;
+static XLogRecPtr  logical_startptr = InvalidXLogRecPtr;
+
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndXLogSendHandler(SIGNAL_ARGS);
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
 /* Prototypes for private functions */
-static void WalSndLoop(void);
+typedef void (*WalSndSendData)(void);
+static void WalSndLoop(WalSndSendData send_data);
 static void InitWalSenderSlot(void);
 static void WalSndKill(int code, Datum arg);
-static void XLogSend(bool *caughtup);
+static void XLogSendPhysical(void);
+static void XLogSendLogical(void);
+static void WalSndDone(WalSndSendData send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
 static void StartReplication(StartReplicationCmd *cmd);
+static void InitLogicalReplication(InitLogicalReplicationCmd *cmd);
+static void StartLogicalReplication(StartLogicalReplicationCmd *cmd);
+static void FreeLogicalReplication(FreeLogicalReplicationCmd *cmd);
 static void ProcessStandbyMessage(void);
 static void ProcessStandbyReplyMessage(void);
 static void ProcessStandbyHSFeedbackMessage(void);
 static void ProcessRepliesIfAny(void);
 static void WalSndKeepalive(bool requestReply);
+static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+
+
 
 
 /* Initialize walsender process before entering the main command loop */
@@ -269,8 +294,6 @@ IdentifySystem(void)
 
 	if (MyDatabaseId != InvalidOid)
 		dbname = get_database_name(MyDatabaseId);
-	else
-		dbname = "(none)";
 
 	/* Send a RowDescription message */
 	pq_beginmessage(&buf, 'T');
@@ -295,22 +318,22 @@ IdentifySystem(void)
 	pq_sendint(&buf, 0, 2);		/* format code */
 
 	/* third field */
-	pq_sendstring(&buf, "xlogpos");
-	pq_sendint(&buf, 0, 4);
-	pq_sendint(&buf, 0, 2);
-	pq_sendint(&buf, TEXTOID, 4);
-	pq_sendint(&buf, -1, 2);
-	pq_sendint(&buf, 0, 4);
-	pq_sendint(&buf, 0, 2);
+	pq_sendstring(&buf, "xlogpos");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);		/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
 
 	/* fourth field */
-	pq_sendstring(&buf, "dbname");
-	pq_sendint(&buf, 0, 4);
-	pq_sendint(&buf, 0, 2);
-	pq_sendint(&buf, TEXTOID, 4);
-	pq_sendint(&buf, -1, 2);
-	pq_sendint(&buf, 0, 4);
-	pq_sendint(&buf, 0, 2);
+	pq_sendstring(&buf, "dbname");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);		/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
 	pq_endmessage(&buf);
 
 	/* Send a DataRow message */
@@ -322,9 +345,16 @@ IdentifySystem(void)
 	pq_sendbytes(&buf, (char *) tli, strlen(tli));
 	pq_sendint(&buf, strlen(xpos), 4);	/* col3 len */
 	pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
-	pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
-	pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
-
+	/* send NULL if not connected to a database */
+	if (dbname)
+	{
+		pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
+		pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
+	}
+	else
+	{
+		pq_sendint(&buf, -1, 4);	/* col4 len */
+	}
 	pq_endmessage(&buf);
 }
 
@@ -573,7 +603,7 @@ StartReplication(StartReplicationCmd *cmd)
 		/* Main loop of walsender */
 		replication_active = true;
 
-		WalSndLoop();
+		WalSndLoop(XLogSendPhysical);
 
 		replication_active = false;
 		if (walsender_ready_to_stop)
@@ -640,6 +670,498 @@ StartReplication(StartReplicationCmd *cmd)
 	pq_puttextmessage('C', "START_STREAMING");
 }
 
+static int
+replay_read_page(XLogReaderState* state, XLogRecPtr targetPagePtr, int reqLen,
+				 XLogRecPtr targetRecPtr, char* cur_page, TimeLineID *pageTLI)
+{
+	XLogRecPtr flushptr;
+	int		count;
+
+	flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+	/* more than one block available */
+	if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+		count = XLOG_BLCKSZ;
+	/* not enough data there */
+	else if (targetPagePtr + reqLen > flushptr)
+		return -1;
+	/* part of the page available */
+	else
+		count = flushptr - targetPagePtr;
+
+	/* FIXME: more sensible/efficient implementation */
+	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+	return count;
+}
+
+/*
+ * Initialize logical replication and wait for an initial consistent point to
+ * start sending changes from.
+ */
+static void
+InitLogicalReplication(InitLogicalReplicationCmd *cmd)
+{
+	const char *slot_name;
+	StringInfoData buf;
+	char		xpos[MAXFNAMELEN];
+	const char *snapshot_name = NULL;
+	LogicalDecodingContext *ctx;
+	XLogRecPtr startptr;
+
+	CheckLogicalReplicationRequirements();
+
+	Assert(!MyLogicalDecodingSlot);
+
+	/* XXX apply sanity checking to slot name? */
+	LogicalDecodingAcquireFreeSlot(cmd->name, cmd->plugin);
+
+	Assert(MyLogicalDecodingSlot);
+
+	decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "decoding context",
+										 ALLOCSET_DEFAULT_MINSIZE,
+										 ALLOCSET_DEFAULT_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
+	old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+	/* XXX pointless? */
+	TopTransactionContext = decoding_ctx;
+
+	/* setup state for XLogReadPage */
+	sendTimeLineIsHistoric = false;
+	sendTimeLine = ThisTimeLineID;
+
+	initStringInfo(&output_message);
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+									   NIL,	replay_read_page,
+									   WalSndPrepareWrite, WalSndWriteData);
+
+	MemoryContextSwitchTo(old_decoding_ctx);
+	TopTransactionContext = NULL;
+
+	startptr = MyLogicalDecodingSlot->restart_decoding;
+
+	elog(WARNING, "Initiating logical rep from %X/%X",
+		 (uint32)(startptr >> 32), (uint32)startptr);
+
+	for (;;)
+	{
+		XLogRecord *record;
+		XLogRecordBuffer buf;
+		char *err = NULL;
+
+		/* the read_page callback waits for new WAL */
+		record = XLogReadRecord(ctx->reader, startptr, &err);
+		/* xlog record was invalid */
+		if (err)
+			elog(ERROR, "%s", err);
+
+		/* read up from last position next time round */
+		startptr = InvalidXLogRecPtr;
+
+		Assert(record);
+
+		buf.origptr = ctx->reader->ReadRecPtr;
+		buf.record = *record;
+		buf.record_data = XLogRecGetData(record);
+		DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+		/* continue only until we have found a consistent point */
+		if (LogicalDecodingContextReady(ctx))
+		{
+			/* export plain, importable, snapshot to the user */
+			snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder);
+			break;
+		}
+	}
+
+	MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+	slot_name = NameStr(MyLogicalDecodingSlot->name);
+	snprintf(xpos, sizeof(xpos), "%X/%X",
+			 (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+			 (uint32) MyLogicalDecodingSlot->confirmed_flush);
+
+	pq_beginmessage(&buf, 'T');
+	pq_sendint(&buf, 4, 2);		/* 4 fields */
+
+	/* first field */
+	pq_sendstring(&buf, "replication_id");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);	/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
+
+	pq_sendstring(&buf, "consistent_point");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);	/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
+
+	pq_sendstring(&buf, "snapshot_name");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);	/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
+
+	pq_sendstring(&buf, "plugin");	/* col name */
+	pq_sendint(&buf, 0, 4);		/* table oid */
+	pq_sendint(&buf, 0, 2);		/* attnum */
+	pq_sendint(&buf, TEXTOID, 4);		/* type oid */
+	pq_sendint(&buf, -1, 2);	/* typlen */
+	pq_sendint(&buf, 0, 4);		/* typmod */
+	pq_sendint(&buf, 0, 2);		/* format code */
+
+	pq_endmessage(&buf);
+
+	/* Send a DataRow message */
+	pq_beginmessage(&buf, 'D');
+	pq_sendint(&buf, 4, 2);		/* # of columns */
+
+	/* replication_id */
+	pq_sendint(&buf, strlen(slot_name), 4); /* col1 len */
+	pq_sendbytes(&buf, slot_name, strlen(slot_name));
+
+	/* consistent wal location */
+	pq_sendint(&buf, strlen(xpos), 4); /* col2 len */
+	pq_sendbytes(&buf, xpos, strlen(xpos));
+
+	/* snapshot name */
+	pq_sendint(&buf, strlen(snapshot_name), 4); /* col3 len */
+	pq_sendbytes(&buf, snapshot_name, strlen(snapshot_name));
+
+	/* plugin */
+	pq_sendint(&buf, strlen(cmd->plugin), 4); /* col4 len */
+	pq_sendbytes(&buf, cmd->plugin, strlen(cmd->plugin));
+
+	pq_endmessage(&buf);
+
+	/*
+	 * release active status again, START_LOGICAL_REPLICATION will reacquire it
+	 */
+	LogicalDecodingReleaseSlot();
+}
+
+/*
+ * Load previously initiated logical slot and prepare for sending data (via
+ * WalSndLoop).
+ */
+static void
+StartLogicalReplication(StartLogicalReplicationCmd *cmd)
+{
+	StringInfoData buf;
+	XLogRecPtr confirmed_flush;
+
+	elog(WARNING, "Starting logical replication");
+
+	/* make sure that our requirements are still fulfilled */
+	CheckLogicalReplicationRequirements();
+
+	Assert(!MyLogicalDecodingSlot);
+
+	LogicalDecodingReAcquireSlot(cmd->name);
+
+	if (am_cascading_walsender && !RecoveryInProgress())
+	{
+		ereport(LOG,
+				(errmsg("terminating walsender process to force cascaded standby to update timeline and reconnect")));
+		walsender_ready_to_stop = true;
+	}
+
+	WalSndSetState(WALSNDSTATE_CATCHUP);
+
+	/* Send a CopyBothResponse message, and start streaming */
+	pq_beginmessage(&buf, 'W');
+	pq_sendbyte(&buf, 0);
+	pq_sendint(&buf, 0, 2);
+	pq_endmessage(&buf);
+	pq_flush();
+
+	/* setup state for XLogReadPage */
+	sendTimeLineIsHistoric = false;
+	sendTimeLine = ThisTimeLineID;
+
+	confirmed_flush = MyLogicalDecodingSlot->confirmed_flush;
+
+	Assert(confirmed_flush != InvalidXLogRecPtr);
+
+	/* continue from last position */
+	if (cmd->startpoint == InvalidXLogRecPtr)
+		cmd->startpoint = MyLogicalDecodingSlot->confirmed_flush;
+	else if (cmd->startpoint < MyLogicalDecodingSlot->confirmed_flush)
+		elog(ERROR, "cannot stream from %X/%X, minimum is %X/%X",
+			 (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint,
+			 (uint32)(confirmed_flush >> 32), (uint32)confirmed_flush);
+
+	/*
+	 * Initialize position to the last ack'ed one, then the xlog records begin
+	 * to be shipped from that position.
+	 */
+	logical_decoding_ctx = CreateLogicalDecodingContext(
+		MyLogicalDecodingSlot, false, cmd->startpoint, cmd->options,
+		replay_read_page, WalSndPrepareWrite, WalSndWriteData);
+
+	/*
+	 * XXX: For feedback purposes it would be nicer to set sentPtr to
+	 * cmd->startpoint, but we use it to know where to read xlog in the main
+	 * loop...
+	 */
+	sentPtr = MyLogicalDecodingSlot->restart_decoding;
+	logical_startptr = sentPtr;
+
+	/* Also update the start position status in shared memory */
+	{
+		/* use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = MyWalSnd;
+
+		SpinLockAcquire(&walsnd->mutex);
+		walsnd->sentPtr = MyLogicalDecodingSlot->restart_decoding;
+		SpinLockRelease(&walsnd->mutex);
+	}
+
+	elog(LOG, "starting to decode from %X/%X, replay %X/%X",
+		 (uint32)(MyWalSnd->sentPtr >> 32), (uint32)MyWalSnd->sentPtr,
+		 (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+	replication_active = true;
+
+	SyncRepInitConfig();
+
+	/* Main loop of walsender */
+	WalSndLoop(XLogSendLogical);
+
+	LogicalDecodingReleaseSlot();
+
+	replication_active = false;
+	if (walsender_ready_to_stop)
+		proc_exit(0);
+	WalSndSetState(WALSNDSTATE_STARTUP);
+
+	/* Get out of COPY mode (CommandComplete). */
+	EndCommand("COPY 0", DestRemote);
+}
+
+/*
+ * Free the permanent state held by a now inactive but still defined logical slot.
+ */
+static void
+FreeLogicalReplication(FreeLogicalReplicationCmd *cmd)
+{
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingFreeSlot(cmd->name);
+	EndCommand("FREE_LOGICAL_REPLICATION", DestRemote);
+}
+
+/*
+ * LogicalDecodingContext 'prepare_write' callback.
+ *
+ * Prepare a write into a StringInfo.
+ *
+ * Don't do anything lasting in here; it's quite possible that nothing will be
+ * done with the data.
+ */
+static void
+WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+	AssertVariableIsOfType(&WalSndPrepareWrite, LogicalOutputPluginWriterPrepareWrite);
+
+	resetStringInfo(ctx->out);
+
+	pq_sendbyte(ctx->out, 'w');
+	pq_sendint64(ctx->out, lsn);	/* dataStart */
+	/* XXX: overwrite when data is assembled */
+	pq_sendint64(ctx->out, lsn);	/* walEnd */
+	/* XXX: gather that value later just as it's done in XLogSendPhysical */
+	pq_sendint64(ctx->out, 0 /*GetCurrentIntegerTimestamp() */);/* sendtime */
+}
+
+/*
+ * LogicalDecodingContext 'write' callback.
+ *
+ * Actually write the data previously prepared by WalSndPrepareWrite out to the
+ * network. Take as long as needed, but keep processing replies from the other
+ * side in the meantime.
+ */
+static void
+WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+	AssertVariableIsOfType(&WalSndWriteData, LogicalOutputPluginWriterWrite);
+
+	/* output previously gathered data in a CopyData packet */
+	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
+
+	/* fast path */
+	/* Try to flush pending output to the client */
+	if (pq_flush_if_writable() != 0)
+		return;
+
+	if (!pq_is_send_pending())
+		return;
+
+	for (;;)
+	{
+		int			wakeEvents;
+		long		sleeptime = 10000;		/* 10s */
+
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (!PostmasterIsAlive())
+			exit(1);
+
+		/* Process any requests or signals received recently */
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+			SyncRepInitConfig();
+		}
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Check for input from the client */
+		ProcessRepliesIfAny();
+
+		/* Clear any already-pending wakeups */
+		ResetLatch(&MyWalSnd->latch);
+
+		/* Try to flush pending output to the client */
+		if (pq_flush_if_writable() != 0)
+			break;
+
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
+		/*
+		 * We wake up at least every sleeptime ms (WL_TIMEOUT) to recheck for
+		 * interrupts and client replies; while the socket is not writable
+		 * there's not much else we can do anyway.
+		 */
+		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+			WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+		ImmediateInterruptOK = true;
+		CHECK_FOR_INTERRUPTS();
+		WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+						  MyProcPort->sock, sleeptime);
+		ImmediateInterruptOK = false;
+	}
+
+	/* reactivate latch so WalSndLoop knows to continue */
+	SetLatch(&MyWalSnd->latch);
+}
+
+/*
+ * Wait until WAL up to loc has been flushed to disk so it can be safely read.
+ */
+XLogRecPtr
+WalSndWaitForWal(XLogRecPtr loc)
+{
+	int			wakeEvents;
+	XLogRecPtr  flushptr;
+
+	/* fast path if everything is there already */
+	/*
+	 * XXX: introduce RecentFlushPtr to avoid acquiring the spinlock in the
+	 * fast path case where we already know we have enough WAL available.
+	 */
+	flushptr = GetFlushRecPtr();
+	if (loc <= flushptr)
+		return flushptr;
+
+	for (;;)
+	{
+		long		sleeptime = 10000;		/* 10 s */
+
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (!PostmasterIsAlive())
+			exit(1);
+
+		/* Process any requests or signals received recently */
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+			SyncRepInitConfig();
+		}
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Check for input from the client */
+		ProcessRepliesIfAny();
+
+		/* Clear any already-pending wakeups */
+		ResetLatch(&MyWalSnd->latch);
+
+		/* Update our idea of flushed position. */
+		flushptr = GetFlushRecPtr();
+
+		/* If postmaster asked us to stop, don't wait here anymore */
+		if (walsender_ready_to_stop)
+			break;
+
+		/* check whether we're done */
+		if (loc <= flushptr)
+			break;
+
+		/* Determine time until replication timeout */
+		if (wal_sender_timeout > 0)
+		{
+			if (!ping_sent)
+			{
+				TimestampTz timeout;
+
+				/*
+				 * If half of wal_sender_timeout has lapsed without receiving
+				 * any reply from standby, send a keep-alive message to standby
+				 * requesting an immediate reply.
+				 */
+				timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+													  wal_sender_timeout / 2);
+				if (GetCurrentTimestamp() >= timeout)
+				{
+					WalSndKeepalive(true);
+					ping_sent = true;
+					/* Try to flush pending output to the client */
+					if (pq_flush_if_writable() != 0)
+						break;
+				}
+			}
+
+			sleeptime = 1 + (wal_sender_timeout / 10);
+		}
+
+		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+			WL_SOCKET_READABLE | WL_TIMEOUT;
+
+		ImmediateInterruptOK = true;
+		CHECK_FOR_INTERRUPTS();
+		WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+						  MyProcPort->sock, sleeptime);
+		ImmediateInterruptOK = false;
+
+		/*
+		 * The equivalent code in WalSndLoop checks here that replication
+		 * timeout hasn't been exceeded.  We don't do that here.   XXX explain
+		 * why.
+		 */
+	}
+
+	/* reactivate latch so WalSndLoop knows to continue */
+	SetLatch(&MyWalSnd->latch);
+	return flushptr;
+}
+
 /*
  * Execute an incoming replication command.
  */
@@ -651,6 +1173,12 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext cmd_context;
 	MemoryContext old_context;
 
+	/*
+	 * INIT_LOGICAL_REPLICATION exports a snapshot until the next command
+	 * arrives. Clean up the old stuff if there's anything.
+	 */
+	SnapBuildClearExportedSnapshot();
+
 	elog(DEBUG1, "received replication command: %s", cmd_string);
 
 	CHECK_FOR_INTERRUPTS();
@@ -682,6 +1210,18 @@ exec_replication_command(const char *cmd_string)
 			StartReplication((StartReplicationCmd *) cmd_node);
 			break;
 
+		case T_InitLogicalReplicationCmd:
+			InitLogicalReplication((InitLogicalReplicationCmd *) cmd_node);
+			break;
+
+		case T_StartLogicalReplicationCmd:
+			StartLogicalReplication((StartLogicalReplicationCmd *) cmd_node);
+			break;
+
+		case T_FreeLogicalReplicationCmd:
+			FreeLogicalReplication((FreeLogicalReplicationCmd *) cmd_node);
+			break;
+
 		case T_BaseBackupCmd:
 			SendBaseBackup((BaseBackupCmd *) cmd_node);
 			break;
@@ -891,6 +1431,12 @@ ProcessStandbyReplyMessage(void)
 		SpinLockRelease(&walsnd->mutex);
 	}
 
+	/*
+	 * Advance our local xmin horizon when the client confirmed a flush.
+	 */
+	if (MyLogicalDecodingSlot && flushPtr != InvalidXLogRecPtr)
+		LogicalConfirmReceivedLocation(flushPtr);
+
 	if (!am_cascading_walsender)
 		SyncRepReleaseWaiters();
 }
@@ -975,10 +1521,8 @@ ProcessStandbyHSFeedbackMessage(void)
 
 /* Main loop of walsender process that streams the WAL over Copy messages. */
 static void
-WalSndLoop(void)
+WalSndLoop(WalSndSendData send_data)
 {
-	bool		caughtup = false;
-
 	/*
 	 * Allocate buffers that will be used for each outgoing and incoming
 	 * message.  We do this just once to reduce palloc overhead.
@@ -1030,21 +1574,21 @@ WalSndLoop(void)
 
 		/*
 		 * If we don't have any pending data in the output buffer, try to send
-		 * some more.  If there is some, we don't bother to call XLogSend
+		 * some more.  If there is some, we don't bother to call send_data
 		 * again until we've flushed it ... but we'd better assume we are not
 		 * caught up.
 		 */
 		if (!pq_is_send_pending())
-			XLogSend(&caughtup);
+			send_data();
 		else
-			caughtup = false;
+			WalSndCaughtUp = false;
 
 		/* Try to flush pending output to the client */
 		if (pq_flush_if_writable() != 0)
 			goto send_failure;
 
 		/* If nothing remains to be sent right now ... */
-		if (caughtup && !pq_is_send_pending())
+		if (WalSndCaughtUp && !pq_is_send_pending())
 		{
 			/*
 			 * If we're in catchup state, move to streaming.  This is an
@@ -1069,28 +1613,17 @@ WalSndLoop(void)
 			 * the walsender is not sure which.
 			 */
 			if (walsender_ready_to_stop)
-			{
-				/* ... let's just be real sure we're caught up ... */
-				XLogSend(&caughtup);
-				if (caughtup && !pq_is_send_pending())
-				{
-					/* Inform the standby that XLOG streaming is done */
-					EndCommand("COPY 0", DestRemote);
-					pq_flush();
-
-					proc_exit(0);
-				}
-			}
+				WalSndDone(send_data);
 		}
 
 		/*
 		 * We don't block if not caught up, unless there is unsent data
 		 * pending in which case we'd better block until the socket is
-		 * write-ready.  This test is only needed for the case where XLogSend
+		 * write-ready.  This test is only needed for the case where send_data
 		 * loaded a subset of the available data but then pq_flush_if_writable
 		 * flushed it all --- we should immediately try to send more.
 		 */
-		if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
+		if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
 		{
 			TimestampTz timeout = 0;
 			long		sleeptime = 10000;		/* 10 s */
@@ -1419,15 +1952,17 @@ retry:
 }
 
 /*
+ * Send out the WAL in its normal physical/stored form.
+ *
  * Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
  * but not yet sent to the client, and buffer it in the libpq output
  * buffer.
  *
- * If there is no unsent WAL remaining, *caughtup is set to true, otherwise
- * *caughtup is set to false.
+ * If there is no unsent WAL remaining, WalSndCaughtUp is set to true,
+ * otherwise WalSndCaughtUp is set to false.
  */
 static void
-XLogSend(bool *caughtup)
+XLogSendPhysical(void)
 {
 	XLogRecPtr	SendRqstPtr;
 	XLogRecPtr	startptr;
@@ -1436,7 +1971,7 @@ XLogSend(bool *caughtup)
 
 	if (streamingDoneSending)
 	{
-		*caughtup = true;
+		WalSndCaughtUp = true;
 		return;
 	}
 
@@ -1553,7 +2088,7 @@ XLogSend(bool *caughtup)
 		pq_putmessage_noblock('c', NULL, 0);
 		streamingDoneSending = true;
 
-		*caughtup = true;
+		WalSndCaughtUp = true;
 
 		elog(DEBUG1, "walsender reached end of timeline at %X/%X (sent up to %X/%X)",
 			 (uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
@@ -1565,7 +2100,7 @@ XLogSend(bool *caughtup)
 	Assert(sentPtr <= SendRqstPtr);
 	if (SendRqstPtr <= sentPtr)
 	{
-		*caughtup = true;
+		WalSndCaughtUp = true;
 		return;
 	}
 
@@ -1589,15 +2124,15 @@ XLogSend(bool *caughtup)
 	{
 		endptr = SendRqstPtr;
 		if (sendTimeLineIsHistoric)
-			*caughtup = false;
+			WalSndCaughtUp = false;
 		else
-			*caughtup = true;
+			WalSndCaughtUp = true;
 	}
 	else
 	{
 		/* round down to page boundary. */
 		endptr -= (endptr % XLOG_BLCKSZ);
-		*caughtup = false;
+		WalSndCaughtUp = false;
 	}
 
 	nbytes = endptr - startptr;
@@ -1658,6 +2193,96 @@ XLogSend(bool *caughtup)
 }
 
 /*
+ * Send out the WAL after it has been decoded into a logical format by the
+ * output plugin specified in INIT_LOGICAL_REPLICATION.
+ */
+static void
+XLogSendLogical(void)
+{
+	XLogRecord *record;
+	char	   *errm;
+
+	if (decoding_ctx == NULL)
+	{
+		decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+											 "decoding context",
+											 ALLOCSET_DEFAULT_MINSIZE,
+											 ALLOCSET_DEFAULT_INITSIZE,
+											 ALLOCSET_DEFAULT_MAXSIZE);
+	}
+
+	record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
+	logical_startptr = InvalidXLogRecPtr;
+
+	/* xlog record was invalid */
+	if (errm != NULL)
+		elog(ERROR, "%s", errm);
+
+	if (record != NULL)
+	{
+		XLogRecordBuffer buf;
+
+		buf.origptr = logical_decoding_ctx->reader->ReadRecPtr;
+		buf.record = *record;
+		buf.record_data = XLogRecGetData(record);
+
+		old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+		TopTransactionContext = decoding_ctx;
+
+		DecodeRecordIntoReorderBuffer(logical_decoding_ctx, &buf);
+
+		MemoryContextSwitchTo(old_decoding_ctx);
+		TopTransactionContext = NULL;
+
+		/*
+		 * If the record we just read is at or beyond the flushed point, then
+		 * we're caught up.
+		 */
+		WalSndCaughtUp =
+			logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr();
+	}
+	else
+		/*
+		 * xlogreader failed, and no error was reported? we must be caught up.
+		 */
+		WalSndCaughtUp = true;
+
+	/* Update shared memory status */
+	{
+		/* use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = MyWalSnd;
+
+		SpinLockAcquire(&walsnd->mutex);
+		walsnd->sentPtr = logical_decoding_ctx->reader->ReadRecPtr;
+		SpinLockRelease(&walsnd->mutex);
+	}
+}
+
+/*
+ * The sender is caught up, so we can go away for shutdown processing
+ * to finish normally.  (This should only be called when the shutdown
+ * signal has been received from postmaster.)
+ *
+ * Note that if while doing this we determine that there's still more
+ * data to send, this function will return control to the caller.
+ */
+static void
+WalSndDone(WalSndSendData send_data)
+{
+	/* ... let's just be real sure we're caught up ... */
+	send_data();
+
+	if (WalSndCaughtUp && !pq_is_send_pending())
+	{
+		/* Inform the standby that XLOG streaming is done */
+		EndCommand("COPY 0", DestRemote);
+		pq_flush();
+
+		proc_exit(0);
+	}
+}
+
+/*
  * Returns the latest point in WAL that has been safely flushed to disk, and
  * can be sent to the standby. This should only be called when in recovery,
  * ie. we're streaming to a cascaded standby.
@@ -2124,7 +2749,8 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
 	int i;
 	bool done;
 
-	do {
+	do
+	{
 		done = true;
 
 		for (i = 0; i < max_wal_senders; i++)
@@ -2135,7 +2761,9 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
 
 			if (walsnd->pid != 0 && (pid == 0 || pid == walsnd->pid))
 			{
-				XLogRecPtr rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
+				XLogRecPtr rptr;
+
+				rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
 				if (rptr < ptr)
 					done = false;
 			}
@@ -2147,7 +2775,7 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
 		}
 
 		if (!done)
-			pg_usleep(10*1000);
+			pg_usleep(10 * 1000);
 	}
 	while (!done);
 }
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b34ba44..4fcbd4a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -26,6 +26,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/postmaster.h"
+#include "replication/logical.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
@@ -122,6 +123,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, ProcSignalShmemSize());
 		size = add_size(size, CheckpointerShmemSize());
 		size = add_size(size, AutoVacuumShmemSize());
+		size = add_size(size, LogicalDecodingShmemSize());
 		size = add_size(size, WalSndShmemSize());
 		size = add_size(size, WalRcvShmemSize());
 		size = add_size(size, BTreeShmemSize());
@@ -227,6 +229,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ProcSignalShmemInit();
 	CheckpointerShmemInit();
 	AutoVacuumShmemInit();
+	LogicalDecodingShmemInit();
 	WalSndShmemInit();
 	WalRcvShmemInit();
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 993efac..4c6c1ed 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -51,6 +51,9 @@
 #include "access/xact.h"
 #include "access/twophase.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/walsender.h"
+#include "replication/walsender_private.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/spin.h"
@@ -1100,11 +1103,12 @@ TransactionIdIsActive(TransactionId xid)
  * GetOldestXmin() move backwards, with no consequences for data integrity.
  */
 TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
 	int			index;
+	volatile TransactionId logical_xmin = InvalidTransactionId;
 
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
@@ -1157,6 +1161,10 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
 		}
 	}
 
+	/* fetch into volatile var while ProcArrayLock is held */
+	if (max_logical_slots > 0)
+		logical_xmin = LogicalDecodingCtl->xmin;
+
 	if (RecoveryInProgress())
 	{
 		/*
@@ -1196,6 +1204,15 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
 			result = FirstNormalTransactionId;
 	}
 
+	/*
+	 * after locks are released and defer_cleanup_age has been applied, check
+	 * whether we need to back up further to make logical decoding possible.
+	 */
+	if (systable &&
+		TransactionIdIsValid(logical_xmin) &&
+		NormalTransactionIdPrecedes(logical_xmin, result))
+		result = logical_xmin;
+
 	return result;
 }
 
@@ -1250,6 +1267,8 @@ GetMaxSnapshotSubxidCount(void)
  *		RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
  *			running transactions, except those running LAZY VACUUM).  This is
  *			the same computation done by GetOldestXmin(true, true, ...).
+ *		RecentGlobalDataXmin: the global xmin for non-catalog tables; always
+ *			>= RecentGlobalXmin.
  *
  * Note: this function should probably not be called with an argument that's
  * not statically allocated (see xip allocation below).
@@ -1265,6 +1284,7 @@ GetSnapshotData(Snapshot snapshot)
 	int			count = 0;
 	int			subcount = 0;
 	bool		suboverflowed = false;
+	volatile TransactionId logical_xmin = InvalidTransactionId;
 
 	Assert(snapshot != NULL);
 
@@ -1442,8 +1462,14 @@ GetSnapshotData(Snapshot snapshot)
 			suboverflowed = true;
 	}
 
+
+	/* fetch into volatile var while ProcArrayLock is held */
+	if (max_logical_slots > 0)
+		logical_xmin = LogicalDecodingCtl->xmin;
+
 	if (!TransactionIdIsValid(MyPgXact->xmin))
 		MyPgXact->xmin = TransactionXmin = xmin;
+
 	LWLockRelease(ProcArrayLock);
 
 	/*
@@ -1458,6 +1484,17 @@ GetSnapshotData(Snapshot snapshot)
 	RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
 	if (!TransactionIdIsNormal(RecentGlobalXmin))
 		RecentGlobalXmin = FirstNormalTransactionId;
+
+	/* Non-catalog tables can be vacuumed if older than this xid */
+	RecentGlobalDataXmin = RecentGlobalXmin;
+
+	/*
+	 * peg the global xmin to the one needed for logical decoding, if any
+	 */
+	if (TransactionIdIsNormal(logical_xmin) &&
+		NormalTransactionIdPrecedes(logical_xmin, RecentGlobalXmin))
+		RecentGlobalXmin = logical_xmin;
+
 	RecentXmin = xmin;
 
 	snapshot->xmin = xmin;
@@ -1558,9 +1595,11 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
  * Similar to GetSnapshotData but returns more information. We include
  * all PGXACTs with an assigned TransactionId, even VACUUM processes.
  *
- * We acquire XidGenLock, but the caller is responsible for releasing it.
- * This ensures that no new XIDs enter the proc array until the caller has
- * WAL-logged this snapshot, and releases the lock.
+ * We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
+ * releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
+ * array until the caller has WAL-logged this snapshot, and releases the
+ * lock. Acquiring ProcArrayLock ensures that no transactions commit until the
+ * lock is released.
  *
  * The returned data structure is statically allocated; caller should not
  * modify it, and must not assume it is valid past the next call.
@@ -1695,6 +1734,12 @@ GetRunningTransactionData(void)
 		}
 	}
 
+	/*
+	 * It's important *not* to track decoding tasks here because snapbuild.c
+	 * uses ->oldestRunningXid to manage its xmin. If they were included
+	 * here, the initial value could never increase.
+	 */
+
 	CurrentRunningXacts->xcnt = count - subcount;
 	CurrentRunningXacts->subxcnt = subcount;
 	CurrentRunningXacts->subxid_overflow = suboverflowed;
@@ -1702,13 +1747,12 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
-	/* We don't release XidGenLock here, the caller is responsible for that */
-	LWLockRelease(ProcArrayLock);
-
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
 	Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
 
+	/* We don't release the locks here, the caller is responsible for that */
+
 	return CurrentRunningXacts;
 }
 
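Reviewer's sketch of the horizon rule the procarray.c hunks above implement: the xmin held back for logical decoding only drags the horizon backwards for catalog tables, while user tables keep the horizon computed from running backends. The helper names `xid_precedes` and `effective_xmin` are hypothetical and not part of the patch; the comparison is assumed to behave like NormalTransactionIdPrecedes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Wraparound-aware "a is older than b"; both must be normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	assert(a >= FirstNormalTransactionId && b >= FirstNormalTransactionId);
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: pick the cleanup horizon for one table.
 * backend_xmin is the horizon derived from running backends,
 * logical_xmin the one held back for logical decoding (or invalid).
 */
static TransactionId
effective_xmin(TransactionId backend_xmin, TransactionId logical_xmin,
			   bool catalog_table)
{
	if (catalog_table &&
		logical_xmin >= FirstNormalTransactionId &&
		xid_precedes(logical_xmin, backend_xmin))
		return logical_xmin;	/* back up further for decoding */

	return backend_xmin;		/* user tables are never pegged */
}
```

With a backend horizon of 1000 and a decoding xmin of 900, a catalog table would be vacuumed only up to 900 while a user table could still be cleaned up to 1000.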
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index e85733b..93ed9dd 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -879,7 +879,22 @@ LogStandbySnapshot(void)
 	 * record we write, because standby will open up when it sees this.
 	 */
 	running = GetRunningTransactionData();
-	LogCurrentRunningXacts(running);
+
+	/*
+	 * GetRunningTransactionData() acquired ProcArrayLock, we must release
+	 * it. We can do that before inserting the WAL record because
+	 * ProcArrayApplyRecoveryInfo can recheck the commit status using the
+	 * clog. If we're doing logical replication we can't do that though, so
+	 * hold the lock for a moment longer.
+	 */
+	if (wal_level < WAL_LEVEL_LOGICAL)
+		LWLockRelease(ProcArrayLock);
+
+	recptr = LogCurrentRunningXacts(running);
+
+	/* Release lock if we kept it longer ... */
+	if (wal_level >= WAL_LEVEL_LOGICAL)
+		LWLockRelease(ProcArrayLock);
 
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index e0dc126..9c93cb4 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -475,7 +475,7 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
  * Only the local caches are flushed; this does not transmit the message
  * to other backends.
  */
-static void
+void
 LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
 {
 	if (msg->id >= 0)
@@ -547,7 +547,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
  *		since that tells us we've lost some shared-inval messages and hence
  *		don't know what needs to be invalidated.
  */
-static void
+void
 InvalidateSystemCaches(void)
 {
 	int			i;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3f7386e..5425d32 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1602,6 +1602,10 @@ RelationIdGetRelation(Oid relationId)
 		return rd;
 	}
 
+	/* read system relations up to date, even during timetravel */
+	if (IsSystemRelationId(relationId))
+		SuspendDecodingSnapshots();
+
 	/*
 	 * no reldesc in the cache, so have RelationBuildDesc() build one and add
 	 * it.
@@ -1609,6 +1613,10 @@ RelationIdGetRelation(Oid relationId)
 	rd = RelationBuildDesc(relationId, true);
 	if (RelationIsValid(rd))
 		RelationIncrementReferenceCount(rd);
+
+	if (IsSystemRelationId(relationId))
+		UnSuspendDecodingSnapshots();
+
 	return rd;
 }
 
@@ -1730,6 +1738,10 @@ RelationReloadIndexInfo(Relation relation)
 		return;
 	}
 
+	/* read system relations up to date, even during timetravel */
+	if (IsSystemRelation(relation))
+		SuspendDecodingSnapshots();
+
 	/*
 	 * Read the pg_class row
 	 *
@@ -1797,6 +1809,9 @@ RelationReloadIndexInfo(Relation relation)
 
 	/* Okay, now it's valid again */
 	relation->rd_isvalid = true;
+
+	if (IsSystemRelation(relation))
+		UnSuspendDecodingSnapshots();
 }
 
 /*
@@ -1978,6 +1993,10 @@ RelationClearRelation(Relation relation, bool rebuild)
 		bool		keep_tupdesc;
 		bool		keep_rules;
 
+		/* read system relations up to date, even during timetravel */
+		if (IsSystemRelation(relation))
+			SuspendDecodingSnapshots();
+
 		/* Build temporary entry, but don't link it into hashtable */
 		newrel = RelationBuildDesc(save_relid, false);
 		if (newrel == NULL)
@@ -2047,6 +2066,9 @@ RelationClearRelation(Relation relation, bool rebuild)
 
 		/* And now we can throw away the temporary entry */
 		RelationDestroyRelation(newrel);
+
+		if (IsSystemRelation(relation))
+			UnSuspendDecodingSnapshots();
 	}
 }
 
@@ -3552,7 +3574,10 @@ RelationGetIndexList(Relation relation)
 					Form_pg_attribute attr;
 					/* internal column, like oid */
 					if (attno <= 0)
-						continue;
+					{
+						found = false;
+						break;
+					}
 
 					attr = relation->rd_att->attrs[attno - 1];
 					if (!attr->attnotnull)
@@ -3840,17 +3865,26 @@ RelationGetIndexPredicate(Relation relation)
  * be bms_free'd when not needed anymore.
  */
 Bitmapset *
-RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
+RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
 {
 	Bitmapset  *indexattrs;
-	Bitmapset  *uindexattrs;
+	Bitmapset  *uindexattrs; /* unique keys */
+	Bitmapset  *cindexattrs; /* best candidate key */
 	List	   *indexoidlist;
 	ListCell   *l;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the result. */
 	if (relation->rd_indexattr != NULL)
-		return bms_copy(keyAttrs ? relation->rd_keyattr : relation->rd_indexattr);
+		switch (attrKind)
+		{
+			case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+				return bms_copy(relation->rd_ckeyattr);
+			case INDEX_ATTR_BITMAP_KEY:
+				return bms_copy(relation->rd_keyattr);
+			case INDEX_ATTR_BITMAP_ALL:
+				return bms_copy(relation->rd_indexattr);
+		}
 
 	/* Fast path if definitely no indexes */
 	if (!RelationGetForm(relation)->relhasindex)
@@ -3877,13 +3911,16 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
 	 */
 	indexattrs = NULL;
 	uindexattrs = NULL;
+	cindexattrs = NULL;
 	foreach(l, indexoidlist)
 	{
 		Oid			indexOid = lfirst_oid(l);
 		Relation	indexDesc;
 		IndexInfo  *indexInfo;
 		int			i;
-		bool		isKey;
+		bool		isCKey;		/* candidate or primary key */
+		bool		isKey;		/* key member */
+
 
 		indexDesc = index_open(indexOid, AccessShareLock);
 
@@ -3895,6 +3932,8 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
 			indexInfo->ii_Expressions == NIL &&
 			indexInfo->ii_Predicate == NIL;
 
+		isCKey = indexOid == relation->rd_primary;
+
 		/* Collect simple attribute references */
 		for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
 		{
@@ -3904,6 +3943,11 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
 			{
 				indexattrs = bms_add_member(indexattrs,
 							   attrnum - FirstLowInvalidHeapAttributeNumber);
+
+				if (isCKey)
+					cindexattrs = bms_add_member(cindexattrs,
+												 attrnum - FirstLowInvalidHeapAttributeNumber);
+
 				if (isKey)
 					uindexattrs = bms_add_member(uindexattrs,
 							   attrnum - FirstLowInvalidHeapAttributeNumber);
@@ -3925,10 +3969,21 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
 	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
 	relation->rd_indexattr = bms_copy(indexattrs);
 	relation->rd_keyattr = bms_copy(uindexattrs);
+	relation->rd_ckeyattr = bms_copy(cindexattrs);
 	MemoryContextSwitchTo(oldcxt);
 
 	/* We return our original working copy for caller to play with */
-	return keyAttrs ? uindexattrs : indexattrs;
+	switch (attrKind)
+	{
+		case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+			return cindexattrs;
+		case INDEX_ATTR_BITMAP_KEY:
+			return uindexattrs;
+		case INDEX_ATTR_BITMAP_ALL:
+			return indexattrs;
+		default:
+			elog(ERROR, "unknown attrKind %u", attrKind);
+	}
 }
 
 /*
@@ -4903,3 +4958,49 @@ unlink_initfile(const char *initfilename)
 			elog(LOG, "could not remove cache file \"%s\": %m", initfilename);
 	}
 }
+
+bool
+RelationIsDoingTimetravelInternal(Relation relation)
+{
+	Assert(wal_level >= WAL_LEVEL_LOGICAL);
+
+	if (!RelationNeedsWAL(relation))
+		return false;
+
+	/*
+	 * XXX: Doing this test instead of using IsSystemNamespace has the
+	 * advantage of classifying a catalog relation's toast tables as a
+	 * timetravel relation as well. This is safe since even an OID wraparound
+	 * will preserve this property (c.f. GetNewObjectId()).
+	 */
+	if (IsSystemRelation(relation))
+		return true;
+
+	/*
+	 * Also log relevant data if we want the table to behave as a catalog
+	 * table, although it's not a system-provided one.
+	 * XXX: we need to make sure both the relation and its toast relation have
+	 * the flag set!
+	 */
+	if (RelationIsTreatedAsCatalogTable(relation))
+		return true;
+
+	return false;
+}
+
+bool
+RelationIsLogicallyLoggedInternal(Relation relation)
+{
+	Assert(wal_level >= WAL_LEVEL_LOGICAL);
+	if (!RelationNeedsWAL(relation))
+		return false;
+	/*
+	 * XXX: In addition to the above comment, we could decide to always log
+	 * data even for real system catalogs, although the benefits of that seem
+	 * unclear.
+	 */
+	if (IsSystemRelation(relation))
+		return false;
+
+	return true;
+}
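Reviewer's sketch of why the Suspend/UnSuspend pairs added to relcache.c above can nest: suspension is tracked as a depth counter, so an inner lookup that suspends and resumes leaves the outer caller's suspension intact. All names below are hypothetical stand-ins, and the underflow assertion is an assumption of this sketch that the patch itself does not make.

```c
#include <assert.h>
#include <stdbool.h>

/* nesting depth; 0 means timetravel snapshots are in effect */
static int	suspended = 0;

static void
suspend(void)
{
	suspended++;
}

static void
unsuspend(void)
{
	assert(suspended > 0);		/* sketch-only guard against underflow */
	suspended--;
}

static bool
timetravel_active(void)
{
	return suspended == 0;
}

/* Stand-in for a nested catalog access that suspends on its own. */
static bool
inner_lookup(void)
{
	bool		active;

	suspend();
	active = timetravel_active();
	unsuspend();
	return active;
}

/* Stand-in for an outer caller that is itself already suspended. */
static bool
outer_lookup(void)
{
	bool		inner_saw_timetravel;

	suspend();
	inner_saw_timetravel = inner_lookup();
	unsuspend();
	return inner_saw_timetravel;
}
```

A plain boolean flag would be re-enabled by the inner call's resume; the counter keeps timetravel off until the outermost caller resumes.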
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea16c64..896df78 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -57,6 +57,7 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walwriter.h"
+#include "replication/logical.h"
 #include "replication/syncrep.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
@@ -2047,6 +2048,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		/* see max_connections */
+		{"max_logical_slots", PGC_POSTMASTER, REPLICATION_SENDING,
+			gettext_noop("Sets the maximum number of simultaneously defined WAL decoding slots."),
+			NULL
+		},
+		&max_logical_slots,
+		0, 0, MAX_BACKENDS /*?*/,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0303ac7..92f276d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -160,7 +160,7 @@
 
 # - Settings -
 
-#wal_level = minimal			# minimal, archive, or hot_standby
+#wal_level = minimal			# minimal, archive, logical or hot_standby
 					# (change requires restart)
 #fsync = on				# turns forced synchronization on or off
 #synchronous_commit = on		# synchronization level;
@@ -207,11 +207,18 @@
 
 # Set these on the master and on any standby that will send replication data.
 
-#max_wal_senders = 0		# max number of walsender processes
+#max_wal_senders = 0		# max number of walsender processes, including
+				# both physical and logical replication senders.
 				# (change requires restart)
 #wal_keep_segments = 0		# in logfile segments, 16MB each; 0 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 
+#max_logical_slots = 0		# max number of logical replication sender
+				# and receiver processes. Logical senders
+				# (but not receivers) also consume a
+				# max_wal_senders slot.
+				# (change requires restart)
+
 # - Master Server -
 
 # These settings are ignored on a standby server.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e739d2d..4162f92 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -75,13 +75,14 @@ static Snapshot SecondarySnapshot = NULL;
  * for the convenience of TransactionIdIsInProgress: even in bootstrap
  * mode, we don't want it to say that BootstrapTransactionId is in progress.
  *
- * RecentGlobalXmin is initialized to InvalidTransactionId, to ensure that no
+ * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
  * one tries to use a stale value.	Readers should ensure that it has been set
  * to something else before using it.
  */
 TransactionId TransactionXmin = FirstNormalTransactionId;
 TransactionId RecentXmin = FirstNormalTransactionId;
 TransactionId RecentGlobalXmin = InvalidTransactionId;
+TransactionId RecentGlobalDataXmin = InvalidTransactionId;
 
 /*
  * Elements of the active snapshot stack.
@@ -731,7 +732,7 @@ AtEOXact_Snapshot(bool isCommit)
  *		Returns the token (the file name) that can be used to import this
  *		snapshot.
  */
-static char *
+char *
 ExportSnapshot(Snapshot snapshot)
 {
 	TransactionId topXid;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 3254a2d..24f0949 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -64,6 +64,8 @@
 #include "access/xact.h"
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/combocid.h"
 #include "utils/tqual.h"
 
 
@@ -73,9 +75,17 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
 SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
 SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
 
+static Snapshot SnapshotNowDecoding;
+/* (table, ctid) => (cmin, cmax) mapping during timetravel */
+static HTAB *tuplecid_data = NULL;
+static int timetravel_suspended = 0;
+
+
 /* local functions */
 static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-
+static bool FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer);
+static bool RedirectSatisfiesNow(HeapTuple htup, Snapshot snapshot,
+								 Buffer buffer);
 
 /*
  * SetHintBits()
@@ -1700,3 +1710,242 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple)
 	 */
 	return true;
 }
+
+/*
+ * Check whether the transaction id 'xid' is in the pre-sorted array 'xip'.
+ */
+static bool
+TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
+{
+	return bsearch(&xid, xip, num,
+	               sizeof(TransactionId), xidComparator) != NULL;
+}
+
+/*
+ * See the comments for HeapTupleSatisfiesMVCC for the semantics this function
+ * obeys.
+ *
+ * Only usable on tuples from catalog tables!
+ *
+ * We don't need to support HEAP_MOVED_(IN|OFF) for now because we only support
+ * reading catalog pages which couldn't have been created in an older version.
+ *
+ * We don't set any hint bits in here as it seems unlikely to be beneficial as
+ * those should already be set by normal access and it seems to be too
+ * dangerous to do so as the semantics of doing so during timetravel are more
+ * complicated than when dealing "only" with the present.
+ */
+bool
+HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup, Snapshot snapshot,
+                                     Buffer buffer)
+{
+	HeapTupleHeader tuple = htup->t_data;
+	TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
+	TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
+
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
+	/* transaction aborted */
+	if (tuple->t_infomask & HEAP_XMIN_INVALID)
+	{
+		Assert(!TransactionIdDidCommit(xmin));
+		return false;
+	}
+	/* check if it's one of our txids, toplevel is also in there */
+	else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt))
+	{
+		CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple);
+		CommandId cmax = InvalidCommandId;
+
+		/*
+		 * if another transaction deleted this tuple or if cmin/cmax is stored
+		 * in a combocid, we need to look up the actual values externally.
+		 */
+		if ((!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
+			 !TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt)) ||
+			tuple->t_infomask & HEAP_COMBOCID
+			)
+		{
+			bool resolved;
+
+			resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+													 buffer, &cmin, &cmax);
+
+			if (!resolved)
+				elog(ERROR, "could not resolve cmin/cmax of catalog tuple");
+		}
+
+		if (cmin >= snapshot->curcid)
+			return false;	/* inserted after scan started */
+	}
+	/* normal transaction state counts */
+	else if (TransactionIdPrecedes(xmin, snapshot->xmin))
+	{
+		Assert(!(tuple->t_infomask & HEAP_XMIN_COMMITTED &&
+				 !TransactionIdDidCommit(xmin)));
+
+		if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
+			!TransactionIdDidCommit(xmin))
+			return false;
+	}
+	/* beyond our xmax horizon, i.e. invisible */
+	else if (TransactionIdFollowsOrEquals(xmin, snapshot->xmax))
+	{
+		return false;
+	}
+	/* known to have committed: fall through to the xmax checks below */
+	else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+	{
+	}
+	else
+	{
+		return false;
+	}
+
+	/* at this point we know xmin is visible, check xmax */
+
+	/* why should those be in catalog tables? */
+	Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
+
+	/* xid invalid or aborted */
+	if (tuple->t_infomask & HEAP_XMAX_INVALID)
+		return true;
+	/* locked tuples are always visible */
+	else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+		return true;
+	/* check if it's one of our txids, toplevel is also in there */
+	else if (TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt))
+	{
+		CommandId cmin;
+		CommandId cmax = HeapTupleHeaderGetRawCommandId(tuple);
+
+		/* Lookup actual cmin/cmax values */
+		if (tuple->t_infomask & HEAP_COMBOCID)
+		{
+			bool resolved;
+
+			resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+													 buffer, &cmin, &cmax);
+
+			if (!resolved)
+				elog(ERROR, "could not resolve combocid to cmax");
+		}
+
+		if (cmax >= snapshot->curcid)
+			return true;	/* deleted after scan started */
+		else
+			return false;	/* deleted before scan started */
+	}
+	/* normal transaction state is valid */
+	else if (TransactionIdPrecedes(xmax, snapshot->xmin))
+	{
+		Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED &&
+				 !TransactionIdDidCommit(xmax)));
+
+		if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
+			return false;
+
+		return !TransactionIdDidCommit(xmax);
+	}
+	/* we cannot possibly see the deleting transaction */
+	else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
+		return true;
+	/* do we know that the deleting txn is valid? */
+	else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+		return false;
+	else
+		return true;
+}
+
+/*
+ * Setup a replacement SnapshotNow that allows catalog access to behave just
+ * like it did at a certain point in the past.
+ *
+ * Needed for after-the-fact WAL decoding.
+ */
+void
+SetupDecodingSnapshots(Snapshot snapshot_now, HTAB *tuplecids)
+{
+	/* prevent recursively setting up decoding snapshots */
+	Assert(SnapshotNowData.satisfies != RedirectSatisfiesNow);
+
+	SnapshotNowData.satisfies = RedirectSatisfiesNow;
+	/* make sure normal snapshots aren't used */
+	SnapshotSelfData.satisfies = FailsSatisfies;
+	SnapshotAnyData.satisfies = FailsSatisfies;
+	/* don't overwrite SnapshotToastData, we want that to behave normally */
+
+	/* setup the timetravel snapshot */
+	SnapshotNowDecoding = snapshot_now;
+
+	/* setup (cmin, cmax) lookup hash */
+	tuplecid_data = tuplecids;
+
+	timetravel_suspended = 0;
+}
+
+
+/*
+ * Make SnapshotNow behave normally again.
+ */
+void
+RevertFromDecodingSnapshots(void)
+{
+	SnapshotNowDecoding = NULL;
+	tuplecid_data = NULL;
+
+	/* rally to restore sanity and/or boredom */
+	SnapshotNowData.satisfies = HeapTupleSatisfiesNow;
+	SnapshotSelfData.satisfies = HeapTupleSatisfiesSelf;
+	SnapshotAnyData.satisfies = HeapTupleSatisfiesAny;
+	timetravel_suspended = 0;
+}
+
+/*
+ * Disable timetravel SnapshotNow emulation and perform old-fashioned
+ * SnapshotNow access but make re-enabling cheap. This is useful for accessing
+ * catalog entries which must stay up to date, like the pg_class entries of
+ * system relations.
+ *
+ * Can be called several times in a nested fashion since several of its
+ * callers suspend timetravel access on several code levels.
+ */
+void
+SuspendDecodingSnapshots(void)
+{
+	timetravel_suspended++;
+}
+
+/*
+ * Enable timetravel SnapshotNow emulation again.
+ */
+void
+UnSuspendDecodingSnapshots(void)
+{
+	timetravel_suspended--;
+}
+
+/*
+ * Error out if a normal snapshot is used. That is neither legal nor expected
+ * during timetravel, so this is just extra assurance.
+ */
+static bool
+FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+	elog(ERROR, "normal snapshots cannot be used during timetravel access");
+	return false;
+}
+
+/*
+ * Call the replacement SatisfiesNow with the required SnapshotNow data.
+ */
+static bool
+RedirectSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+	Assert(SnapshotNowDecoding != NULL);
+	if (timetravel_suspended > 0)
+		return HeapTupleSatisfiesNow(htup, snapshot, buffer);
+	return HeapTupleSatisfiesMVCCDuringDecoding(htup, SnapshotNowDecoding,
+	                                            buffer);
+}
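Reviewer's sketch of the lookup that TransactionIdInArray in the tqual.c hunk above relies on: the array must be sorted with the same comparator that bsearch later uses. The comparator here uses plain numeric ordering, which is assumed to match xidComparator; the names `xid_cmp`, `sort_xids` and `xid_in_array` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Plain numeric ordering (not wraparound-aware), so it is a total order. */
static int
xid_cmp(const void *a, const void *b)
{
	TransactionId xa = *(const TransactionId *) a;
	TransactionId xb = *(const TransactionId *) b;

	if (xa < xb)
		return -1;
	if (xa > xb)
		return 1;
	return 0;
}

/* Sort once, when the snapshot's xid array is built. */
static void
sort_xids(TransactionId *xip, size_t num)
{
	qsort(xip, num, sizeof(TransactionId), xid_cmp);
}

/* O(log n) membership test on the pre-sorted array. */
static bool
xid_in_array(TransactionId xid, const TransactionId *xip, size_t num)
{
	if (num == 0)
		return false;
	assert(xip != NULL);
	return bsearch(&xid, xip, num, sizeof(TransactionId), xid_cmp) != NULL;
}
```

A wraparound-aware comparison would not be safe for sorting because it is not transitive across the whole xid space, which is why the sketch sticks to plain unsigned ordering for both the sort and the search.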
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9ff96c6..18b8ca0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -193,7 +193,9 @@ const char *subdirs[] = {
 	"base/1",
 	"pg_tblspc",
 	"pg_stat",
-	"pg_stat_tmp"
+	"pg_stat_tmp",
+	"pg_llog",
+	"pg_llog/snapshots"
 };
 
 
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index a790f99..8d86de0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -77,6 +77,8 @@ wal_level_str(WalLevel wal_level)
 			return "archive";
 		case WAL_LEVEL_HOT_STANDBY:
 			return "hot_standby";
+		case WAL_LEVEL_LOGICAL:
+			return "logical";
 	}
 	return _("unrecognized wal_level");
 }
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 4381778..42f3e6b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -55,6 +55,18 @@
 #define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
+#define XLOG_HEAP2_NEW_CID		0x70
+
+/*
+ * xl_heap_* ->flag values
+ */
+/* PD_ALL_VISIBLE was cleared */
+#define XLOG_HEAP_ALL_VISIBLE_CLEARED		(1<<0)
+/* PD_ALL_VISIBLE was cleared in the 2nd page */
+#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED	(1<<1)
+#define XLOG_HEAP_CONTAINS_OLD_TUPLE		(1<<2)
+#define XLOG_HEAP_CONTAINS_OLD_KEY			(1<<3)
+#define XLOG_HEAP_CONTAINS_NEW_TUPLE		(1<<4)
 
 /*
  * All what we need to find changed tuple
@@ -78,10 +90,10 @@ typedef struct xl_heap_delete
 	xl_heaptid	target;			/* deleted tuple id */
 	TransactionId xmax;			/* xmax of the deleted tuple */
 	uint8		infobits_set;	/* infomask bits */
-	bool		all_visible_cleared;	/* PD_ALL_VISIBLE was cleared */
+	uint8		flags;
 } xl_heap_delete;
 
-#define SizeOfHeapDelete	(offsetof(xl_heap_delete, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapDelete	(offsetof(xl_heap_delete, flags) + sizeof(uint8))
 
 /*
  * We don't store the whole fixed part (HeapTupleHeaderData) of an inserted
@@ -100,15 +112,23 @@ typedef struct xl_heap_header
 
 #define SizeOfHeapHeader	(offsetof(xl_heap_header, t_hoff) + sizeof(uint8))
 
+typedef struct xl_heap_header_len
+{
+	uint16		t_len;
+	xl_heap_header header;
+} xl_heap_header_len;
+
+#define SizeOfHeapHeaderLen	(offsetof(xl_heap_header_len, header) + SizeOfHeapHeader)
+
 /* This is what we need to know about insert */
 typedef struct xl_heap_insert
 {
 	xl_heaptid	target;			/* inserted tuple id */
-	bool		all_visible_cleared;	/* PD_ALL_VISIBLE was cleared */
+	uint8		flags;
 	/* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
 } xl_heap_insert;
 
-#define SizeOfHeapInsert	(offsetof(xl_heap_insert, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapInsert	(offsetof(xl_heap_insert, flags) + sizeof(uint8))
 
 /*
  * This is what we need to know about a multi-insert. The record consists of
@@ -120,7 +140,7 @@ typedef struct xl_heap_multi_insert
 {
 	RelFileNode node;
 	BlockNumber blkno;
-	bool		all_visible_cleared;
+	uint8		flags;
 	uint16		ntuples;
 	OffsetNumber offsets[1];
 
@@ -147,13 +167,12 @@ typedef struct xl_heap_update
 	TransactionId old_xmax;		/* xmax of the old tuple */
 	TransactionId new_xmax;		/* xmax of the new tuple */
 	ItemPointerData newtid;		/* new inserted tuple id */
-	uint8		old_infobits_set;		/* infomask bits to set on old tuple */
-	bool		all_visible_cleared;	/* PD_ALL_VISIBLE was cleared */
-	bool		new_all_visible_cleared;		/* same for the page of newtid */
+	uint8		old_infobits_set;	/* infomask bits to set on old tuple */
+	uint8		flags;
 	/* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
 } xl_heap_update;
 
-#define SizeOfHeapUpdate	(offsetof(xl_heap_update, new_all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapUpdate	(offsetof(xl_heap_update, flags) + sizeof(uint8))
 
 /*
  * This is what we need to know about vacuum page cleanup/redirect
@@ -261,6 +280,28 @@ typedef struct xl_heap_visible
 
 #define SizeOfHeapVisible (offsetof(xl_heap_visible, cutoff_xid) + sizeof(TransactionId))
 
+typedef struct xl_heap_new_cid
+{
+	/*
+	 * store toplevel xid so we don't have to merge cids from different
+	 * transactions
+	 */
+	TransactionId top_xid;
+	CommandId cmin;
+	CommandId cmax;
+	/*
+	 * don't really need the combocid but the padding makes it free and it's
+	 * useful for debugging.
+	 */
+	CommandId combocid;
+	/*
+	 * Store the relfilenode/ctid pair to facilitate lookups.
+	 */
+	xl_heaptid target;
+} xl_heap_new_cid;
+
+#define SizeOfHeapNewCid (offsetof(xl_heap_new_cid, target) + SizeOfHeapTid)
+
 extern void HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 									   TransactionId *latestRemovedXid);
 
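Reviewer's sketch of how the separate all_visible_cleared booleans collapse into the single uint8 flags field introduced above. The flag values are copied from the hunk; the helpers `make_update_flags` and `update_cleared_new_page` are hypothetical and only illustrate setting and testing the bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* flag values as defined in the heapam_xlog.h hunk */
#define XLOG_HEAP_ALL_VISIBLE_CLEARED		(1<<0)
#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED	(1<<1)
#define XLOG_HEAP_CONTAINS_OLD_TUPLE		(1<<2)
#define XLOG_HEAP_CONTAINS_OLD_KEY			(1<<3)
#define XLOG_HEAP_CONTAINS_NEW_TUPLE		(1<<4)

/* Hypothetical writer side: fold several booleans into one byte. */
static uint8_t
make_update_flags(bool old_page_cleared, bool new_page_cleared,
				  bool has_old_key)
{
	uint8_t		flags = 0;

	if (old_page_cleared)
		flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
	if (new_page_cleared)
		flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
	if (has_old_key)
		flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
	return flags;
}

/* Hypothetical reader side: replaces the old new_all_visible_cleared bool. */
static bool
update_cleared_new_page(uint8_t flags)
{
	return (flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED) != 0;
}
```

The record stays one byte wide for insert and delete and shrinks by a byte for update, while leaving three unused bits for later additions.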
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 23a41fd..8452ec5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -63,6 +63,11 @@
 	(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
 	(int32) ((id1) - (id2)) < 0)
 
+/* compare two XIDs already known to be normal; this is a macro for speed */
+#define NormalTransactionIdFollows(id1, id2) \
+	(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
+	(int32) ((id1) - (id2)) > 0)
+
 /* ----------
  *		Object ID (OID) zero is InvalidOid.
  *
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index b4a75ce..80f9ab6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -196,7 +196,8 @@ typedef enum WalLevel
 {
 	WAL_LEVEL_MINIMAL = 0,
 	WAL_LEVEL_ARCHIVE,
-	WAL_LEVEL_HOT_STANDBY
+	WAL_LEVEL_HOT_STANDBY,
+	WAL_LEVEL_LOGICAL
 } WalLevel;
 extern int	wal_level;
 
@@ -209,9 +210,12 @@ extern int	wal_level;
  */
 #define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
 
-/* Do we need to WAL-log information required only for Hot Standby? */
+/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
 
+/* Do we need to WAL-log information required only for logical replication? */
+#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
+
 #ifdef WAL_DEBUG
 extern bool XLOG_DEBUG;
 #endif
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3829ce2..72179ab 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -19,6 +19,7 @@
 #ifndef XLOGREADER_H
 #define XLOGREADER_H
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 
 typedef struct XLogReaderState XLogReaderState;
@@ -108,10 +109,19 @@ struct XLogReaderState
 	char	   *errormsg_buf;
 };
 
-/* Get a new XLogReader */
+
 extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
 				   void *private_data);
 
+
+typedef struct XLogRecordBuffer
+{
+	XLogRecPtr origptr;
+	XLogRecord record;
+	char *record_data;
+} XLogRecordBuffer;
+
+
 /* Free an XLogReader */
 extern void XLogReaderFree(XLogReaderState *state);
 
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 44b6f38..a96ed69 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -23,6 +23,7 @@ extern ForkNumber forkname_to_number(char *forkName);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
 
+extern bool IsSystemRelationId(Oid relid);
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8d268dd..9b38477 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2619,6 +2619,8 @@ DATA(insert OID = 2022 (  pg_stat_get_activity			PGNSP PGUID 12 1 100 0 0 f f f
 DESCR("statistics: information about currently active backends");
 DATA(insert OID = 3099 (  pg_stat_get_wal_senders	PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{23,25,25,25,25,25,23,25}" "{o,o,o,o,o,o,o,o}" "{pid,state,sent_location,write_location,flush_location,replay_location,sync_priority,sync_state}" _null_ pg_stat_get_wal_senders _null_ _null_ _null_ ));
 DESCR("statistics: information about currently active replication");
+DATA(insert OID = 3457 (  pg_stat_get_logical_decoding_slots	PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{25,25,26,16,28,25}" "{o,o,o,o,o,o}" "{slot_name,plugin,database,active,xmin,restart_decoding_lsn}" _null_ pg_stat_get_logical_decoding_slots _null_ _null_ _null_ ));
+DESCR("statistics: information about logical replication slots currently in use");
 DATA(insert OID = 2026 (  pg_backend_pid				PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 23 "" _null_ _null_ _null_ _null_ pg_backend_pid _null_ _null_ _null_ ));
 DESCR("statistics: current backend PID");
 DATA(insert OID = 1937 (  pg_stat_get_backend_pid		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 23 "23" _null_ _null_ _null_ _null_ pg_stat_get_backend_pid _null_ _null_ _null_ ));
@@ -4723,6 +4725,10 @@ DESCR("SP-GiST support for quad tree over range");
 DATA(insert OID = 3473 (  spg_range_quad_leaf_consistent	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "2281 2281" _null_ _null_ _null_ _null_  spg_range_quad_leaf_consistent _null_ _null_ _null_ ));
 DESCR("SP-GiST support for quad tree over range");
 
+DATA(insert OID = 3779 (  init_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,25}" "{i,i,o,o}" "{slotname,plugin,slotname,xlog_position}" _null_ init_logical_replication _null_ _null_ _null_ ));
+DESCR("set up a logical replication slot");
+DATA(insert OID = 3780 (  stop_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 23 "19" _null_ _null_ _null_ _null_ stop_logical_replication _null_ _null_ _null_ ));
+DESCR("stop logical replication");
 
 DATA(insert OID = 3781 (  pg_xlog_wait_remote_apply PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_apply _null_ _null_ _null_ ));
 DESCR("wait for an lsn to be applied by a remote node");
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8dd8b0..2616ac1 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -156,7 +156,7 @@ extern void vac_update_relstats(Relation relation,
 					TransactionId frozenxid,
 					MultiXactId minmulti);
 extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age,
-					  bool sharedRel,
+					  bool sharedRel, bool catalogRel,
 					  TransactionId *oldestXmin,
 					  TransactionId *freezeLimit,
 					  TransactionId *freezeTableLimit,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 0d5c007..0b17182 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -408,6 +408,9 @@ typedef enum NodeTag
 	T_IdentifySystemCmd,
 	T_BaseBackupCmd,
 	T_StartReplicationCmd,
+	T_InitLogicalReplicationCmd,
+	T_StartLogicalReplicationCmd,
+	T_FreeLogicalReplicationCmd,
 	T_TimeLineHistoryCmd,
 
 	/*
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 85b4544..3da8d40 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -52,6 +52,41 @@ typedef struct StartReplicationCmd
 
 
 /* ----------------------
+ *		INIT_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct InitLogicalReplicationCmd
+{
+	NodeTag		type;
+	char       *name;
+	char       *plugin;
+} InitLogicalReplicationCmd;
+
+
+/* ----------------------
+ *		START_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct StartLogicalReplicationCmd
+{
+	NodeTag		type;
+	char       *name;
+	XLogRecPtr	startpoint;
+	List       *options;
+} StartLogicalReplicationCmd;
+
+/* ----------------------
+ *		FREE_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct FreeLogicalReplicationCmd
+{
+	NodeTag		type;
+	char       *name;
+} FreeLogicalReplicationCmd;
+
+
+/* ----------------------
  *		TIMELINE_HISTORY command
  * ----------------------
  */
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
new file mode 100644
index 0000000..dd3f2ca
--- /dev/null
+++ b/src/include/replication/decode.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ * decode.h
+ *	   PostgreSQL WAL to logical transformation
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DECODE_H
+#define DECODE_H
+
+#include "access/xlogreader.h"
+#include "replication/reorderbuffer.h"
+#include "replication/logical.h"
+
+void DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+							  XLogRecordBuffer *buf);
+
+#endif
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
new file mode 100644
index 0000000..971180b
--- /dev/null
+++ b/src/include/replication/logical.h
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ * logical.h
+ *	   PostgreSQL WAL to logical transformation
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICAL_H
+#define LOGICAL_H
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "replication/output_plugin.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+
+/*
+ * Shared memory state of a single logical decoding slot
+ */
+typedef struct LogicalDecodingSlot
+{
+	/* lock, on same cacheline as effective_xmin */
+	slock_t		mutex;
+
+	/* on-disk xmin, updated first */
+	TransactionId xmin;
+
+	/* in-memory xmin, updated after syncing to disk */
+	TransactionId effective_xmin;
+
+	/* is this slot defined */
+	bool		in_use;
+
+	/* is somebody streaming out changes for this slot */
+	bool		active;
+
+	/* have we been aborted while ->active */
+	bool		aborted;
+
+	/* ----
+	 * If we shut down or crash, where do we have to restart decoding from
+	 * in order to
+	 * a) find a valid & ready snapshot
+	 * b) the complete content for all in-progress xacts
+	 * ----
+	 */
+	XLogRecPtr	restart_decoding;
+
+	/*
+	 * Last location we know the client has confirmed to have safely received
+	 * data to. No earlier data can be decoded after a restart/crash.
+	 */
+	XLogRecPtr	confirmed_flush;
+
+	/* ----
+	 * When the client has confirmed flushes >= candidate_lsn we can
+	 * a) advance the pegged xmin
+	 * b) advance restart_decoding so we have to read/keep less WAL
+	 * ----
+	 */
+	XLogRecPtr	candidate_lsn;
+	TransactionId candidate_xmin;
+	XLogRecPtr	candidate_restart_decoding;
+
+	/* database the slot is active on */
+	Oid			database;
+
+	/* slot identifier */
+	NameData	name;
+
+	/* plugin name */
+	NameData	plugin;
+} LogicalDecodingSlot;
+
+/*
+ * Shared memory control area for all of logical decoding
+ */
+typedef struct LogicalDecodingCtlData
+{
+	/*
+	 * Xmin across all logical slots.
+	 *
+	 * Protected by ProcArrayLock.
+	 */
+	TransactionId xmin;
+
+	LogicalDecodingSlot logical_slots[FLEXIBLE_ARRAY_MEMBER];
+} LogicalDecodingCtlData;
+
+/*
+ * Pointers to shared memory
+ */
+extern LogicalDecodingCtlData *LogicalDecodingCtl;
+extern LogicalDecodingSlot *MyLogicalDecodingSlot;
+
+struct LogicalDecodingContext;
+
+typedef void (*LogicalOutputPluginWriterWrite) (
+										   struct LogicalDecodingContext *lr,
+															XLogRecPtr Ptr,
+															TransactionId xid
+);
+
+typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
+
+/*
+ * Output plugin callbacks
+ */
+typedef struct OutputPluginCallbacks
+{
+	LogicalDecodeInitCB init_cb;
+	LogicalDecodeBeginCB begin_cb;
+	LogicalDecodeChangeCB change_cb;
+	LogicalDecodeCommitCB commit_cb;
+	LogicalDecodeCleanupCB cleanup_cb;
+} OutputPluginCallbacks;
+
+typedef struct LogicalDecodingContext
+{
+	struct XLogReaderState *reader;
+	struct LogicalDecodingSlot *slot;
+	struct ReorderBuffer *reorder;
+	struct SnapBuild *snapshot_builder;
+
+	struct OutputPluginCallbacks callbacks;
+
+	bool		stop_after_consistent;
+
+	/*
+	 * User specified options
+	 */
+	List	   *output_plugin_options;
+
+	/*
+	 * User-Provided callback for writing/streaming out data.
+	 */
+	LogicalOutputPluginWriterPrepareWrite prepare_write;
+	LogicalOutputPluginWriterWrite write;
+
+	/*
+	 * Output buffer.
+	 */
+	StringInfo	out;
+
+	/*
+	 * Private data pointer for the creator of the logical decoding context.
+	 */
+	void	   *owner_private;
+
+	/*
+	 * Private data pointer of the output plugin.
+	 */
+	void	   *output_plugin_private;
+
+	/*
+	 * Private data pointer for the data writer.
+	 */
+	void	   *output_writer_private;
+} LogicalDecodingContext;
+
+/* GUCs */
+extern PGDLLIMPORT int max_logical_slots;
+
+extern Size LogicalDecodingShmemSize(void);
+extern void LogicalDecodingShmemInit(void);
+
+extern void LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin);
+extern void LogicalDecodingReleaseSlot(void);
+extern void LogicalDecodingReAcquireSlot(const char *name);
+extern void LogicalDecodingFreeSlot(const char *name);
+
+extern void ComputeLogicalXmin(void);
+
+/* change logical xmin */
+extern void IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin);
+
+/* change recovery restart location */
+extern void IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn);
+
+extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
+
+extern void CheckLogicalReplicationRequirements(void);
+
+extern void StartupLogicalReplication(XLogRecPtr checkPointRedo);
+
+extern LogicalDecodingContext *CreateLogicalDecodingContext(
+							 LogicalDecodingSlot *slot,
+							 bool is_init,
+							 XLogRecPtr	start_lsn,
+							 List *output_plugin_options,
+							 XLogPageReadCB read_page,
+						 LogicalOutputPluginWriterPrepareWrite prepare_write,
+							 LogicalOutputPluginWriterWrite do_write);
+extern bool LogicalDecodingContextReady(LogicalDecodingContext *ctx);
+extern void FreeLogicalDecodingContext(LogicalDecodingContext *ctx);
+
+#endif
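A rough illustration of the candidate_* bookkeeping described in the LogicalDecodingSlot comments above: a candidate xmin/restart point is remembered together with the LSN at which it becomes safe, and is only promoted once the client has confirmed a flush at or beyond that LSN. This is a minimal standalone sketch, not the code from the patch. SketchSlot, sketch_set_candidate and sketch_confirm_received are hypothetical names, the typedefs are stand-ins for the server types, locking and the actual write to disk are left out, and xids are compared without wraparound handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the server types (assumption: plain integers suffice here). */
typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;

#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, reduced copy of the slot fields this sketch touches. */
typedef struct SketchSlot
{
	TransactionId xmin;				/* "on-disk" xmin, updated first */
	TransactionId effective_xmin;	/* in-memory xmin, updated afterwards */
	XLogRecPtr	restart_decoding;
	XLogRecPtr	confirmed_flush;
	XLogRecPtr	candidate_lsn;
	TransactionId candidate_xmin;
	XLogRecPtr	candidate_restart_decoding;
} SketchSlot;

/* Remember values that become usable once the client confirms 'lsn'. */
void
sketch_set_candidate(SketchSlot *slot, XLogRecPtr lsn,
					 TransactionId xmin, XLogRecPtr restart)
{
	assert(slot != NULL);
	slot->candidate_lsn = lsn;
	slot->candidate_xmin = xmin;
	slot->candidate_restart_decoding = restart;
}

/* Record a confirmed flush; returns 1 if the candidates were promoted. */
int
sketch_confirm_received(SketchSlot *slot, XLogRecPtr lsn)
{
	assert(slot != NULL);
	slot->confirmed_flush = lsn;

	/* No candidate pending, or the client has not caught up to it yet. */
	if (slot->candidate_lsn == InvalidXLogRecPtr || lsn < slot->candidate_lsn)
		return 0;

	/* Update the on-disk value first; a real slot would be persisted here. */
	slot->xmin = slot->candidate_xmin;
	slot->restart_decoding = slot->candidate_restart_decoding;

	/* Only after that may the in-memory horizon move forward. */
	slot->effective_xmin = slot->xmin;

	/* The candidate has been consumed. */
	slot->candidate_lsn = InvalidXLogRecPtr;
	slot->candidate_xmin = 0;
	slot->candidate_restart_decoding = InvalidXLogRecPtr;
	return 1;
}
```

With a candidate registered at LSN 50, confirming LSN 45 only moves confirmed_flush; confirming LSN 50 or later promotes the candidate xmin and restart point in one step.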
diff --git a/src/include/replication/logicalfuncs.h b/src/include/replication/logicalfuncs.h
new file mode 100644
index 0000000..37f36a5
--- /dev/null
+++ b/src/include/replication/logicalfuncs.h
@@ -0,0 +1,19 @@
+/*-------------------------------------------------------------------------
+ * logicalfuncs.h
+ *	   PostgreSQL WAL to logical transformation support functions
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICALFUNCS_H
+#define LOGICALFUNCS_H
+
+extern int logical_read_local_xlog_page(XLogReaderState *state,
+							 XLogRecPtr targetPagePtr,
+							 int reqLen, XLogRecPtr targetRecPtr,
+							 char *cur_page, TimeLineID *pageTLI);
+
+extern Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
new file mode 100644
index 0000000..66b4fd9
--- /dev/null
+++ b/src/include/replication/output_plugin.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ * output_plugin.h
+ *	   PostgreSQL Logical Decode Plugin Interface
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OUTPUT_PLUGIN_H
+#define OUTPUT_PLUGIN_H
+
+#include "replication/reorderbuffer.h"
+
+struct LogicalDecodingContext;
+
+/*
+ * Callback that gets called in a user-defined plugin.
+ * ctx->output_plugin_private can be set to some private data.
+ *
+ * Gets looked up via the library symbol pg_decode_init.
+ */
+typedef void (*LogicalDecodeInitCB) (
+										  struct LogicalDecodingContext *ctx,
+												 bool is_init
+);
+
+/*
+ * Gets called for every BEGIN of a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_begin_txn.
+ */
+typedef bool (*LogicalDecodeBeginCB) (
+											 struct LogicalDecodingContext *,
+												  ReorderBufferTXN *txn);
+
+/*
+ * Gets called for every change in a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_change.
+ */
+typedef bool (*LogicalDecodeChangeCB) (
+											 struct LogicalDecodingContext *,
+												   ReorderBufferTXN *txn,
+												   Relation relation,
+												   ReorderBufferChange *change
+);
+
+/*
+ * Gets called for every COMMIT of a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_commit_txn.
+ */
+typedef bool (*LogicalDecodeCommitCB) (
+											 struct LogicalDecodingContext *,
+												   ReorderBufferTXN *txn,
+												   XLogRecPtr commit_lsn);
+
+/*
+ * Gets called to clean up the state of an output plugin.
+ *
+ * Gets looked up via the library symbol pg_decode_cleanup.
+ */
+typedef void (*LogicalDecodeCleanupCB) (
+											  struct LogicalDecodingContext *
+);
+
+#endif   /* OUTPUT_PLUGIN_H */
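To make the callback contract in output_plugin.h a bit more concrete ("Return true if the message in out should get sent"), here is a small standalone sketch. It does not use the real headers: SketchContext, SketchTxn, sketch_begin, sketch_commit and sketch_replay are hypothetical stand-ins, the output buffer is a fixed char array instead of a StringInfo, and the "sending" is just appending to a string. It only shows the idea that the caller forwards the buffer when the callback returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for LogicalDecodingContext. */
typedef struct SketchContext
{
	char		out[64];				/* stands in for the "out" buffer */
	void	   *output_plugin_private;	/* plugin-owned state */
} SketchContext;

/* Hypothetical stand-in for ReorderBufferTXN. */
typedef struct SketchTxn
{
	unsigned int xid;
} SketchTxn;

typedef bool (*SketchBeginCB) (SketchContext *ctx, SketchTxn *txn);
typedef bool (*SketchCommitCB) (SketchContext *ctx, SketchTxn *txn);

/* Always emits a BEGIN message. */
bool
sketch_begin(SketchContext *ctx, SketchTxn *txn)
{
	snprintf(ctx->out, sizeof(ctx->out), "BEGIN %u", txn->xid);
	return true;
}

/* Emits COMMIT unless the plugin's private "quiet" flag is set. */
bool
sketch_commit(SketchContext *ctx, SketchTxn *txn)
{
	const bool *quiet = ctx->output_plugin_private;

	if (quiet != NULL && *quiet)
		return false;			/* tell the caller not to send anything */
	snprintf(ctx->out, sizeof(ctx->out), "COMMIT %u", txn->xid);
	return true;
}

/*
 * Hypothetical caller: invokes the callbacks and forwards ctx->out into
 * 'sent' only when a callback returned true. Returns the message count.
 */
int
sketch_replay(SketchContext *ctx, SketchTxn *txn,
			  SketchBeginCB begin_cb, SketchCommitCB commit_cb,
			  char *sent, size_t sentlen)
{
	int			nsent = 0;

	assert(sentlen > 0);
	sent[0] = '\0';

	if (begin_cb(ctx, txn))
	{
		size_t		used = strlen(sent);

		snprintf(sent + used, sentlen - used, "%s;", ctx->out);
		nsent++;
	}
	if (commit_cb(ctx, txn))
	{
		size_t		used = strlen(sent);

		snprintf(sent + used, sentlen - used, "%s;", ctx->out);
		nsent++;
	}
	return nsent;
}
```

With no private state set, both messages are forwarded; pointing output_plugin_private at a bool that is true suppresses the COMMIT message while BEGIN still goes out.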
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
new file mode 100644
index 0000000..b34b6fd
--- /dev/null
+++ b/src/include/replication/reorderbuffer.h
@@ -0,0 +1,320 @@
+/*
+ * reorderbuffer.h
+ *
+ * PostgreSQL logical replay "cache" management
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/reorderbuffer.h
+ */
+#ifndef REORDERBUFFER_H
+#define REORDERBUFFER_H
+
+#include "access/htup_details.h"
+#include "utils/hsearch.h"
+#include "utils/rel.h"
+
+#include "lib/ilist.h"
+
+#include "storage/sinval.h"
+
+#include "utils/snapshot.h"
+
+
+typedef struct ReorderBuffer ReorderBuffer;
+
+/* types of the change passed to a 'change' callback */
+enum ReorderBufferChangeType
+{
+	REORDER_BUFFER_CHANGE_INSERT,
+	REORDER_BUFFER_CHANGE_UPDATE,
+	REORDER_BUFFER_CHANGE_DELETE
+};
+
+/* an individual tuple, stored in one chunk of memory */
+typedef struct ReorderBufferTupleBuf
+{
+	/* position in preallocated list */
+	slist_node	node;
+
+	/* tuple, stored sequentially */
+	HeapTupleData tuple;
+	HeapTupleHeaderData header;
+	char		data[MaxHeapTupleSize];
+} ReorderBufferTupleBuf;
+
+/*
+ * a single 'change', can be an insert (with one tuple), an update (old, new),
+ * or a delete (old).
+ *
+ * The same struct is also used internally for other purposes but that should
+ * never be visible outside reorderbuffer.c.
+ */
+typedef struct ReorderBufferChange
+{
+	XLogRecPtr	lsn;
+
+	/* type of change */
+	union
+	{
+		enum ReorderBufferChangeType action;
+		/* do not leak internal enum values to the outside */
+		int			action_internal;
+	};
+
+	/*
+	 * Context data for the change, which part of the union is valid depends
+	 * on action/action_internal.
+	 */
+	union
+	{
+		/* old, new tuples when action == *_INSERT|UPDATE|DELETE */
+		struct
+		{
+			/* relation that has been changed */
+			RelFileNode relnode;
+			/* valid for DELETE || UPDATE */
+			ReorderBufferTupleBuf *oldtuple;
+			/* valid for INSERT || UPDATE */
+			ReorderBufferTupleBuf *newtuple;
+		};
+
+		/* new snapshot */
+		Snapshot	snapshot;
+
+		/* new command id for existing snapshot in a catalog changing tx */
+		CommandId	command_id;
+
+		/* new cid mapping for catalog changing transaction */
+		struct
+		{
+			RelFileNode node;
+			ItemPointerData tid;
+			CommandId	cmin;
+			CommandId	cmax;
+			CommandId	combocid;
+		}			tuplecid;
+	};
+
+	/*
+	 * While in use this is how a change is linked into a transaction,
+	 * otherwise it's the preallocated list.
+	 */
+	dlist_node	node;
+} ReorderBufferChange;
+
+typedef struct ReorderBufferTXN
+{
+	/*
+	 * The transaction's transaction id, can be a toplevel or sub xid.
+	 */
+	TransactionId xid;
+
+	/*
+	 * LSN of the first wal record with knowledge about this xid.
+	 */
+	XLogRecPtr	lsn;
+	XLogRecPtr	last_lsn;
+
+	/*
+	 * LSN of the last location at which snapshot information resides, so we
+	 * can restart decoding from there and fully recover this transaction from
+	 * WAL.
+	 */
+	XLogRecPtr	restart_decoding_lsn;
+
+	/* did the TX have catalog changes */
+	bool		does_timetravel;
+
+	/*
+	 * Base snapshot or NULL.
+	 */
+	Snapshot	base_snapshot;
+
+	/*
+	 * Do we know this is a subxact?
+	 */
+	bool		is_known_as_subxact;
+
+	/*
+	 * How many ReorderBufferChange's do we have in this txn.
+	 *
+	 * Changes in subtransactions are *not* included but tracked separately.
+	 */
+	Size		nentries;
+
+	/*
+	 * How many of the above entries are stored in memory in contrast to being
+	 * spilled to disk.
+	 */
+	Size		nentries_mem;
+
+	/*
+	 * List of ReorderBufferChange structs, including new Snapshots and new
+	 * CommandIds
+	 */
+	dlist_head	changes;
+
+	/*
+	 * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
+	 * Those are always assigned to the toplevel transaction. (Keep track of
+	 * #entries to create a hash of the right size)
+	 */
+	dlist_head	tuplecids;
+	size_t		ntuplecids;
+
+	/*
+	 * On-demand built hash for looking up the above values.
+	 */
+	HTAB	   *tuplecid_hash;
+
+	/*
+	 * Hash containing (potentially partial) toast entries. NULL if no toast
+	 * tuples have been found for the current change.
+	 */
+	HTAB	   *toast_hash;
+
+	/*
+	 * non-hierarchical list of subtransactions that are *not* aborted. Only
+	 * used in toplevel transactions.
+	 */
+	dlist_head	subtxns;
+	size_t		nsubtxns;
+
+	/*
+	 * Position in one of three lists: the list of subtransactions (if we are
+	 * *known* to be a subxact), the list of toplevel xacts (can be an as-yet
+	 * unknown subxact), or the list of preallocated ReorderBufferTXNs.
+	 */
+	dlist_node	node;
+
+	/*
+	 * Stored cache invalidations. This is not a linked list because we get
+	 * all the invalidations at once.
+	 */
+	SharedInvalidationMessage *invalidations;
+	size_t		ninvalidations;
+
+} ReorderBufferTXN;
+
+
+/* change callback signature */
+typedef void (*ReorderBufferApplyChangeCB) (
+														ReorderBuffer *cache,
+														ReorderBufferTXN *txn,
+														Relation relation,
+												ReorderBufferChange *change);
+
+/* begin callback signature */
+typedef void (*ReorderBufferBeginCB) (
+												  ReorderBuffer *cache,
+												  ReorderBufferTXN *txn);
+
+/* commit callback signature */
+typedef void (*ReorderBufferCommitCB) (
+												   ReorderBuffer *cache,
+												   ReorderBufferTXN *txn,
+												   XLogRecPtr commit_lsn);
+
+struct ReorderBuffer
+{
+	/*
+	 * xid => ReorderBufferTXN lookup table
+	 */
+	HTAB	   *by_txn;
+
+	/*
+	 * Transactions that could be a toplevel xact, ordered by LSN of the first
+	 * record bearing that xid.
+	 */
+	dlist_head	toplevel_by_lsn;
+
+	/*
+	 * one-entry sized cache for by_txn. Very frequently the same txn gets
+	 * looked up over and over again.
+	 */
+	TransactionId by_txn_last_xid;
+	ReorderBufferTXN *by_txn_last_txn;
+
+	/*
+	 * Callbacks to be called when a transaction commits.
+	 */
+	ReorderBufferBeginCB begin;
+	ReorderBufferApplyChangeCB apply_change;
+	ReorderBufferCommitCB commit;
+
+	/*
+	 * Pointer that will be passed untouched to the callbacks.
+	 */
+	void	   *private_data;
+
+	/*
+	 * Private memory context.
+	 */
+	MemoryContext context;
+
+	/*
+	 * Data structure slab cache.
+	 *
+	 * We allocate/deallocate some structures very frequently, to avoid bigger
+	 * overhead we cache some unused ones here.
+	 *
+	 * The maximum number of cached entries is controlled by const variables
+	 * at the top of reorderbuffer.c
+	 */
+
+	/* cached ReorderBufferTXNs */
+	dlist_head	cached_transactions;
+	Size		nr_cached_transactions;
+
+	/* cached ReorderBufferChanges */
+	dlist_head	cached_changes;
+	Size		nr_cached_changes;
+
+	/* cached ReorderBufferTupleBufs */
+	slist_head	cached_tuplebufs;
+	Size		nr_cached_tuplebufs;
+
+	XLogRecPtr	current_restart_decoding_lsn;
+
+	/* buffer for disk<->memory conversions */
+	char	   *outbuf;
+	Size		outbufsize;
+};
+
+
+ReorderBuffer *ReorderBufferAllocate(void);
+void		ReorderBufferFree(ReorderBuffer *);
+
+ReorderBufferTupleBuf *ReorderBufferGetTupleBuf(ReorderBuffer *);
+void		ReorderBufferReturnTupleBuf(ReorderBuffer *, ReorderBufferTupleBuf *tuple);
+ReorderBufferChange *ReorderBufferGetChange(ReorderBuffer *);
+void		ReorderBufferReturnChange(ReorderBuffer *, ReorderBufferChange *);
+
+void		ReorderBufferAddChange(ReorderBuffer *, TransactionId, XLogRecPtr lsn, ReorderBufferChange *);
+void		ReorderBufferCommit(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+void		ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr lsn);
+void		ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr lsn);
+void		ReorderBufferAbort(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+
+void		ReorderBufferSetBaseSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void		ReorderBufferAddSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddNewCommandId(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+							 CommandId cid);
+void ReorderBufferAddNewTupleCids(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+							 RelFileNode node, ItemPointerData pt,
+						 CommandId cmin, CommandId cmax, CommandId combocid);
+void ReorderBufferAddInvalidations(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+							  Size nmsgs, SharedInvalidationMessage *msgs);
+bool		ReorderBufferIsXidKnown(ReorderBuffer *, TransactionId xid);
+void		ReorderBufferXidSetTimetravel(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
+bool		ReorderBufferXidDoesTimetravel(ReorderBuffer *, TransactionId xid);
+bool		ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+
+ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
+
+void		ReorderBufferSetRestartPoint(ReorderBuffer *, XLogRecPtr ptr);
+
+void		ReorderBufferStartup(void);
+
+#endif
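The by_txn_last_xid/by_txn_last_txn pair in struct ReorderBuffer is a one-entry cache in front of the xid => ReorderBufferTXN hash. The following standalone sketch shows that pattern only; it is not the lookup code from reorderbuffer.c. SketchBuffer, SketchTxnEntry and sketch_lookup are hypothetical names, and a linear array scan stands in for the HTAB lookup so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for a ReorderBufferTXN. */
typedef struct SketchTxnEntry
{
	TransactionId xid;
	int			payload;
} SketchTxnEntry;

/* Hypothetical stand-in for the relevant parts of struct ReorderBuffer. */
typedef struct SketchBuffer
{
	SketchTxnEntry *entries;	/* stands in for the by_txn hash table */
	size_t		nentries;
	TransactionId last_xid;		/* one-entry cache: key ... */
	SketchTxnEntry *last_txn;	/* ... and value; NULL means empty */
	size_t		slow_lookups;	/* counts how often the cache missed */
} SketchBuffer;

/* Look up a transaction by xid, consulting the one-entry cache first. */
SketchTxnEntry *
sketch_lookup(SketchBuffer *rb, TransactionId xid)
{
	size_t		i;

	assert(rb != NULL);

	/* Fast path: the same xid as last time. */
	if (rb->last_txn != NULL && rb->last_xid == xid)
		return rb->last_txn;

	/* Slow path: search the full table (a hash lookup in the real thing). */
	rb->slow_lookups++;
	for (i = 0; i < rb->nentries; i++)
	{
		if (rb->entries[i].xid == xid)
		{
			/* Remember the hit so the next lookup can take the fast path. */
			rb->last_xid = xid;
			rb->last_txn = &rb->entries[i];
			return rb->last_txn;
		}
	}

	/* Misses are not cached in this sketch. */
	return NULL;
}
```

Since consecutive WAL records very often belong to the same transaction, repeated lookups of one xid only pay for the full search once.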
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
new file mode 100644
index 0000000..20d1368
--- /dev/null
+++ b/src/include/replication/snapbuild.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.h
+ *	  Exports from replication/logical/snapbuild.c.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/snapbuild.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPBUILD_H
+#define SNAPBUILD_H
+
+#include "replication/reorderbuffer.h"
+
+#include "utils/hsearch.h"
+#include "utils/snapshot.h"
+#include "access/htup.h"
+
+typedef enum
+{
+	/*
+	 * Initial state, we can't do much yet.
+	 */
+	SNAPBUILD_START,
+
+	/*
+	 * We have collected enough information to decode tuples in transactions
+	 * that started after this.
+	 *
+	 * Once we reached this we start to collect changes. We cannot apply them
+	 * yet because they might be based on transactions that were still running
+	 * when we reached this state.
+	 */
+	SNAPBUILD_FULL_SNAPSHOT,
+
+	/*
+	 * Found a point after hitting SNAPBUILD_FULL_SNAPSHOT where all transactions
+	 * that were running at that point finished. Till we reach that we hold
+	 * off calling any commit callbacks.
+	 */
+	SNAPBUILD_CONSISTENT
+} SnapBuildState;
+
+typedef enum
+{
+	SNAPBUILD_SKIP,
+	SNAPBUILD_DECODE
+} SnapBuildAction;
+
+/* forward declare so we don't have to expose the struct to the public */
+struct SnapBuild;
+typedef struct SnapBuild SnapBuild;
+
+/* forward declare so we don't have to include xlogreader */
+struct XLogRecordBuffer;
+
+extern SnapBuild *AllocateSnapshotBuilder(ReorderBuffer *cache, TransactionId xmin_horizon, XLogRecPtr start_lsn);
+extern void FreeSnapshotBuilder(SnapBuild *cache);
+
+extern SnapBuildAction SnapBuildProcessRecord(SnapBuild *snapstate, struct XLogRecordBuffer *buf);
+
+extern Relation LookupRelationByRelFileNode(RelFileNode *r);
+
+extern void SnapBuildSnapDecRefcount(Snapshot snap);
+
+extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
+extern void SnapBuildClearExportedSnapshot(void);
+
+extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
+
+extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+
+#endif   /* SNAPBUILD_H */
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7eaa21b..daae320 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -66,6 +66,7 @@ typedef struct WalSnd
 
 extern WalSnd *MyWalSnd;
 
+
 /* There is one WalSndCtl struct for the whole database cluster */
 typedef struct
 {
@@ -93,7 +94,6 @@ typedef struct
 
 extern WalSndCtlData *WalSndCtl;
 
-
 extern void WalSndSetState(WalSndState state);
 
 /*
@@ -108,4 +108,8 @@ extern void replication_scanner_finish(void);
 
 extern Node *replication_parse_result;
 
+/* logical wal sender data gathering functions */
+extern XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
+
+
 #endif   /* _WALSENDER_PRIVATE_H */
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index e0eb184..75c56a9 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -116,6 +116,9 @@ typedef ItemPointerData *ItemPointer;
 /*
  * ItemPointerCopy
  *		Copies the contents of one disk item pointer to another.
+ *
+ * Should there ever be padding in an ItemPointer this would need to be handled
+ * differently as it's used as hash key.
  */
 #define ItemPointerCopy(fromPointer, toPointer) \
 ( \
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d8f7e9d..1a6dee9 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -79,6 +79,7 @@ typedef enum LWLockId
 	SerializablePredicateLockListLock,
 	OldSerXidLock,
 	SyncRepLock,
+	LogicalReplicationCtlLock,
 	/* Individual lock IDs end here */
 	FirstBufMappingLock,
 	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index fe0bad7..5465be5 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -49,7 +49,7 @@ extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked);
 extern TransactionId GetOldestActiveTransactionId(void);
 
 extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 9e833ca..8e1611c 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -136,4 +136,6 @@ extern void ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs
 									 int nmsgs, bool RelcacheInitFileInval,
 									 Oid dbid, Oid tsid);
 
+extern void LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg);
+
 #endif   /* SINVAL_H */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index feb55f1..4b9d967 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -66,5 +66,5 @@ extern void CallSyscacheCallbacks(int cacheid, uint32 hashvalue);
 
 extern void inval_twophase_postcommit(TransactionId xid, uint16 info,
 						  void *recdata, uint32 len);
-
+extern void InvalidateSystemCaches(void);
 #endif   /* INVAL_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index bd2466e..9cbf8a1 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -104,6 +104,7 @@ typedef struct RelationData
 	List	   *rd_indexlist;	/* list of OIDs of indexes on relation */
 	Bitmapset  *rd_indexattr;	/* identifies columns used in indexes */
 	Bitmapset  *rd_keyattr;		/* cols that can be ref'd by foreign keys */
+	Bitmapset  *rd_ckeyattr;	/* cols that are included in the pkey */
 	Oid			rd_oidindex;	/* OID of unique index on OID, if any */
 	LockInfoData rd_lockInfo;	/* lock mgr's info for locking relation */
 	RuleLock   *rd_rules;		/* rewrite rules */
@@ -220,6 +221,7 @@ typedef struct StdRdOptions
 	int			fillfactor;		/* page fill factor in percent (0..100) */
 	AutoVacOpts autovacuum;		/* autovacuum-related options */
 	bool		security_barrier;		/* for views */
+	bool        treat_as_catalog_table; /* treat as timetravelable table */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
@@ -256,6 +258,15 @@ typedef struct StdRdOptions
 	 ((StdRdOptions *) (relation)->rd_options)->security_barrier : false)
 
 /*
+ * RelationIsTreatedAsCatalogTable
+ *		Returns whether the relation should be treated as a catalog table
+ *      from the pov of logical decoding.
+ */
+#define RelationIsTreatedAsCatalogTable(relation)	\
+	((relation)->rd_options ?				\
+	 ((StdRdOptions *) (relation)->rd_options)->treat_as_catalog_table : false)
+
+/*
  * RelationIsValid
  *		True iff relation descriptor is valid.
  */
@@ -407,7 +418,6 @@ typedef struct StdRdOptions
 	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \
 	 !(relation)->rd_islocaltemp)
 
-
 /*
  * RelationIsScannable
  *		Currently can only be false for a materialized view which has not been
@@ -424,6 +434,24 @@ typedef struct StdRdOptions
  */
 #define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
 
+/*
+ * RelationIsDoingTimetravel
+ *		True if we need to log enough information to provide timetravel access
+ */
+#define RelationIsDoingTimetravel(relation) \
+	(wal_level >= WAL_LEVEL_LOGICAL && \
+	 RelationIsDoingTimetravelInternal(relation))
+
+/*
+ * RelationIsLogicallyLogged
+ *		True if we need to log enough information to extract the data from the
+ *		WAL stream
+ */
+#define RelationIsLogicallyLogged(relation) \
+	(wal_level >= WAL_LEVEL_LOGICAL && \
+	 RelationIsLogicallyLoggedInternal(relation))
+
+extern bool RelationIsDoingTimetravelInternal(Relation relation);
+extern bool RelationIsLogicallyLoggedInternal(Relation relation);
 
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..cfeded8 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -41,7 +41,16 @@ extern List *RelationGetIndexList(Relation relation);
 extern Oid	RelationGetOidIndex(Relation relation);
 extern List *RelationGetIndexExpressions(Relation relation);
 extern List *RelationGetIndexPredicate(Relation relation);
-extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs);
+
+typedef enum IndexAttrBitmapKind {
+	INDEX_ATTR_BITMAP_ALL,
+	INDEX_ATTR_BITMAP_KEY,
+	INDEX_ATTR_BITMAP_CANDIDATE_KEY
+}  IndexAttrBitmapKind;
+
+extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
+											 IndexAttrBitmapKind keyAttrs);
+
 extern void RelationGetExclusionInfo(Relation indexRelation,
 						 Oid **operators,
 						 Oid **procs,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index bfbd8dd..b6a766a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -23,6 +23,7 @@ extern bool FirstSnapshotSet;
 extern TransactionId TransactionXmin;
 extern TransactionId RecentXmin;
 extern TransactionId RecentGlobalXmin;
+extern TransactionId RecentGlobalDataXmin;
 
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
@@ -50,4 +51,6 @@ extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
 
+extern char *ExportSnapshot(Snapshot snapshot);
+
 #endif   /* SNAPMGR_H */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 800e366..f686607 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -39,7 +39,8 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
-	((snapshot)->satisfies == HeapTupleSatisfiesMVCC)
+	((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
+	 (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecoding)
 
 /*
  * HeapTupleSatisfiesVisibility
@@ -90,4 +91,34 @@ extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 					 uint16 infomask, TransactionId xid);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 
+/*
+ * Special "satisfies" routines used during decoding xlog from a different
+ * point of lsn. Also used for timetravel SnapshotNow's.
+ */
+extern bool HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup,
+                                                 Snapshot snapshot, Buffer buffer);
+
+/*
+ * install the 'snapshot_now' snapshot as a timetravelling snapshot replacing
+ * the normal SnapshotNow behaviour. This snapshot needs to have been created
+ * by snapbuild.c otherwise you will see crashes!
+ *
+ * FIXME: We need something resembling the real SnapshotNow to handle things
+ * like enum lookups from indices correctly.
+ */
+extern void SetupDecodingSnapshots(Snapshot snapshot_now, HTAB *tuplecids);
+extern void RevertFromDecodingSnapshots(void);
+extern void SuspendDecodingSnapshots(void);
+extern void UnSuspendDecodingSnapshots(void);
+
+/*
+ * resolve combocids and overwritten cmin values
+ *
+ * To avoid leaking to much knowledge about the reorderbuffer this is
+ * implemented in reorderbuffer.c not tqual.c.
+ */
+extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, HeapTuple htup,
+										  Buffer buffer,
+										  CommandId *cmin, CommandId *cmax);
+
 #endif   /* TQUAL_H */
diff --git a/src/test/regress/expected/logical.out b/src/test/regress/expected/logical.out
new file mode 100644
index 0000000..e59f7d9
--- /dev/null
+++ b/src/test/regress/expected/logical.out
@@ -0,0 +1,7 @@
+--CHECKPOINT;
+CREATE EXTENSION test_logical_decoding;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..bc02e08 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1678,6 +1678,13 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
                                  |     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,                                                                                                                                               +
                                  |     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock                                                                                                                                          +
                                  |    FROM pg_database d;
+ pg_stat_logical_decoding        |  SELECT l.slot_name,                                                                                                                                                                                           +
+                                 |     l.plugin,                                                                                                                                                                                                  +
+                                 |     l.database,                                                                                                                                                                                                +
+                                 |     l.active,                                                                                                                                                                                                  +
+                                 |     l.xmin,                                                                                                                                                                                                    +
+                                 |     l.restart_decoding_lsn                                                                                                                                                                                     +
+                                 |    FROM pg_stat_get_logical_decoding_slots() l(slot_name, plugin, database, active, xmin, restart_decoding_lsn);
  pg_stat_replication             |  SELECT s.pid,                                                                                                                                                                                                 +
                                  |     s.usesysid,                                                                                                                                                                                                +
                                  |     u.rolname AS usename,                                                                                                                                                                                      +
@@ -2139,7 +2146,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
                                  |    FROM tv;
  tvvmv                           |  SELECT tvvm.grandtot                                                                                                                                                                                          +
                                  |    FROM tvvm;
-(64 rows)
+(65 rows)
 
 SELECT tablename, rulename, definition FROM pg_rules
 	ORDER BY tablename, rulename;
diff --git a/src/test/regress/sql/logical.sql b/src/test/regress/sql/logical.sql
new file mode 100644
index 0000000..0c7fd2b
--- /dev/null
+++ b/src/test/regress/sql/logical.sql
@@ -0,0 +1,3 @@
+--CHECKPOINT;
+CREATE EXTENSION test_logical_decoding;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 452235d..3a6e465 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -621,6 +621,7 @@ Form_pg_ts_template
 Form_pg_type
 Form_pg_user_mapping
 FormatNode
+FreeLogicalReplicationCmd
 FromCharDateMode
 FromExpr
 FuncCall
@@ -791,6 +792,7 @@ IdentifySystemCmd
 IncrementVarSublevelsUp_context
 Index
 IndexArrayKeyInfo
+IndexAttrBitmapKind
 IndexBuildCallback
 IndexBuildResult
 IndexBulkDeleteCallback
@@ -818,6 +820,7 @@ IndxInfo
 InfoItem
 InhInfo
 InhOption
+InitLogicalReplicationCmd
 InheritableSocket
 InlineCodeBlock
 InsertStmt
@@ -937,6 +940,17 @@ LockTupleMode
 LockingClause
 LogOpts
 LogStmtLevel
+LogicalDecodeBeginCB
+LogicalDecodeChangeCB
+LogicalDecodeCleanupCB
+LogicalDecodeCommitCB
+LogicalDecodeInitCB
+LogicalDecodingCheckpointData
+LogicalDecodingContext
+LogicalDecodingCtlData
+LogicalDecodingSlot
+LogicalOutputPluginWriterPrepareWrite
+LogicalOutputPluginWriterWrite
 LogicalTape
 LogicalTapeSet
 MAGIC
@@ -1050,6 +1064,7 @@ OprInfo
 OprProofCacheEntry
 OprProofCacheKey
 OutputContext
+OutputPluginCallbacks
 OverrideSearchPath
 OverrideStackEntry
 PACE_HEADER
@@ -1464,6 +1479,21 @@ Relids
 RelocationBufferInfo
 RenameStmt
 ReopenPtr
+ReorderBuffer
+ReorderBufferApplyChangeCB
+ReorderBufferBeginCB
+ReorderBufferChange
+ReorderBufferChangeTypeInternal
+ReorderBufferCommitCB
+ReorderBufferDiskChange
+ReorderBufferIterTXNEntry
+ReorderBufferIterTXNState
+ReorderBufferToastEnt
+ReorderBufferTupleBuf
+ReorderBufferTupleCidEnt
+ReorderBufferTupleCidKey
+ReorderBufferTXN
+ReorderBufferTXNByIdEnt
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
 ResTarget
@@ -1518,6 +1548,8 @@ SID_NAME_USE
 SISeg
 SMgrRelation
 SMgrRelationData
+SnapBuildAction
+SnapBuildState
 SOCKADDR
 SOCKET
 SPELL
@@ -1609,6 +1641,8 @@ SlruSharedData
 Snapshot
 SnapshotData
 SnapshotSatisfiesFunc
+Snapstate
+SnapstateOnDisk
 SockAddr
 Sort
 SortBy
@@ -1651,6 +1685,7 @@ StandardChunkHeader
 StartBlobPtr
 StartBlobsPtr
 StartDataPtr
+StartLogicalReplicationCmd
 StartReplicationCmd
 StartupPacket
 StatEntry
@@ -1874,6 +1909,7 @@ WalRcvData
 WalRcvState
 WalSnd
 WalSndCtlData
+WalSndSendData
 WalSndState
 WholeRowVarExprState
 WindowAgg
@@ -1925,6 +1961,7 @@ XLogReaderState
 XLogRecData
 XLogRecPtr
 XLogRecord
+XLogRecordBuffer
 XLogSegNo
 XLogSource
 XLogwrtResult
@@ -2348,6 +2385,7 @@ symbol
 tablespaceinfo
 teReqs
 teSection
+TestDecodingData
 temp_tablespaces_extra
 text
 timeKEY
@@ -2420,11 +2458,13 @@ xl_heap_cleanup_info
 xl_heap_delete
 xl_heap_freeze
 xl_heap_header
+xl_heap_header_len
 xl_heap_inplace
 xl_heap_insert
 xl_heap_lock
 xl_heap_lock_updated
 xl_heap_multi_insert
+xl_heap_new_cid
 xl_heap_newpage
 xl_heap_update
 xl_heap_visible
-- 
1.8.2.rc2.4.g7799588.dirty
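
The tqual.h hunk in the patch above widens IsMVCCSnapshot() so that a snapshot counts as MVCC-safe when its "satisfies" callback is either the regular MVCC routine or the new decoding one. A minimal standalone sketch of that function-pointer check follows. Every Demo*/demo_* name is a hypothetical stand-in invented for illustration, not a PostgreSQL type or function, and the callback bodies are dummies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SnapshotData and its "satisfies" callback. */
typedef struct DemoSnapshotData DemoSnapshotData;
typedef bool (*DemoSatisfiesFunc) (const void *tuple,
                                   const DemoSnapshotData *snapshot);

struct DemoSnapshotData
{
    DemoSatisfiesFunc satisfies;
};

/* Stand-in for HeapTupleSatisfiesMVCC (dummy body). */
static bool
demo_satisfies_mvcc(const void *tuple, const DemoSnapshotData *snapshot)
{
    (void) snapshot;
    return tuple != NULL;
}

/* Stand-in for HeapTupleSatisfiesMVCCDuringDecoding (dummy body). */
static bool
demo_satisfies_mvcc_during_decoding(const void *tuple,
                                    const DemoSnapshotData *snapshot)
{
    return tuple != NULL && snapshot != NULL;
}

/* Stand-in for some non-MVCC visibility routine (dummy body). */
static bool
demo_satisfies_now(const void *tuple, const DemoSnapshotData *snapshot)
{
    (void) tuple;
    (void) snapshot;
    return false;
}

/* Same shape as the patched IsMVCCSnapshot(): accept either MVCC routine. */
#define DEMO_IS_MVCC_SNAPSHOT(snapshot)  \
    ((snapshot)->satisfies == demo_satisfies_mvcc || \
     (snapshot)->satisfies == demo_satisfies_mvcc_during_decoding)

/* Hypothetical caller that only accepts MVCC-style snapshots. */
static bool
demo_visible(const DemoSnapshotData *snapshot, const void *tuple)
{
    assert(DEMO_IS_MVCC_SNAPSHOT(snapshot));
    return snapshot->satisfies(tuple, snapshot);
}
```

A caller would fill a DemoSnapshotData with one of the three callbacks and pass it to demo_visible(); a snapshot using demo_satisfies_now would trip the assertion, mirroring how code guarded by IsMVCCSnapshot() would reject a non-MVCC snapshot.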

0014-wal_decoding-test_decoding-Add-a-simple-decoding-mod.patch (text/x-patch; charset=us-ascii)
From c4b278fc30f34863cdddef5b4fe7fa0b37c50e76 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 14/17] wal_decoding: test_decoding: Add a simple decoding
 module in contrib

This is mostly useful for testing, demonstration and documentation purposes.
---
 contrib/Makefile                      |   1 +
 contrib/test_decoding/Makefile        |  16 ++
 contrib/test_decoding/test_decoding.c | 325 ++++++++++++++++++++++++++++++++++
 3 files changed, 342 insertions(+)
 create mode 100644 contrib/test_decoding/Makefile
 create mode 100644 contrib/test_decoding/test_decoding.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..6d2fe32 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,6 +50,7 @@ SUBDIRS = \
 		tablefunc	\
 		tcn		\
 		test_parser	\
+		test_decoding	\
 		tsearch2	\
 		unaccent	\
 		vacuumlo	\
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
new file mode 100644
index 0000000..2ac9653
--- /dev/null
+++ b/contrib/test_decoding/Makefile
@@ -0,0 +1,16 @@
+# contrib/test_decoding/Makefile
+
+MODULE_big = test_decoding
+OBJS = test_decoding.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
new file mode 100644
index 0000000..fc846bc
--- /dev/null
+++ b/contrib/test_decoding/test_decoding.c
@@ -0,0 +1,325 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_decoding.c
+ *		  example output plugin for the logical replication functionality
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  contrib/test_decoding/test_decoding.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/index.h"
+
+#include "nodes/parsenodes.h"
+
+#include "replication/output_plugin.h"
+#include "replication/logical.h"
+
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+PG_MODULE_MAGIC;
+
+void		_PG_init(void);
+
+typedef struct
+{
+	MemoryContext context;
+	bool		include_xids;
+} TestDecodingData;
+
+/* These must be available to pg_dlsym() */
+extern void pg_decode_init(LogicalDecodingContext *ctx, bool is_init);
+extern bool pg_decode_begin_txn(LogicalDecodingContext *ctx,
+					ReorderBufferTXN *txn);
+extern bool pg_decode_commit_txn(LogicalDecodingContext *ctx,
+					 ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+extern bool pg_decode_change(LogicalDecodingContext *ctx,
+				 ReorderBufferTXN *txn, Relation rel,
+				 ReorderBufferChange *change);
+
+void
+_PG_init(void)
+{
+}
+
+/* initialize this plugin */
+void
+pg_decode_init(LogicalDecodingContext *ctx, bool is_init)
+{
+	ListCell   *option;
+	TestDecodingData *data;
+
+	AssertVariableIsOfType(&pg_decode_init, LogicalDecodeInitCB);
+
+	data = palloc(sizeof(TestDecodingData));
+	data->context = AllocSetContextCreate(TopMemoryContext,
+										  "text conversion context",
+										  ALLOCSET_DEFAULT_MINSIZE,
+										  ALLOCSET_DEFAULT_INITSIZE,
+										  ALLOCSET_DEFAULT_MAXSIZE);
+	data->include_xids = true;
+
+	ctx->output_plugin_private = data;
+
+	foreach(option, ctx->output_plugin_options)
+	{
+		DefElem    *elem = lfirst(option);
+
+		Assert(elem->arg == NULL || IsA(elem->arg, String));
+
+		if (strcmp(elem->defname, "hide-xids") == 0)
+		{
+			/* FIXME: parse argument */
+			data->include_xids = false;
+		}
+		else
+		{
+			elog(WARNING, "option %s = %s is unknown",
+				 elem->defname, elem->arg ? strVal(elem->arg) : "(null)");
+		}
+	}
+}
+
+/* BEGIN callback */
+bool
+pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+	TestDecodingData *data = ctx->output_plugin_private;
+
+	AssertVariableIsOfType(&pg_decode_begin_txn, LogicalDecodeBeginCB);
+
+	ctx->prepare_write(ctx, txn->lsn, txn->xid);
+	if (data->include_xids)
+		appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
+	else
+		appendStringInfoString(ctx->out, "BEGIN");
+	ctx->write(ctx, txn->lsn, txn->xid);
+
+	return true;
+}
+
+/* COMMIT callback */
+bool
+pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+					 XLogRecPtr commit_lsn)
+{
+	TestDecodingData *data = ctx->output_plugin_private;
+
+	AssertVariableIsOfType(&pg_decode_commit_txn, LogicalDecodeCommitCB);
+
+	ctx->prepare_write(ctx, txn->lsn, txn->xid);
+	if (data->include_xids)
+		appendStringInfo(ctx->out, "COMMIT %u", txn->xid);
+	else
+		appendStringInfoString(ctx->out, "COMMIT");
+	ctx->write(ctx, txn->lsn, txn->xid);
+
+	return true;
+}
+
+/* print the tuple 'tuple' into the StringInfo s */
+static void
+tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple)
+{
+	int			natt;
+	Oid			oid;
+
+	/* print oid of tuple, it's not included in the TupleDesc */
+	if ((oid = HeapTupleHeaderGetOid(tuple->t_data)) != InvalidOid)
+	{
+		appendStringInfo(s, " oid[oid]:%u", oid);
+	}
+
+	/* print all columns individually */
+	for (natt = 0; natt < tupdesc->natts; natt++)
+	{
+		Form_pg_attribute attr; /* the attribute itself */
+		Oid			typid;		/* type of current attribute */
+		HeapTuple	type_tuple; /* information about a type */
+		Form_pg_type type_form;
+		Oid			typoutput;	/* output function */
+		bool		typisvarlena;
+		Datum		origval;	/* possibly toasted Datum */
+		Datum		val;		/* definitely detoasted Datum */
+		char	   *outputstr = NULL;
+		bool		isnull;		/* column is null? */
+
+		attr = tupdesc->attrs[natt];
+
+		/*
+		 * don't print dropped columns, we can't be sure everything is
+		 * available for them
+		 */
+		if (attr->attisdropped)
+			continue;
+
+		/*
+		 * Don't print system columns
+		 */
+		if (attr->attnum < 0)
+			continue;
+
+		typid = attr->atttypid;
+
+		/* gather type name */
+		type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+		if (!HeapTupleIsValid(type_tuple))
+			elog(ERROR, "cache lookup failed for type %u", typid);
+		type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+		/* print attribute name */
+		appendStringInfoChar(s, ' ');
+		appendStringInfoString(s, NameStr(attr->attname));
+
+		/* print attribute type */
+		appendStringInfoChar(s, '[');
+		appendStringInfoString(s, NameStr(type_form->typname));
+		appendStringInfoChar(s, ']');
+
+		/* query output function */
+		getTypeOutputInfo(typid,
+						  &typoutput, &typisvarlena);
+
+		ReleaseSysCache(type_tuple);
+
+		/* get Datum from tuple */
+		origval = fastgetattr(tuple, natt + 1, tupdesc, &isnull);
+
+		if (isnull)
+			outputstr = "(null)";
+		else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
+			outputstr = "(unchanged-toast-datum)";
+		else if (typisvarlena)
+			val = PointerGetDatum(PG_DETOAST_DATUM(origval));
+		else
+			val = origval;
+
+		/* print data */
+		if (outputstr == NULL)
+			outputstr = OidOutputFunctionCall(typoutput, val);
+
+		appendStringInfoChar(s, ':');
+		appendStringInfoString(s, outputstr);
+	}
+}
+
+/*
+ * callback for individual changed tuples
+ */
+bool
+pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				 Relation relation, ReorderBufferChange *change)
+{
+	TestDecodingData *data;
+	Form_pg_class class_form;
+	TupleDesc	tupdesc;
+	MemoryContext old;
+
+	AssertVariableIsOfType(&pg_decode_change, LogicalDecodeChangeCB);
+
+	data = ctx->output_plugin_private;
+	class_form = RelationGetForm(relation);
+	tupdesc = RelationGetDescr(relation);
+
+	/* Avoid leaking memory by using and resetting our own context */
+	old = MemoryContextSwitchTo(data->context);
+
+	ctx->prepare_write(ctx, change->lsn, txn->xid);
+
+	appendStringInfoString(ctx->out, "table \"");
+	appendStringInfoString(ctx->out, NameStr(class_form->relname));
+	appendStringInfoString(ctx->out, "\":");
+
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			appendStringInfoString(ctx->out, " INSERT:");
+			if (change->newtuple == NULL)
+				appendStringInfoString(ctx->out, " (no-tuple-data)");
+			else
+				tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			appendStringInfoString(ctx->out, " UPDATE:");
+			if (change->oldtuple != NULL)
+			{
+				Relation	indexrel;
+				TupleDesc	indexdesc;
+
+				appendStringInfoString(ctx->out, " old-pkey:");
+				RelationGetIndexList(relation);
+
+				if (!OidIsValid(relation->rd_primary))
+				{
+					elog(LOG, "tuple in table with oid: %u without primary key",
+						 RelationGetRelid(relation));
+					break;
+				}
+
+				indexrel = RelationIdGetRelation(relation->rd_primary);
+
+				indexdesc = RelationGetDescr(indexrel);
+
+				tuple_to_stringinfo(ctx->out, indexdesc, &change->oldtuple->tuple);
+
+				RelationClose(indexrel);
+				appendStringInfoString(ctx->out, " new-tuple:");
+			}
+
+			if (change->newtuple == NULL)
+				appendStringInfoString(ctx->out, " (no-tuple-data)");
+			else
+				tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			appendStringInfoString(ctx->out, " DELETE:");
+
+			/* if there was no PK, we only know that a delete happened */
+			if (change->oldtuple == NULL)
+				appendStringInfoString(ctx->out, " (no-tuple-data)");
+			/* In DELETE, only the PK is present; display that */
+			else
+			{
+				Relation	indexrel;
+
+				/* make sure rd_primary is set */
+				RelationGetIndexList(relation);
+
+				if (!OidIsValid(relation->rd_primary))
+				{
+					elog(LOG, "tuple in table with oid: %u without primary key",
+						 RelationGetRelid(relation));
+					break;
+				}
+
+				indexrel = RelationIdGetRelation(relation->rd_primary);
+
+				tuple_to_stringinfo(ctx->out, RelationGetDescr(indexrel),
+									&change->oldtuple->tuple);
+
+				RelationClose(indexrel);
+			}
+			break;
+	}
+
+	MemoryContextSwitchTo(old);
+	MemoryContextReset(data->context);
+
+	ctx->write(ctx, change->lsn, txn->xid);
+	return true;
+}
-- 
1.8.2.rc2.4.g7799588.dirty
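
To make the text format emitted by test_decoding above easier to picture, here is a small standalone sketch of the INSERT line that pg_decode_change() and tuple_to_stringinfo() assemble: the quoted table name, the action, then one " name[type]:value" entry per column, with "(null)" for NULL values. It deliberately uses plain snprintf() instead of StringInfo so it compiles outside the backend. DemoColumn and demo_format_insert are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one decoded column; not a PostgreSQL type. */
typedef struct DemoColumn
{
    const char *name;       /* attribute name */
    const char *type_name;  /* type name as found in pg_type.typname */
    const char *value;      /* text output of the value, NULL for SQL NULL */
} DemoColumn;

/*
 * Format an INSERT change the way the plugin above lays it out.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
demo_format_insert(char *buf, size_t buflen, const char *relname,
                   const DemoColumn *cols, int ncols)
{
    int         used;
    int         i;

    assert(buf != NULL && buflen > 0);

    /* header: table "relname": INSERT: */
    used = snprintf(buf, buflen, "table \"%s\": INSERT:", relname);
    if (used < 0 || (size_t) used >= buflen)
        return -1;

    /* one " name[type]:value" entry per column */
    for (i = 0; i < ncols; i++)
    {
        size_t      room = buflen - (size_t) used;
        int         n = snprintf(buf + used, room, " %s[%s]:%s",
                                 cols[i].name, cols[i].type_name,
                                 cols[i].value ? cols[i].value : "(null)");

        if (n < 0 || (size_t) n >= room)
            return -1;
        used += n;
    }

    return used;
}
```

Called with a table named foo and the columns id (int4, value 1) and data (text, NULL), the buffer ends up holding: table "foo": INSERT: id[int4]:1 data[text]:(null)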

0015-wal_decoding-pg_receivellog-Introduce-pg_receivexlog.patch (text/x-patch; charset=us-ascii)
From a16f2b824b3fb8de9662d83d5610aa8b2b32f261 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 15/17] wal_decoding: pg_receivellog: Introduce pg_receivexlog
 equivalent for logical changes

---
 src/bin/pg_basebackup/.gitignore       |   1 +
 src/bin/pg_basebackup/Makefile         |   8 +-
 src/bin/pg_basebackup/pg_receivellog.c | 870 +++++++++++++++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.c     |   3 +-
 src/bin/pg_basebackup/streamutil.h     |   1 +
 5 files changed, 880 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_basebackup/pg_receivellog.c

diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 1334a1f..eb2978c 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,2 +1,3 @@
 /pg_basebackup
 /pg_receivexlog
+/pg_receivellog
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a707c93..a41b73c 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -20,7 +20,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 
 OBJS=receivelog.o streamutil.o $(WIN32RES)
 
-all: pg_basebackup pg_receivexlog
+all: pg_basebackup pg_receivexlog pg_receivellog
 
 pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -28,9 +28,13 @@ pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
 pg_receivexlog: pg_receivexlog.o $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) pg_receivexlog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
+pg_receivellog: pg_receivellog.o $(OBJS) | submake-libpq submake-libpgport
+	$(CC) $(CFLAGS) pg_receivellog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
 	$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+	$(INSTALL_PROGRAM) pg_receivellog$(X) '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
@@ -40,4 +44,4 @@ uninstall:
 	rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
 
 clean distclean maintainer-clean:
-	rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o
+	rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
new file mode 100644
index 0000000..e98452d
--- /dev/null
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -0,0 +1,870 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_receivellog.c - receive streaming logical log data and write it
+ *					  to a local file.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/pg_receivellog.c
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's so much
+ * backend-only stuff in the XLOG include files we need.  But we need a
+ * frontend-ish environment otherwise.	Hence this ugly hack.
+ */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include "common/fe_memutils.h"
+#include "libpq-fe.h"
+#include "libpq/pqsignal.h"
+#include "access/xlog_internal.h"
+#include "utils/datetime.h"
+#include "utils/timestamp.h"
+
+#include "receivelog.h"
+#include "streamutil.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "getopt_long.h"
+
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
+/* Global options */
+static char *outfile = NULL;
+static int	outfd = -1;
+static int	verbose = 0;
+static int	noloop = 0;
+static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
+static volatile bool time_to_abort = false;
+static const char *plugin = "test_decoding";
+static const char *slot = NULL;
+static XLogRecPtr startpos;
+static bool do_init_slot = false;
+static bool do_start_slot = false;
+static bool do_stop_slot = false;
+
+
+static void usage(void);
+static void StreamLog(void);
+
+static void
+usage(void)
+{
+	printf(_("%s receives PostgreSQL logical change stream.\n\n"),
+		   progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -f, --file=FILE        receive log into this file. - for stdout\n"));
+	printf(_("  -n, --no-loop          do not loop on connection lost\n"));
+	printf(_("  -v, --verbose          output verbose messages\n"));
+	printf(_("  -V, --version          output version information, then exit\n"));
+	printf(_("  -?, --help             show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -d, --database=DBNAME  database to connect to\n"));
+	printf(_("  -h, --host=HOSTNAME    database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT        database server port number\n"));
+	printf(_("  -U, --username=NAME    connect as specified database user\n"));
+	printf(_("  -w, --no-password      never prompt for password\n"));
+	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
+	printf(_("\nReplication options:\n"));
+	printf(_("  -P, --plugin=PLUGIN    use output plugin PLUGIN (defaults to test_decoding)\n"));
+	printf(_("  -s, --status-interval=INTERVAL\n"
+			 "                         time between status packets sent to server (in seconds)\n"));
+	printf(_("  -S, --slot=SLOT        use existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("\nAction to be performed:\n"));
+	printf(_("      --init             initiate a new replication slot (for the slotname see --slot)\n"));
+	printf(_("      --start            start streaming in a replication slot (for the slotname see --slot)\n"));
+	printf(_("      --stop             stop the replication slot (for the slotname see --slot)\n"));
+	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+
+/*
+ * Local version of GetCurrentTimestamp(), since we are not linked with
+ * backend code. The protocol always uses integer timestamps, regardless of
+ * server setting.
+ */
+static int64
+localGetCurrentTimestamp(void)
+{
+	int64		result;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	result = (int64) tp.tv_sec -
+		((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
+
+	result = (result * USECS_PER_SEC) + tp.tv_usec;
+
+	return result;
+}
+
+/*
+ * Local version of TimestampDifference(), since we are not linked with
+ * backend code.
+ */
+static void
+localTimestampDifference(int64 start_time, int64 stop_time,
+						 long *secs, int *microsecs)
+{
+	int64		diff = stop_time - start_time;
+
+	if (diff <= 0)
+	{
+		*secs = 0;
+		*microsecs = 0;
+	}
+	else
+	{
+		*secs = (long) (diff / USECS_PER_SEC);
+		*microsecs = (int) (diff % USECS_PER_SEC);
+	}
+}
+
+/*
+ * Local version of TimestampDifferenceExceeds(), since we are not
+ * linked with backend code.
+ */
+static bool
+localTimestampDifferenceExceeds(int64 start_time,
+								int64 stop_time,
+								int msec)
+{
+	int64		diff = stop_time - start_time;
+
+	return (diff >= msec * INT64CONST(1000));
+}
+
+/*
+ * Converts an int64 to network byte order.
+ */
+static void
+sendint64(int64 i, char *buf)
+{
+	uint32		n32;
+
+	/* High order half first, since we're doing MSB-first */
+	n32 = (uint32) (i >> 32);
+	n32 = htonl(n32);
+	memcpy(&buf[0], &n32, 4);
+
+	/* Now the low order half */
+	n32 = (uint32) i;
+	n32 = htonl(n32);
+	memcpy(&buf[4], &n32, 4);
+}
+
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+static int64
+recvint64(char *buf)
+{
+	int64		result;
+	uint32		h32;
+	uint32		l32;
+
+	memcpy(&h32, buf, 4);
+	memcpy(&l32, buf + 4, 4);
+	h32 = ntohl(h32);
+	l32 = ntohl(l32);
+
+	result = h32;
+	result <<= 32;
+	result |= l32;
+
+	return result;
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ */
+static bool
+sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool replyRequested)
+{
+	char		replybuf[1 + 8 + 8 + 8 + 8 + 1];
+	int			len = 0;
+
+	if (blockpos == startpos)
+		return true;
+
+	if (verbose)
+		fprintf(stderr,
+				_("%s: confirming flush up to %X/%X (slot %s)\n"),
+				progname, (uint32) (blockpos >> 32), (uint32) blockpos,
+				slot);
+
+	replybuf[len] = 'r';
+	len += 1;
+	sendint64(blockpos, &replybuf[len]);		/* write */
+	len += 8;
+	sendint64(blockpos, &replybuf[len]);		/* flush */
+	len += 8;
+	sendint64(InvalidXLogRecPtr, &replybuf[len]);		/* apply */
+	len += 8;
+	sendint64(now, &replybuf[len]);		/* sendTime */
+	len += 8;
+	replybuf[len] = replyRequested ? 1 : 0;		/* replyRequested */
+	len += 1;
+
+	startpos = blockpos;
+
+	if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
+	{
+		fprintf(stderr, _("%s: could not send feedback packet: %s"),
+				progname, PQerrorMessage(conn));
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Start the log streaming
+ */
+static void
+StreamLog(void)
+{
+	PGresult   *res;
+	char		query[256];
+	char	   *copybuf = NULL;
+	int64		last_status = -1;
+	XLogRecPtr	logoff = InvalidXLogRecPtr;
+
+	/*
+	 * Connect in replication mode to the server
+	 */
+	if (!conn)
+		conn = GetConnection();
+	if (!conn)
+		/* Error message already written in GetConnection() */
+		return;
+
+	/*
+	 * Start the replication
+	 */
+	if (verbose)
+		fprintf(stderr,
+				_("%s: starting log streaming at %X/%X (slot %s)\n"),
+				progname, (uint32) (startpos >> 32), (uint32) startpos,
+				slot);
+
+	/* Initiate the replication stream at specified location */
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION \"%s\" %X/%X",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	res = PQexec(conn, query);
+	if (PQresultStatus(res) != PGRES_COPY_BOTH)
+	{
+		fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
+				progname, query, PQresultErrorMessage(res));
+		PQclear(res);
+		goto error;
+	}
+	PQclear(res);
+
+	if (verbose)
+		fprintf(stderr,
+				_("%s: initiated streaming\n"),
+				progname);
+
+	while (!time_to_abort)
+	{
+		int			r;
+		int			bytes_left;
+		int			bytes_written;
+		int64		now;
+		int			hdr_len;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		/*
+		 * Potentially send a status message to the master
+		 */
+		now = localGetCurrentTimestamp();
+		if (standby_message_timeout > 0 &&
+			localTimestampDifferenceExceeds(last_status, now,
+											standby_message_timeout))
+		{
+			/* Time to send feedback! */
+			if (!sendFeedback(conn, logoff, now, false))
+				goto error;
+
+			last_status = now;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 1);
+		if (r == 0)
+		{
+			/*
+			 * In async mode, and no data available. We block on reading but
+			 * not more than the specified timeout, so that we can send a
+			 * response back to the server.
+			 */
+			fd_set		input_mask;
+			struct timeval timeout;
+			struct timeval *timeoutptr;
+
+			FD_ZERO(&input_mask);
+			FD_SET(PQsocket(conn), &input_mask);
+			if (standby_message_timeout)
+			{
+				int64		targettime;
+				long		secs;
+				int			usecs;
+
+				targettime = last_status + (standby_message_timeout - 1) *
+					((int64) 1000);
+				localTimestampDifference(now,
+										 targettime,
+										 &secs,
+										 &usecs);
+				if (secs <= 0)
+					timeout.tv_sec = 1; /* Always sleep at least 1 sec */
+				else
+					timeout.tv_sec = secs;
+				timeout.tv_usec = usecs;
+				timeoutptr = &timeout;
+			}
+			else
+				timeoutptr = NULL;
+
+			r = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
+			if (r == 0 || (r < 0 && errno == EINTR))
+			{
+				/*
+				 * Got a timeout or signal. Continue the loop and either
+				 * deliver a status packet to the server or just go back into
+				 * blocking.
+				 */
+				continue;
+			}
+			else if (r < 0)
+			{
+				fprintf(stderr, _("%s: select() failed: %s\n"),
+						progname, strerror(errno));
+				goto error;
+			}
+			/* Else there is actually data on the socket */
+			if (PQconsumeInput(conn) == 0)
+			{
+				fprintf(stderr,
+						_("%s: could not receive data from WAL stream: %s"),
+						progname, PQerrorMessage(conn));
+				goto error;
+			}
+			continue;
+		}
+		if (r == -1)
+			/* End of copy stream */
+			break;
+		if (r == -2)
+		{
+			fprintf(stderr, _("%s: could not read COPY data: %s"),
+					progname, PQerrorMessage(conn));
+			goto error;
+		}
+
+		/* Check the message type. */
+		if (copybuf[0] == 'k')
+		{
+			int			pos;
+			bool		replyRequested;
+
+			/*
+			 * Parse the keepalive message, enclosed in the CopyData message.
+			 * We just check if the server requested a reply, and ignore the
+			 * rest.
+			 */
+			pos = 1;			/* skip msgtype 'k' */
+			pos += 8;			/* skip walEnd */
+			pos += 8;			/* skip sendTime */
+
+			if (r < pos + 1)
+			{
+				fprintf(stderr, _("%s: streaming header too small: %d\n"),
+						progname, r);
+				goto error;
+			}
+			replyRequested = copybuf[pos];
+
+			/* If the server requested an immediate reply, send one. */
+			if (replyRequested)
+			{
+				now = localGetCurrentTimestamp();
+				if (!sendFeedback(conn, logoff, now, false))
+					goto error;
+				last_status = now;
+			}
+			continue;
+		}
+		else if (copybuf[0] != 'w')
+		{
+			fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
+					progname, copybuf[0]);
+			goto error;
+		}
+
+
+		/*
+		 * Read the header of the XLogData message, enclosed in the CopyData
+		 * message. We only need the WAL location field (dataStart), the rest
+		 * of the header is ignored.
+		 */
+		hdr_len = 1;			/* msgtype 'w' */
+		hdr_len += 8;			/* dataStart */
+		hdr_len += 8;			/* walEnd */
+		hdr_len += 8;			/* sendTime */
+		if (r < hdr_len + 1)
+		{
+			fprintf(stderr, _("%s: streaming header too small: %d\n"),
+					progname, r);
+			goto error;
+		}
+
+		/* Extract WAL location for this block */
+		{
+			XLogRecPtr	temp = recvint64(&copybuf[1]);
+
+			logoff = Max(temp, logoff);
+		}
+
+		if (outfd == -1 && strcmp(outfile, "-") == 0)
+		{
+			outfd = 1;
+		}
+		else if (outfd == -1)
+		{
+			outfd = open(outfile, O_CREAT | O_APPEND | O_WRONLY | PG_BINARY,
+						 S_IRUSR | S_IWUSR);
+			if (outfd == -1)
+			{
+				fprintf(stderr,
+						_("%s: could not open log file \"%s\": %s\n"),
+						progname, outfile, strerror(errno));
+				goto error;
+			}
+		}
+
+		bytes_left = r - hdr_len;
+		bytes_written = 0;
+
+
+		while (bytes_left)
+		{
+			int			ret;
+
+			ret = write(outfd,
+						copybuf + hdr_len + bytes_written,
+						bytes_left);
+
+			if (ret < 0)
+			{
+				fprintf(stderr,
+				  _("%s: could not write %u bytes to log file \"%s\": %s\n"),
+						progname, bytes_left, outfile,
+						strerror(errno));
+				goto error;
+			}
+
+			/* Write was successful, advance our position */
+			bytes_written += ret;
+			bytes_left -= ret;
+		}
+
+		if (write(outfd, "\n", 1) != 1)
+		{
+			fprintf(stderr,
+				  _("%s: could not write %u bytes to log file \"%s\": %s\n"),
+					progname, 1, outfile,
+					strerror(errno));
+			goto error;
+		}
+	}
+
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		fprintf(stderr,
+				_("%s: unexpected termination of replication stream: %s"),
+				progname, PQresultErrorMessage(res));
+		goto error;
+	}
+	PQclear(res);
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+
+	if (outfd != -1 && close(outfd) != 0)
+		fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+				progname, outfile, strerror(errno));
+	outfd = -1;
+error:
+	PQfinish(conn);
+	conn = NULL;
+}
+
+/*
+ * When SIGINT is received, just tell the system to exit at the next possible
+ * moment.
+ */
+#ifndef WIN32
+
+static void
+sigint_handler(int signum)
+{
+	time_to_abort = true;
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+	PGresult   *res;
+	static struct option long_options[] = {
+/* general options */
+		{"file", required_argument, NULL, 'f'},
+		{"no-loop", no_argument, NULL, 'n'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"version", no_argument, NULL, 'V'},
+		{"help", no_argument, NULL, '?'},
+/* connection options */
+		{"database", required_argument, NULL, 'd'},
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+/* replication options */
+		{"plugin", required_argument, NULL, 'P'},
+		{"status-interval", required_argument, NULL, 's'},
+		{"slot", required_argument, NULL, 'S'},
+		{"startpos", required_argument, NULL, 'I'},
+/* action */
+		{"init", no_argument, NULL, 1},
+		{"start", no_argument, NULL, 2},
+		{"stop", no_argument, NULL, 3},
+		{NULL, 0, NULL, 0}
+	};
+	int			c;
+	int			option_index;
+	uint32		hi,
+				lo;
+
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivellog"));
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		else if (strcmp(argv[1], "-V") == 0 ||
+				 strcmp(argv[1], "--version") == 0)
+		{
+			puts("pg_receivellog (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:",
+							long_options, &option_index)) != -1)
+	{
+		switch (c)
+		{
+/* general options */
+			case 'f':
+				outfile = pg_strdup(optarg);
+				break;
+			case 'n':
+				noloop = 1;
+				break;
+			case 'v':
+				verbose++;
+				break;
+/* connection options */
+			case 'd':
+				dbname = pg_strdup(optarg);
+				break;
+			case 'h':
+				dbhost = pg_strdup(optarg);
+				break;
+			case 'p':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				dbport = pg_strdup(optarg);
+				break;
+			case 'U':
+				dbuser = pg_strdup(optarg);
+				break;
+			case 'w':
+				dbgetpassword = -1;
+				break;
+			case 'W':
+				dbgetpassword = 1;
+				break;
+/* replication options */
+			case 'P':
+				plugin = pg_strdup(optarg);
+				break;
+			case 's':
+				standby_message_timeout = atoi(optarg) * 1000;
+				if (standby_message_timeout < 0)
+				{
+					fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				break;
+			case 'S':
+				slot = pg_strdup(optarg);
+				break;
+			case 'I':
+				if (sscanf(optarg, "%X/%X", &hi, &lo) != 2)
+				{
+					fprintf(stderr,
+							_("%s: could not parse start position \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				startpos = ((uint64) hi) << 32 | lo;
+				break;
+/* action */
+			case 1:
+				do_init_slot = true;
+				break;
+			case 2:
+				do_start_slot = true;
+				break;
+			case 3:
+				do_stop_slot = true;
+				break;
+
+			default:
+
+				/*
+				 * getopt_long already emitted a complaint
+				 */
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+						progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Any non-option arguments?
+	 */
+	if (optind < argc)
+	{
+		fprintf(stderr,
+				_("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/*
+	 * Required arguments
+	 */
+	if (slot == NULL)
+	{
+		fprintf(stderr, _("%s: no slot specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (!do_stop_slot && outfile == NULL)
+	{
+		fprintf(stderr, _("%s: no target file specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (!do_stop_slot && dbname == NULL)
+	{
+		fprintf(stderr, _("%s: no database specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (!do_stop_slot && !do_init_slot && !do_start_slot)
+	{
+		fprintf(stderr, _("%s: at least one action needs to be specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (do_stop_slot && (do_init_slot || do_start_slot))
+	{
+		fprintf(stderr, _("%s: --stop cannot be combined with --init or --start\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+#ifndef WIN32
+	pqsignal(SIGINT, sigint_handler);
+#endif
+
+
+	/*
+	 * don't really need this but it actually helps to get more precise error
+	 * messages about authentication and such.
+	 */
+	{
+		conn = GetConnection();
+		if (!conn)
+			/* Error message already written in GetConnection() */
+			exit(1);
+
+		/*
+		 * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
+		 * position.
+		 */
+		res = PQexec(conn, "IDENTIFY_SYSTEM");
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+					progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
+			disconnect_and_exit(1);
+		}
+
+		if (PQntuples(res) != 1 || PQnfields(res) != 4)
+		{
+			fprintf(stderr,
+					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
+					progname, PQntuples(res), PQnfields(res), 1, 4);
+			disconnect_and_exit(1);
+		}
+		PQclear(res);
+	}
+
+
+	/*
+	 * stop a replication slot
+	 */
+	if (do_stop_slot)
+	{
+		char		query[256];
+
+		snprintf(query, sizeof(query), "FREE_LOGICAL_REPLICATION \"%s\"",
+				 slot);
+		res = PQexec(conn, query);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		{
+			fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+					progname, query, PQerrorMessage(conn));
+			disconnect_and_exit(1);
+		}
+
+		if (PQntuples(res) != 0 || PQnfields(res) != 0)
+		{
+			fprintf(stderr,
+					_("%s: could not stop logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+					progname, PQntuples(res), PQnfields(res), 0, 0);
+			disconnect_and_exit(1);
+		}
+
+		PQclear(res);
+		disconnect_and_exit(0);
+	}
+
+	/*
+	 * init a replication slot
+	 */
+	if (do_init_slot)
+	{
+		char		query[256];
+
+		if (verbose)
+			fprintf(stderr,
+					_("%s: init replication slot\n"),
+					progname);
+
+		snprintf(query, sizeof(query), "INIT_LOGICAL_REPLICATION \"%s\" \"%s\"",
+				 slot, plugin);
+
+		res = PQexec(conn, query);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+					progname, query, PQerrorMessage(conn));
+			disconnect_and_exit(1);
+		}
+
+		if (PQntuples(res) != 1 || PQnfields(res) != 4)
+		{
+			fprintf(stderr,
+					_("%s: could not init logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+					progname, PQntuples(res), PQnfields(res), 1, 4);
+			disconnect_and_exit(1);
+		}
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			fprintf(stderr,
+					_("%s: could not parse log location \"%s\"\n"),
+					progname, PQgetvalue(res, 0, 1));
+			disconnect_and_exit(1);
+		}
+		startpos = ((uint64) hi) << 32 | lo;
+
+		slot = pg_strdup(PQgetvalue(res, 0, 0));
+		PQclear(res);
+	}
+
+
+	if (!do_start_slot)
+		disconnect_and_exit(0);
+
+	while (true)
+	{
+		StreamLog();
+		if (time_to_abort)
+		{
+			/*
+			 * We've been Ctrl-C'ed. That's not an error, so exit without an
+			 * error code.
+			 */
+			exit(0);
+		}
+		else if (noloop)
+		{
+			fprintf(stderr, _("%s: disconnected.\n"), progname);
+			exit(1);
+		}
+		else
+		{
+			fprintf(stderr,
+			/* translator: check source for value for %d */
+					_("%s: disconnected. Waiting %d seconds to try again.\n"),
+					progname, RECONNECT_SLEEP_TIME);
+			pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+		}
+	}
+}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 6891c2c..64b2e003 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -22,6 +22,7 @@ char	   *connection_string = NULL;
 char	   *dbhost = NULL;
 char	   *dbuser = NULL;
 char	   *dbport = NULL;
+char	   *dbname = NULL;
 int			dbgetpassword = 0;	/* 0=auto, -1=never, 1=always */
 static char *dbpassword = NULL;
 PGconn	   *conn = NULL;
@@ -86,7 +87,7 @@ GetConnection(void)
 	}
 
 	keywords[i] = "dbname";
-	values[i] = "replication";
+	values[i] = dbname == NULL ? "replication" : dbname;
 	i++;
 	keywords[i] = "replication";
 	values[i] = "true";
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 77d6b86..78f20da 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -5,6 +5,7 @@ extern char *connection_string;
 extern char *dbhost;
 extern char *dbuser;
 extern char *dbport;
+extern char *dbname;
 extern int	dbgetpassword;
 
 /* Connection kept global so we can disconnect easily */
-- 
1.8.2.rc2.4.g7799588.dirty
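
The message handling in StreamLog() above can be sketched in isolation. This is a minimal illustration, not code from the patch: parse_stream_msg(), StreamMsg and decode_be64() are hypothetical names, and decode_be64() assumes that recvint64() reads a 64-bit integer in network byte order. The layout (1 byte message type, then 8-byte fields) and the length checks mirror the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 64-bit big-endian (network order) integer from buf. */
static uint64_t
decode_be64(const unsigned char *buf)
{
	uint64_t	v = 0;
	int			i;

	assert(buf != NULL);
	for (i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}

/* Hypothetical result type, for illustration only. */
typedef enum
{
	MSG_INVALID,
	MSG_KEEPALIVE,
	MSG_XLOGDATA
} StreamMsgType;

typedef struct
{
	StreamMsgType type;
	bool		reply_requested;	/* keepalive only */
	uint64_t	data_start;			/* XLogData only: WAL location */
	size_t		payload_off;		/* XLogData only: offset of the payload */
} StreamMsg;

/* Classify one CopyData payload the way StreamLog() does. */
static StreamMsg
parse_stream_msg(const unsigned char *buf, size_t len)
{
	StreamMsg	m = {MSG_INVALID, false, 0, 0};

	if (len < 1)
		return m;

	if (buf[0] == 'k')
	{
		size_t		pos = 1 + 8 + 8;	/* msgtype, walEnd, sendTime */

		if (len < pos + 1)
			return m;					/* header too small */
		m.type = MSG_KEEPALIVE;
		m.reply_requested = (buf[pos] != 0);
	}
	else if (buf[0] == 'w')
	{
		size_t		hdr_len = 1 + 8 + 8 + 8;	/* msgtype, dataStart,
												 * walEnd, sendTime */

		if (len < hdr_len + 1)
			return m;					/* header too small */
		m.type = MSG_XLOGDATA;
		m.data_start = decode_be64(buf + 1);
		m.payload_off = hdr_len;
	}
	return m;
}
```

A caller would send feedback when type is MSG_KEEPALIVE and reply_requested is set, and write the bytes from payload_off onwards when type is MSG_XLOGDATA.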

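Both --startpos and the INIT_LOGICAL_REPLICATION result are parsed from the textual "hi/lo" form into a 64-bit position. A minimal standalone sketch of that conversion follows; parse_lsn() and format_lsn() are hypothetical helper names and are not part of the patch, which does the sscanf() inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a WAL position such as "0/17D5908" into a 64-bit value.
 * Returns false if the string does not have the "%X/%X" shape.
 */
static bool
parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;

	assert(str != NULL && lsn != NULL);
	if (sscanf(str, "%X/%X", &hi, &lo) != 2)
		return false;

	/* The high half holds the upper 32 bits, the low half the lower 32. */
	*lsn = ((uint64_t) hi) << 32 | lo;
	return true;
}

/* Format a 64-bit WAL position back into the "hi/lo" text form. */
static void
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%X/%X",
			 (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```

Keeping the parse in one helper would let the option parser and the slot-initialization path share the same error handling; whether that is worth it for two call sites is a judgment call.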
Attachment: 0016-wal_decoding-test_logical_decoding-Add-extension-for.patch (text/x-patch; charset=us-ascii)
From a3a59fa972f211aad37826ee0a6b280a5c71f916 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 16/17] wal_decoding: test_logical_decoding: Add extension for
 easier testing of logical decoding

This extension provides three functions for manipulating replication slots:
* init_logical_replication - initiate a replication slot and wait for consistent state
* start_logical_replication - return all changes since the last call up to now, without blocking
* stop_logical_replication - free the logical slot again

Those are pretty direct synonyms for the replication connection commands.

Due to questions about how to integrate logical replication tests this module
also contains the current tests of logical replication itself.

Author: Abhijit Menon-Sen
---
 contrib/Makefile                                   |   1 +
 contrib/test_logical_decoding/Makefile             |  37 ++
 contrib/test_logical_decoding/expected/ddl.out     | 587 +++++++++++++++++++++
 contrib/test_logical_decoding/logical.conf         |   2 +
 contrib/test_logical_decoding/sql/ddl.sql          | 291 ++++++++++
 .../test_logical_decoding--1.0.sql                 |   6 +
 .../test_logical_decoding/test_logical_decoding.c  | 237 +++++++++
 .../test_logical_decoding.control                  |   5 +
 8 files changed, 1166 insertions(+)
 create mode 100644 contrib/test_logical_decoding/Makefile
 create mode 100644 contrib/test_logical_decoding/expected/ddl.out
 create mode 100644 contrib/test_logical_decoding/logical.conf
 create mode 100644 contrib/test_logical_decoding/sql/ddl.sql
 create mode 100644 contrib/test_logical_decoding/test_logical_decoding--1.0.sql
 create mode 100644 contrib/test_logical_decoding/test_logical_decoding.c
 create mode 100644 contrib/test_logical_decoding/test_logical_decoding.control

diff --git a/contrib/Makefile b/contrib/Makefile
index 6d2fe32..41cb892 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -51,6 +51,7 @@ SUBDIRS = \
 		tcn		\
 		test_parser	\
 		test_decoding	\
+		test_logical_decoding \
 		tsearch2	\
 		unaccent	\
 		vacuumlo	\
diff --git a/contrib/test_logical_decoding/Makefile b/contrib/test_logical_decoding/Makefile
new file mode 100644
index 0000000..0e7d5d3
--- /dev/null
+++ b/contrib/test_logical_decoding/Makefile
@@ -0,0 +1,37 @@
+MODULE_big = test_logical_decoding
+OBJS = test_logical_decoding.o
+
+EXTENSION = test_logical_decoding
+DATA = test_logical_decoding--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_logical_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+test_logical_decoding.o: test_logical_decoding.c
+
+# Disabled because these tests require "wal_level=logical", which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+submake-regress:
+	$(MAKE) -C $(top_builddir)/src/test/regress
+
+submake-test_decoding:
+	$(MAKE) -C $(top_builddir)/contrib/test_decoding
+
+check: all | submake-regress submake-test_decoding
+	$(pg_regress_check) --temp-config $(top_srcdir)/contrib/test_logical_decoding/logical.conf \
+	    --temp-install=./tmp_check \
+	    --extra-install=contrib/test_decoding \
+	    --extra-install=contrib/test_logical_decoding \
+	    ddl
+
+.PHONY: submake-test_decoding submake-regress
diff --git a/contrib/test_logical_decoding/expected/ddl.out b/contrib/test_logical_decoding/expected/ddl.out
new file mode 100644
index 0000000..3947093
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/ddl.out
@@ -0,0 +1,587 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column? 
+----------
+ init
+(1 row)
+
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ERROR:  There already is a logical slot named "regression_slot"
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication 
+--------------------------
+                        0
+(1 row)
+
+-- fail
+SELECT stop_logical_replication('regression_slot');
+ERROR:  couldn't find logical slot "regression_slot"
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column? 
+----------
+ init
+(1 row)
+
+/* check whether status function reports us, only reproduceable columns */
+SELECT slot_name, plugin, active,
+    xmin::xid IS NOT NULL,
+    pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+    slot_name    |    plugin     | active | ?column? | ?column? 
+-----------------+---------------+--------+----------+----------
+ regression_slot | test_decoding | f      | t        | t
+(1 row)
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+ALTER TABLE replication_example ADD COLUMN bar int;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                               data                                                
+---------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:1 text[varchar]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:7 somedata[int4]:3 text[varchar]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:8 somedata[int4]:3 text[varchar]:2
+ table "replication_example": INSERT: id[int4]:9 somedata[int4]:3 text[varchar]:3
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+ COMMIT
+(30 rows)
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count 
+-------
+    12
+(1 row)
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                                          data                                                          
+------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:12 somedata[int4]:6 somenum[int4]:1
+ table "replication_example": INSERT: id[int4]:13 somedata[int4]:6 somenum[int4]:2 zaphod1[int4]:1
+ table "replication_example": INSERT: id[int4]:14 somedata[int4]:6 somenum[int4]:3 zaphod1[int4]:(null) zaphod2[int4]:1
+ table "replication_example": INSERT: id[int4]:15 somedata[int4]:6 somenum[int4]:4 zaphod1[int4]:2 zaphod2[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_unique": INSERT: id2[int4]:1 data[int4]:10
+ COMMIT
+ BEGIN
+ table "tr_unique": DELETE: id2[int4]:1
+ COMMIT
+ BEGIN
+ COMMIT
+(19 rows)
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count 
+-------
+     2
+(1 row)
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                             data                             
+--------------------------------------------------------------
+ BEGIN
+ table "tr_pkey": INSERT: id2[int4]:2 data[int4]:1 id[int4]:1
+ COMMIT
+ BEGIN
+ table "tr_pkey": DELETE: id[int4]:1
+ COMMIT
+(6 rows)
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+ count |                              min                              |                             max                             
+-------+---------------------------------------------------------------+-------------------------------------------------------------
+     1 | COMMIT                                                        | COMMIT
+     1 | BEGIN                                                         | BEGIN
+  4999 | table "tr_etoomuch": DELETE: id[int4]:1                       | table "tr_etoomuch": DELETE: id[int4]:999
+  5234 | table "tr_etoomuch": UPDATE: id[int4]:10000 data[int4]:-10000 | table "tr_etoomuch": UPDATE: id[int4]:9999 data[int4]:-9999
+ 10234 | table "tr_etoomuch": INSERT: id[int4]:10000 data[int4]:10000  | table "tr_etoomuch": INSERT: id[int4]:9 data[int4]:9
+(5 rows)
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                            data                            
+------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:1 path[text]:1-top-#1
+ table "tr_sub": INSERT: id[int4]:2 path[text]:1-top-1-#1
+ table "tr_sub": INSERT: id[int4]:3 path[text]:1-top-1-#2
+ table "tr_sub": INSERT: id[int4]:4 path[text]:1-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:5 path[text]:1-top-2-1-#2
+ table "tr_sub": INSERT: id[int4]:6 path[text]:1-top-2-#1
+ COMMIT
+(10 rows)
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                             data                             
+--------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:7 path[text]:2-top-1...--#1
+ table "tr_sub": INSERT: id[int4]:8 path[text]:2-top-1...--#2
+ table "tr_sub": INSERT: id[int4]:9 path[text]:2-top-1...--#3
+ table "tr_sub": INSERT: id[int4]:10 path[text]:2-top-#1
+ COMMIT
+(6 rows)
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+    id serial primary key,
+    relation name NOT NULL,
+    options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+                                              Table "public.replication_metadata"
+  Column  |  Type   |                             Modifiers                             | Storage  | Stats target | Description 
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id       | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain    |              | 
+ relation | name    | not null                                                          | plain    |              | 
+ options  | text[]  |                                                                   | extended |              | 
+Indexes:
+    "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+                                              Table "public.replication_metadata"
+  Column  |  Type   |                             Modifiers                             | Storage  | Stats target | Description 
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id       | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain    |              | 
+ relation | name    | not null                                                          | plain    |              | 
+ options  | text[]  |                                                                   | extended |              | 
+Indexes:
+    "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+                                              Table "public.replication_metadata"
+  Column  |  Type   |                             Modifiers                             | Storage  | Stats target | Description 
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id       | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain    |              | 
+ relation | name    | not null                                                          | plain    |              | 
+ options  | text[]  |                                                                   | extended |              | 
+Indexes:
+    "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+                                              Table "public.replication_metadata"
+  Column  |  Type   |                             Modifiers                             | Storage  | Stats target | Description 
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id       | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain    |              | 
+ relation | name    | not null                                                          | plain    |              | 
+ options  | text[]  |                                                                   | extended |              | 
+Indexes:
+    "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=false
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                             data                                             
+----------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:1 relation[name]:foo options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:2 relation[name]:bar options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:3 relation[name]:blub options[_text]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:4 relation[name]:zaphod options[_text]:(null)
+ COMMIT
+(20 rows)
+
+/*
+ * check whether we handle updates/deletes correct with & without a pkey
+ */
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique_not_null ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                                 data                                                 
+------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_without_key": INSERT: id[int4]:1 data[int4]:1
+ table "table_without_key": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_pkey": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_pkey": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique_not_null": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_oid": INSERT: oid[oid]:16484 id[int4]:1 data[int4]:1
+ table "table_with_oid": INSERT: oid[oid]:16485 id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16484
+ COMMIT
+ BEGIN
+ table "table_with_oid": UPDATE: oid[oid]:16485 id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16485
+ COMMIT
+(105 rows)
+
+-- check toast support
+SELECT setseed(0);
+ setseed 
+---------
+ 
+(1 row)
+
+CREATE TABLE toasttable(
+       id serial primary key,
+       toasted_col1 text,
+       rand1 float8 DEFAULT random(),
+       toasted_col2 text,
+       rand2 float8 DEFAULT random()
+       );
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+-- update of existing column
+UPDATE toasttable
+    SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
 data
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                       
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:2 toasted_col1[text]:(null) rand1[float8]:0.783099223393947 toasted_col2[text]:0001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
7104720473047404750476047704780479048004810482048304840485048604870488048904900491049204930494049504960497049804990500 rand2[float8]:0.798440033104271
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+(11 rows)
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- update of second column, first column unchanged
+UPDATE toasttable
+    SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         data                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "toasttable": INSERT: id[int4]:3 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.911647357512265 toasted_col2[text]:(null) rand2[float8]:0.197551369201392
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:(unchanged-toast-datum) rand1[float8]:0.840187716763467 toasted_col2[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765
86596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243
12441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743
17441745174617471748174917501751175217531754175517561757175817591760176117621763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ COMMIT
+(8 rows)
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data 
+------
+(0 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication 
+--------------------------
+                        0
+(1 row)
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
+ slot_name | plugin | database | active | xmin | restart_decoding_lsn 
+-----------+--------+----------+--------+------+----------------------
+(0 rows)
+
diff --git a/contrib/test_logical_decoding/logical.conf b/contrib/test_logical_decoding/logical.conf
new file mode 100644
index 0000000..a7c6c86
--- /dev/null
+++ b/contrib/test_logical_decoding/logical.conf
@@ -0,0 +1,2 @@
+wal_level = logical
+max_logical_slots = 4
diff --git a/contrib/test_logical_decoding/sql/ddl.sql b/contrib/test_logical_decoding/sql/ddl.sql
new file mode 100644
index 0000000..1e46584
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/ddl.sql
@@ -0,0 +1,291 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+-- faster startup
+CHECKPOINT;
+
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+-- fail
+SELECT stop_logical_replication('regression_slot');
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+
+/* check whether the status function reports us; select only reproducible columns */
+SELECT slot_name, plugin, active,
+    xmin::xid IS NOT NULL,
+    pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- hide changes because OIDs are visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+
+/*
+ * check whether we handle subtransactions correctly in relation to each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+    id serial primary key,
+    relation name NOT NULL,
+    options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check whether we handle updates/deletes correctly with & without a pkey
+ */
+
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique_not_null ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check toast support
+SELECT setseed(0);
+CREATE TABLE toasttable(
+       id serial primary key,
+       toasted_col1 text,
+       rand1 float8 DEFAULT random(),
+       toasted_col2 text,
+       rand2 float8 DEFAULT random()
+       );
+
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+
+-- update of existing column
+UPDATE toasttable
+    SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- update of second column, first column unchanged
+UPDATE toasttable
+    SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
diff --git a/contrib/test_logical_decoding/test_logical_decoding--1.0.sql b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
new file mode 100644
index 0000000..b6e048c
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
@@ -0,0 +1,6 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_logical_decoding" to load this file. \quit
+
+CREATE FUNCTION start_logical_replication (slotname name, pos text, VARIADIC options text[] DEFAULT '{}', OUT location text, OUT xid bigint, OUT data text) RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'start_logical_replication'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/test_logical_decoding/test_logical_decoding.c b/contrib/test_logical_decoding/test_logical_decoding.c
new file mode 100644
index 0000000..6c78319
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.c
@@ -0,0 +1,237 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "storage/fd.h"
+#include "miscadmin.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+Datum		start_logical_replication(PG_FUNCTION_ARGS);
+
+static Tuplestorestate *tupstore = NULL;
+static TupleDesc tupdesc;
+
+static void
+LogicalOutputPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+	resetStringInfo(ctx->out);
+}
+
+static void
+LogicalOutputWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+	Datum		values[3];
+	bool		nulls[3];
+	char		buf[60];
+
+	sprintf(buf, "%X/%X", (uint32) (lsn >> 32), (uint32) lsn);
+
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = CStringGetTextDatum(buf);
+	values[1] = Int64GetDatum(xid);
+	values[2] = CStringGetTextDatum(ctx->out->data);
+
+	tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+}
+
+PG_FUNCTION_INFO_V1(start_logical_replication);
+
+Datum
+start_logical_replication(PG_FUNCTION_ARGS)
+{
+	Name		name = PG_GETARG_NAME(0);
+
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	XLogRecPtr	now;
+	XLogRecPtr	startptr;
+	XLogRecPtr	rp;
+
+	LogicalDecodingContext *ctx;
+
+	ResourceOwner old_resowner = CurrentResourceOwner;
+	ArrayType  *arr;
+	Size		ndim;
+	List	   *options = NIL;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	arr = PG_GETARG_ARRAYTYPE_P(2);
+	ndim = ARR_NDIM(arr);
+
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	if (ndim > 1)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("start_logical_replication only accepts one dimension of arguments")));
+	}
+	else if (array_contains_nulls(arr))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			  errmsg("start_logical_replication expects NOT NULL options")));
+	}
+	else if (ndim == 1)
+	{
+		int			nelems;
+		Datum	   *datum_opts;
+		int			i;
+
+		Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+		deconstruct_array(arr, TEXTOID, -1, false, 'i',
+						  &datum_opts, NULL, &nelems);
+
+		if (nelems % 2 != 0)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("options need to be specified pairwise")));
+		}
+
+		for (i = 0; i < nelems; i += 2)
+		{
+			char	   *name = TextDatumGetCString(datum_opts[i]);
+			char	   *opt = TextDatumGetCString(datum_opts[i + 1]);
+
+			options = lappend(options, makeDefElem(name, (Node *) makeString(opt)));
+		}
+	}
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * XXX: It's impolite to ignore our argument and keep decoding until the
+	 * current position.
+	 */
+	now = GetFlushRecPtr();
+
+	/*
+	 * We need to create a normal_snapshot_reader, but adjust it to use our
+	 * page_read callback, and also make its reorder buffer use our callback
+	 * wrappers that don't depend on walsender.
+	 */
+
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingReAcquireSlot(NameStr(*name));
+
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false,
+									   MyLogicalDecodingSlot->confirmed_flush,
+									   options,
+									   logical_read_local_xlog_page,
+									   LogicalOutputPrepareWrite,
+									   LogicalOutputWrite);
+
+	startptr = MyLogicalDecodingSlot->restart_decoding;
+
+	elog(DEBUG1, "Starting logical replication from %X/%X to %X/%X",
+		 (uint32) (MyLogicalDecodingSlot->restart_decoding >> 32),
+		 (uint32) MyLogicalDecodingSlot->restart_decoding,
+		 (uint32) (now >> 32), (uint32) now);
+
+	CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
+
+	/* invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	PG_TRY();
+	{
+
+		while ((startptr != InvalidXLogRecPtr && startptr < now) ||
+			   (ctx->reader->EndRecPtr && ctx->reader->EndRecPtr < now))
+		{
+			XLogRecord *record;
+			char	   *errm = NULL;
+
+			record = XLogReadRecord(ctx->reader, startptr, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			startptr = InvalidXLogRecPtr;
+
+			if (record != NULL)
+			{
+				XLogRecordBuffer buf;
+
+				buf.origptr = ctx->reader->ReadRecPtr;
+				buf.record = *record;
+				buf.record_data = XLogRecGetData(record);
+
+				/*
+				 * The {begin_txn,change,commit_txn}_wrapper callbacks above
+				 * will store the description into our tuplestore.
+				 */
+				DecodeRecordIntoReorderBuffer(ctx, &buf);
+			}
+		}
+	}
+	PG_CATCH();
+	{
+		LogicalDecodingReleaseSlot();
+
+		/*
+		 * clear timetravel entries: XXX allowed in aborted TXN?
+		 */
+		InvalidateSystemCaches();
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	rp = ctx->reader->EndRecPtr;
+	if (rp >= now)
+	{
+		elog(DEBUG1, "Reached endpoint (wanted: %X/%X, got: %X/%X)",
+			 (uint32) (now >> 32), (uint32) now,
+			 (uint32) (rp >> 32), (uint32) rp);
+	}
+
+	tuplestore_donestoring(tupstore);
+
+	CurrentResourceOwner = old_resowner;
+
+	/*
+	 * Next time, start where we left off. (Hunting things, the family
+	 * business..)
+	 */
+	MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+
+	LogicalDecodingReleaseSlot();
+
+	return (Datum) 0;
+}
diff --git a/contrib/test_logical_decoding/test_logical_decoding.control b/contrib/test_logical_decoding/test_logical_decoding.control
new file mode 100644
index 0000000..0dce19f
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.control
@@ -0,0 +1,5 @@
+# test_logical_decoding extension
+comment = 'test logical decoding'
+default_version = '1.0'
+module_pathname = '$libdir/test_logical_decoding'
+relocatable = true
-- 
1.8.2.rc2.4.g7799588.dirty
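
A note on the location column returned by start_logical_replication() above:
LogicalOutputWrite() prints an XLogRecPtr as two 32-bit halves in upper-case
hex separated by a slash, which is also the form of the xlog_position shown in
the cover letter (0/17D5908). The following is a minimal standalone sketch of
that formatting and of its inverse. It is not part of the patch: lsn_format()
and lsn_parse() are names made up for this illustration, uint64_t stands in
for XLogRecPtr, and the parser does not guard against overlong runs of hex
digits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write lsn into buf as "HI/LO" in upper-case hex,
 * mirroring the "%X/%X" sprintf in LogicalOutputWrite(). Returns what
 * snprintf returns, i.e. the length of the formatted string.
 */
static int
lsn_format(uint64_t lsn, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/*
 * Hypothetical inverse: parse "HI/LO" back into a 64-bit position.
 * Returns 1 on success and 0 on malformed input or trailing characters.
 */
static int
lsn_parse(const char *str, uint64_t *lsn)
{
    uint32_t    hi;
    uint32_t    lo;
    char        trailing;

    assert(str != NULL && lsn != NULL);

    /* exactly two conversions must succeed; a third means trailing junk */
    if (sscanf(str, "%" SCNx32 "/%" SCNx32 "%c", &hi, &lo, &trailing) != 2)
        return 0;

    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```

With these helpers, formatting 0x17D5908 yields the string 0/17D5908, and
parsing that string gives the same 64-bit value back.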

0017-wal_decoding-design-document-v2.4-and-snapshot-build.patch (text/x-patch; charset=us-ascii)
From b4e663f53f92a727f6f4d9832542546cbff977c8 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 17/17] wal_decoding: design document v2.4 and snapshot
 building design doc v0.5

---
 src/backend/replication/logical/DESIGN.txt         | 593 +++++++++++++++++++++
 src/backend/replication/logical/Makefile           |   6 +
 .../replication/logical/README.SNAPBUILD.txt       | 241 +++++++++
 3 files changed, 840 insertions(+)
 create mode 100644 src/backend/replication/logical/DESIGN.txt
 create mode 100644 src/backend/replication/logical/README.SNAPBUILD.txt
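
A note for readers of the "TX reassembly" section of DESIGN.txt in the diff
that follows: the k-way merge of per-xid change streams described there can be
sketched in standalone C roughly like this. It is an illustration under
simplifying assumptions, not the reassembly code from the patch series. The
Change and Stream types and all function names are hypothetical, every stream
is an in-memory array that is already sorted by LSN, and a small hand-rolled
binary min-heap keyed on each stream's next LSN stands in for the generic
binary heap proposed in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one decoded change. */
typedef struct Change
{
    uint64_t    lsn;            /* byte position in the WAL */
    uint32_t    xid;            /* (sub)transaction that made the change */
} Change;

/* Hypothetical per-xid stream; changes[] is already sorted by lsn. */
typedef struct Stream
{
    const Change *changes;
    size_t      nchanges;
    size_t      next;           /* index of the first unconsumed change */
} Stream;

static uint64_t
stream_head_lsn(const Stream *s)
{
    return s->changes[s->next].lsn;
}

/* Restore the min-heap property below index i. */
static void
sift_down(Stream **heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t      left = 2 * i + 1;
        size_t      right = left + 1;
        size_t      smallest = i;
        Stream     *tmp;

        if (left < n &&
            stream_head_lsn(heap[left]) < stream_head_lsn(heap[smallest]))
            smallest = left;
        if (right < n &&
            stream_head_lsn(heap[right]) < stream_head_lsn(heap[smallest]))
            smallest = right;
        if (smallest == i)
            break;

        tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/*
 * Merge all streams into out[] in LSN order without sorting the changes.
 * heap[] is caller-provided scratch space with nstreams slots, out[] must
 * have room for every change. Returns the number of changes written.
 */
static size_t
merge_streams(Stream *streams, size_t nstreams, Stream **heap, Change *out)
{
    size_t      n = 0;
    size_t      nout = 0;
    size_t      i;

    assert(streams != NULL && heap != NULL && out != NULL);

    /* only non-empty streams take part in the merge */
    for (i = 0; i < nstreams; i++)
        if (streams[i].next < streams[i].nchanges)
            heap[n++] = &streams[i];

    /* build the heap bottom-up */
    for (i = n / 2; i-- > 0;)
        sift_down(heap, n, i);

    while (n > 0)
    {
        Stream     *top = heap[0];

        out[nout++] = top->changes[top->next++];

        /* drop an exhausted stream, then fix up the root */
        if (top->next == top->nchanges)
            heap[0] = heap[--n];
        sift_down(heap, n, 0);
    }
    return nout;
}
```

Each step only compares the heads of the streams, so merging N changes from k
streams costs O(N log k) comparisons, and no individual stream is ever
re-sorted.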

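A second note, on the "Output Plugin" section of DESIGN.txt in the diff that
follows: the three callbacks it lists (begin, change, commit) and the point at
which they fire can be sketched as below. This is a hedged illustration only.
DemoPlugin, DemoChange, TextBuf and every function name here are hypothetical
and do not match the structs or signatures in the patch series, and the mvcc
snapshot is reduced to an opaque pointer that the demo plugin ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one reassembled change. */
typedef struct DemoChange
{
    uint32_t    subxid;         /* subtransaction that made the change */
    const char *description;    /* pre-rendered text, for the demo only */
} DemoChange;

/* Hypothetical callback table with the three callbacks from the design. */
typedef struct DemoPlugin
{
    void        (*begin) (void *state, uint32_t xid);
    void        (*change) (void *state, uint32_t xid, uint32_t subxid,
                           const DemoChange *change, const void *snapshot);
    void        (*commit) (void *state, uint32_t xid);
    void       *state;          /* private to the plugin */
} DemoPlugin;

/*
 * Called once the commit record has been seen and the transaction is fully
 * reassembled: begin once, change for every change, commit once.
 */
static void
replay_transaction(const DemoPlugin *plugin, uint32_t xid,
                   const DemoChange *changes, size_t nchanges,
                   const void *snapshot)
{
    size_t      i;

    plugin->begin(plugin->state, xid);
    for (i = 0; i < nchanges; i++)
        plugin->change(plugin->state, xid, changes[i].subxid,
                       &changes[i], snapshot);
    plugin->commit(plugin->state, xid);
}

/* A trivial text plugin that appends one line per callback to a buffer. */
typedef struct TextBuf
{
    char        data[256];
    size_t      len;
} TextBuf;

static void
textbuf_append(TextBuf *buf, const char *line)
{
    size_t      room = sizeof(buf->data) - buf->len;
    int         n = snprintf(buf->data + buf->len, room, "%s\n", line);

    assert(n >= 0 && (size_t) n < room);
    buf->len += (size_t) n;
}

static void
text_begin(void *state, uint32_t xid)
{
    (void) xid;
    textbuf_append(state, "BEGIN");
}

static void
text_change(void *state, uint32_t xid, uint32_t subxid,
            const DemoChange *change, const void *snapshot)
{
    (void) xid;
    (void) subxid;
    (void) snapshot;
    textbuf_append(state, change->description);
}

static void
text_commit(void *state, uint32_t xid)
{
    (void) xid;
    textbuf_append(state, "COMMIT");
}

static DemoPlugin
make_text_plugin(TextBuf *buf)
{
    DemoPlugin  plugin = {text_begin, text_change, text_commit, buf};

    return plugin;
}
```

Because the plugin only sees whole, committed transactions in order, it never
has to buffer or reorder anything itself, which is what keeps individual
plugins small.
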
diff --git a/src/backend/replication/logical/DESIGN.txt b/src/backend/replication/logical/DESIGN.txt
new file mode 100644
index 0000000..d76fdb4
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN.txt
@@ -0,0 +1,593 @@
+//-*- mode: adoc -*-
+= High Level Design for Logical Replication in Postgres =
+:copyright: PostgreSQL Global Development Group 2012
+:author: Andres Freund, 2ndQuadrant Ltd.
+:email: andres@2ndQuadrant.com
+
+== Introduction ==
+
+This document aims to first explain why we think postgres needs another
+replication solution and what that solution needs to offer in our opinion. Then
+it sketches out our proposed implementation.
+
+In contrast to an earlier version of the design document which talked about the
+implementation of four parts of replication solutions:
+
+1. Source data generation
+1. Transportation of that data
+1. Applying the changes
+1. Conflict resolution
+
+this version only plans to talk about the first part in detail as it is an
+independent and complex part usable for a wide range of use cases which we want
+to get included into postgres in a first step.
+
+=== Previous discussions ===
+
+There are two rather large threads discussing several parts of the initial
+prototype and proposed architecture:
+
+- http://archives.postgresql.org/message-id/201206131327.24092.andres@2ndquadrant.com[Logical Replication/BDR prototype and architecture]
+- http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata consistency during changeset extraction from WAL]
+
+Those discussions led to some fundamental design changes which are presented in this document.
+
+=== Changes from v1 ===
+* At least a partial decoding step required/possible on the source system
+* No intermediate ("schema only") instances required
+* DDL handling, without event triggers
+* A very simple text conversion is provided for debugging/demo purposes
+* Smaller scope
+
+== Existing approaches to replication in Postgres ==
+
+If any currently used approach to replication can be made to support every
+use-case/feature we need, it is likely not a good idea to implement something
+different. Three basic approaches are in use in and around postgres
+today:
+
+. Trigger based
+. Recovery based/Physical footnote:[Often referred to by terms like Hot Standby, Streaming Replication, Point In Time Recovery]
+. Statement based
+
+Statement based replication has obvious and known problems with consistency and
+correctness making it hard to use in the general case so we will not further
+discuss it here.
+
+Let's have a look at the advantages/disadvantages of the other approaches:
+
+=== Trigger based Replication ===
+
+This variant has a multitude of significant advantages:
+
+* implementable in userspace
+* easy to customize
+* just about everything can be made configurable
+* cross version support
+* cross architecture support
+* can feed into systems other than postgres
+* no overhead from writes to non-replicated tables
+* writable standbys
+* mature solutions
+* multimaster implementations possible & existing
+
+But also a number of disadvantages, some of them very hard to solve:
+
+* essentially doubles the amount of writes (or even more!)
+* synchronous replication hard or impossible to implement
+* noticeable CPU overhead
+** trigger functions
+** text conversion of data
+* complex parts implemented in several solutions
+* not in core
+
+Especially the higher amount of writes might seem easy to solve at first
+glance but a solution not using a normal transactional table for its log/queue
+has to solve a lot of problems. The major ones are:
+
+* crash safety, restartability & spilling to disk
+* consistency with the commit status of transactions
+* only a minimal amount of synchronous work should be done inside individual
+transactions
+
+In our opinion those problems are restricting progress/wider distribution of
+this class of solutions. It is our aim though that existing solutions in this
+space - most prominently slony and londiste - can benefit from the work we are
+doing & planning to do by incorporating at least parts of the changeset
+generation infrastructure.
+
+=== Recovery based Replication ===
+
+This type of solution, being built into postgres and of increasing popularity,
+has and will have its use cases and we do not aim to replace but to complement
+it. We plan to reuse some of the infrastructure and to make it possible to mix
+both modes of replication.
+
+Advantages:
+
+* builtin
+* built on existing infrastructure from crash recovery
+* efficient
+** minimal CPU, memory overhead on primary
+** low amount of additional writes
+* synchronous operation mode
+* low maintenance once set up
+* handles DDL
+
+Disadvantages:
+
+* standbys are read only
+* no cross version support
+* no cross architecture support
+* no replication into foreign systems
+* hard to customize
+* not configurable on the level of database, tables, ...
+
+== Goals ==
+
+As seen in the previous short survey of the two major interesting classes of
+replication solution there is a significant gap between those. Our aim is to
+make it smaller.
+
+We aim for:
+
+* in core
+* low CPU overhead
+* low storage overhead
+* asynchronous, optionally synchronous operation modes
+* robust
+* modular
+* basis for other technologies (sharding, replication into other DBMS's, ...)
+* basis for at least one multi-master solution
+* make the implementation as unintrusive as possible, but not more
+
+== New Architecture ==
+
+=== Overview ===
+
+Our proposal is to reuse the basic principle of WAL based replication, namely
+reusing data that already needs to be written for another purpose, and extend
+it to allow most, but not all, of the flexibility of trigger based solutions.
+We want to do that by decoding the WAL back into a non-physical form.
+
+To get the flexibility we and others want we propose that the last step of
+changeset generation, transforming it into a format that can be used by the
+replication consumer, is done in an extensible manner. In the schema the part
+that does that is described as 'Output Plugin'. To keep the amount of
+duplication between different plugins as low as possible the plugin should only
+do a very limited amount of work.
+
+The following paragraphs contain reasoning for the individual design decisions
+made and their high-level design.
+
+=== Schematics ===
+
+The basic proposed architecture for changeset extraction is presented in the
+following diagram. The first part should look familiar to anyone knowing
+postgres' architecture. The second is where most of the new magic happens.
+
+[[basic-schema]]
+.Architecture Schema
+["ditaa"]
+------------------------------------------------------------------------------
+        Traditional Stuff
+
+ +---------+---------+---------+---------+----+
+ | Backend | Backend | Backend | Autovac | ...|
+ +----+----+---+-----+----+----+----+----+-+--+
+      |        |          |         |      |
+      +------+ | +--------+         |      |
+    +-+      | | | +----------------+      |
+    |        | | | |                       |
+    |        v v v v                       |
+    |     +------------+                   |
+    |     | WAL writer |<------------------+
+    |     +------------+
+    |       | | | | |
+    v       v v v v v       +-------------------+
++--------+ +---------+   +->| Startup/Recovery  |
+|{s}     | |{s}      |   |  +-------------------+
+|Catalog | |   WAL   |---+->| SR/Hot Standby    |
+|        | |         |   |  +-------------------+
++--------+ +---------+   +->| Point in Time     |
+    ^          |            +-------------------+
+ ---|----------|--------------------------------
+    |       New Stuff
++---+          |
+|              v            Running separately
+| +----------------+  +=-------------------------+
+| | Walsender  |   |  |                          |
+| |            v   |  |    +-------------------+ |
+| +-------------+  |  | +->| Logical Rep.      | |
+| |     WAL     |  |  | |  +-------------------+ |
++-|  decoding   |  |  | +->| Multimaster       | |
+| +------+------/  |  | |  +-------------------+ |
+| |            |   |  | +->| Slony             | |
+| |            v   |  | |  +-------------------+ |
+| +-------------+  |  | +->| Auditing          | |
+| |     TX      |  |  | |  +-------------------+ |
++-| reassembly  |  |  | +->| Mysql/...         | |
+| +-------------/  |  | |  +-------------------+ |
+| |            |   |  | +->| Custom Solutions  | |
+| |            v   |  | |  +-------------------+ |
+| +-------------+  |  | +->| Debugging         | |
+| |   Output    |  |  | |  +-------------------+ |
++-|   Plugin    |--|--|-+->| Data Recovery     | |
+  +-------------/  |  |    +-------------------+ |
+  |                |  |                          |
+  +----------------+  +--------------------------+
+------------------------------------------------------------------------------
+
+=== WAL enrichment ===
+
+To be able to decode individual WAL records at the very minimum they need to
+contain enough information to reconstruct what has happened to which row. The
+action is already encoded in the WAL record's header in most of the cases.
+
+As an example of missing data, the WAL record emitted when a row gets deleted
+only contains its physical location. At the very least we need a way to
+identify the deleted row: in a relational database the minimal amount of data
+that does that should be the primary key footnote:[Yes, there are use cases
+where the whole row is needed, or where no primary key can be found].
+
+We propose that for now it is enough to extend the relevant WAL record with
+additional data when the newly introduced 'wal_level = logical' is set.
+
+Previously it has been argued on the hackers mailing list that a generic 'WAL
+record annotation' mechanism might be a good thing. That mechanism would allow
+attaching arbitrary data to individual wal records, making it easier to extend
+postgres to support something like what we propose. While we don't oppose that
+idea we think it is a largely orthogonal issue to this proposal as a whole
+because the format of WAL records is version dependent by nature and the
+necessary changes for our easy way are small, so not much effort is lost.
+
+A full annotation capability is a complex endeavour on its own as the parts of
+the code generating the relevant WAL records have somewhat complex requirements
+and cannot easily be configured from the outside.
+
+Currently this is contained in the http://archives.postgresql.org/message-id/1347669575-14371-6-git-send-email-andres@2ndquadrant.com[Log enough data into the wal to reconstruct logical changes from it] patch.
+
+=== WAL parsing & decoding ===
+
+The main complexity when reading the WAL as stored on disk is that the format
+is somewhat complex and the existing parser is too deeply integrated in the
+recovery system to be directly reusable. Once a reusable parser exists decoding
+the binary data into individual WAL records is a small problem.
+
+Currently two competing proposals for this module exist, each having its own
+merits. In the grand scheme of this proposal it is irrelevant which one gets
+picked as long as the functionality gets integrated.
+
+The mailing list post
+http://archives.postgresql.org/message-id/1347669575-14371-3-git-send-email-andres@2ndquadrant.com[Add
+support for a generic wal reading facility dubbed XLogReader] contains both
+competing patches and discussion around which one is preferable.
+
+Once the WAL has been decoded into individual records two major issues exist:
+
+1. records from different transactions and even individual user level actions
+are intermingled
+1. the data attached to records cannot be interpreted on its own, it is only
+meaningful with a lot of required information (including table, columns, types
+and more)
+
+The solution to the first issue is described in the next section: <<tx-reassembly>>
+
+The second problem is probably the reason why no mature solution to reuse the
+WAL for logical changeset generation exists today. See the <<snapbuilder>>
+paragraph for some details.
+
+As decoding, Transaction reassembly and Snapshot building are interdependent
+they currently are implemented in the same patch:
+http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-andres@2ndquadrant.com[Introduce
+wal decoding via catalog timetravel]
+
+That patch also includes a small demonstration that the approach works in the
+presence of DDL:
+
+[[example-of-decoding]]
+.Decoding example
+[NOTE]
+---------------------------
+/* just so we keep a sensible xmin horizon */
+ROLLBACK PREPARED 'f';
+BEGIN;
+CREATE TABLE keepalive();
+PREPARE TRANSACTION 'f';
+
+DROP TABLE IF EXISTS replication_example;
+
+SELECT pg_current_xlog_insert_location();
+CHECKPOINT;
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text
+varchar(120));
+begin;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+commit;
+
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+/* slightly more complex schema change, still no table rewrite */
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+commit;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+/* complex schema change, changing types of existing column, rewriting the table */
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING
+(somenum::int4);
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+SELECT pg_current_xlog_insert_location();
+
+/* now decode what has been written to the WAL during that time */
+
+SELECT decode_xlog('0/1893D78', '0/18BE398');
+
+WARNING:  BEGIN
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:1 somedata[int4]:1 text[varchar]:1
+WARNING:  tuple is: id[int4]:2 somedata[int4]:1 text[varchar]:2
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+WARNING:  tuple is: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+WARNING:  tuple is: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:
+(null)
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:7 somedata[int4]:3 text[varchar]:1
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:8 somedata[int4]:3 text[varchar]:2
+WARNING:  tuple is: id[int4]:9 somedata[int4]:3 text[varchar]:3
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  COMMIT
+WARNING:  BEGIN
+WARNING:  tuple is: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+WARNING:  COMMIT
+
+---------------------------
+
+[[tx-reassembly]]
+=== TX reassembly ===
+
+In order to make usage of the decoded stream easy we want to present the user
+level code with a correctly ordered image of individual transactions at once
+because otherwise every user will have to reassemble transactions themselves.
+
+Transaction reassembly needs to solve several problems:
+
+1. changes inside a transaction can be interspersed with other transactions
+1. a top level transaction only knows which subtransactions belong to it when
+it reads the commit record
+1. individual user level actions can be smeared over multiple records (TOAST)
+
+Our proposed module solves 1) and 2) by building individual streams of records
+split by xid. While not fully implemented yet we plan to spill those individual
+xid streams to disk after a certain amount of memory is used. This can be
+implemented without any change in the external interface.
+
+As all the individual streams are already sorted by LSN by definition - we
+build them from the wal in a FIFO manner, and the position in the WAL is the
+definition of the LSN footnote:[the LSN is just the byte position in the WAL
+stream] - the individual changes can be merged efficiently by a k-way merge
+(without sorting!) by keeping the individual streams in a binary heap.
+
+To manipulate the binary heap a generic implementation is proposed. Several
+independent implementations of binary heaps already exist in the postgres code,
+but none of them is generic.  The patch is available at
+http://archives.postgresql.org/message-id/1347669575-14371-2-git-send-email-andres@2ndquadrant.com[Add
+minimal binary heap implementation].
+
+[NOTE]
+============
+The reassembly component was previously named ApplyCache because it was
+proposed to run on replication consumers just before applying changes. This is
+not the case anymore.
+
+It is still called that way in the source of the patch recently submitted.
+============
+
+[[snapbuilder]]
+=== Snapshot building  ===
+
+To decode the contents of wal records describing data changes we need to decode
+and transform their contents. A single tuple is stored in a data structure
+called HeapTuple. As stored on disk that structure doesn't contain any
+information about the format of its contents.
+
+The basic problem is twofold:
+
+1. The wal records only contain the relfilenode, not the relation oid of a table
+a. The relfilenode changes when an action requiring a full table rewrite is performed
+1. To interpret a HeapTuple correctly the exact schema definition from back
+when the wal record was inserted into the wal stream needs to be available
+
+We chose to implement timetraveling access to the system catalog using
+postgres' MVCC nature & implementation because of the following advantages:
+
+* low amount of additional data in wal
+* genericity
+* similarity of implementation to Hot Standby, quite a bit of the infrastructure is reusable
+* all kinds of DDL can be handled in a reliable manner
+* extensibility to user defined catalog like tables
+
+Timetravel access to the catalog means that we are able to look at the catalog
+just as it looked when changes were generated. That allows us to get the
+correct information about the contents of the aforementioned HeapTuples so we
+can decode them reliably.
+
+Other solutions we thought about that fell through:
+* catalog only proxy instances that apply schema changes exactly up to the point
+  we are decoding, using ``old fashioned'' wal replay
+* do the decoding on a 2nd machine, replicating all DDL exactly, rely on the catalog there
+* do not allow DDL at all
+* always add enough data into the WAL to allow decoding
+* build a fully versioned catalog
+
+The email thread available under
+http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata
+consistency during changeset extraction from WAL] contains some details,
+advantages and disadvantages about the different possible implementations.
+
+How we build snapshots is somewhat intricate and complicated and seems to be
+out of scope for this document. We will provide a second document discussing
+the implementation in detail. Let's just assume it is possible from here on.
+
+[NOTE]
+Some details are already available in comments inside 'src/backend/replication/logical/snapbuild.{c,h}'.
+
+=== Output Plugin ===
+
+As already mentioned previously our aim is to make the implementation of output
+plugins as simple and non-redundant as possible as we expect several different
+ones with different use cases to emerge quickly. See <<basic-schema>> for a
+list of possible output plugins that we think might emerge.
+
+Although we for now only plan to tackle logical replication and based on that a
+multi-master implementation in the near future we definitely aim to provide all
+use-cases with something easily useable!
+
+To decode and translate local transactions an output plugin needs to be able to
+transform transactions as a whole so it can apply them as a meaningful
+transaction at the other side.
+
+What we do to provide that is that every time we find a transaction commit and
+thus have completed reassembling the transaction we start to provide the
+individual changes to the output plugin. It currently only has to fill out 3
+callbacks:
+[options="header"]
+|=====================================================================================================================================
+|Callback |Passed Parameters                    |Called per TX  | Use
+|begin    |xid                                  |once           |Begin of a reassembled transaction
+|change   |xid, subxid, change, mvcc snapshot   |every change   |Gets passed every change so it can transform it to the target format
+|commit   |xid                                  |once           |End of a reassembled transaction
+|=====================================================================================================================================
+
+During each of those callbacks an appropriate timetraveling SnapshotNow snapshot
+is set up so the callbacks can perform all read-only catalog accesses they need,
+including using the sys/rel/catcache. For obvious reasons only read access is
+allowed.
+
+The snapshot guarantees that the results of lookups are the same as they
+were/would have been when the change was originally created.
+
+Additionally they get passed a MVCC snapshot, to e.g. run sql queries on
+catalogs or similar.
+
+[IMPORTANT]
+============
+At the moment none of these snapshots can be used to access normal user
+tables. Adding additional tables to the allowed set is easy implementation
+wise, but every transaction changing such tables incurs a noticeably higher
+overhead.
+============
+
+For now transactions won't be decoded/output in parallel. There are ideas to
+improve on this, but we don't think the complexity is appropriate for the first
+release of this feature.
+
+This is an adoption barrier for databases where large amounts of data get
+loaded/written in one transaction.
+
+=== Setup of replication nodes ===
+
+When setting up a new standby/consumer of a primary some problems exist
+independent of the implementation of the consumer. The gist of the problem is
+that when making a base backup and starting to stream all changes since that
+point transactions that were running during all this cannot be included:
+
+* Transactions that have not committed before starting to dump a database are
+  invisible to the dumping process
+
+* Transactions that began before the point from which on the WAL is being
+  decoded are incomplete and cannot be replayed
+
+Our proposal for a solution to this is to detect points in the WAL stream where we can provide:
+
+. A snapshot exported similarly to pg_export_snapshot() footnote:[http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION] that can be imported with +SET TRANSACTION SNAPSHOT+ footnote:[http://www.postgresql.org/docs/devel/static/sql-set-transaction.html]
+. A stream of changes that will include the complete data of all transactions seen as running by the snapshot generated in 1)
+
+See the diagram.
+
+[[setup-schema]]
+.Control flow during setup of a new node
+["ditaa",scaling="0.7"]
+------------------------------------------------------------------------------
++----------------+
+| Walsender  |   |                               +------------+
+|            v   |                               | Consumer   |
++-------------+  |<--IDENTIFY_SYSTEM-------------|            |
+|     WAL     |  |                               |            |
+|  decoding   |  |----....---------------------->|            |
++------+------/  |                               |            |
+|            |   |                               |            |
+|            v   |                               |            |
++-------------+  |<--INIT_LOGICAL $PLUGIN--------|            |
+|     TX      |  |                               |            |
+| reassembly  |  |---FOUND_STARTING %X/%X------->|            |
++-------------/  |                               |            |
+|            |   |---FOUND_CONSISTENT %X/%X----->|            |
+|            v   |---pg_dump snapshot----------->|            |
++-------------+  |---replication slot %P-------->|            |
+|   Output    |  |                               |            |
+|   Plugin    |  |    ^                          |            |
++-------------/  |    |                          |            |
+|                |    +-run pg_dump separately --|            |
+|                |                               |            |
+|                |<--STREAM_DATA-----------------|            |
+|                |                               |            |
+|                |---data ---------------------->|            |
+|                |                               |            |
+|                |                               |            |
+|                |  ---- SHUTDOWN -------------  |            |
+|                |                               |            |
+|                |                               |            |
+|                |<--RESTART_LOGICAL $PLUGIN %P--|            |
+|                |                               |            |
+|                |---data----------------------->|            |
+|                |                               |            |
+|                |                               |            |
++----------------+                               +------------+
+
+------------------------------------------------------------------------------
+
+=== Disadvantages of the approach ===
+
+* somewhat intricate code for snapshot timetravel
+* output plugins/walsenders need to work per database as they access the catalog
+* when sending to multiple standbys some work is done multiple times
+* decoding/applying multiple transactions in parallel is somewhat hard
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 310a45c..6fae278 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -17,3 +17,9 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
 
 include $(top_srcdir)/src/backend/common.mk
+
+DESIGN.pdf: DESIGN.txt
+	a2x -v --fop -f pdf -D $(shell pwd) $<
+
+README.SNAPBUILD.pdf: README.SNAPBUILD.txt
+	a2x -v --fop -f pdf -D $(shell pwd) $<
diff --git a/src/backend/replication/logical/README.SNAPBUILD.txt b/src/backend/replication/logical/README.SNAPBUILD.txt
new file mode 100644
index 0000000..b6c7470
--- /dev/null
+++ b/src/backend/replication/logical/README.SNAPBUILD.txt
@@ -0,0 +1,241 @@
+= Snapshot Building =
+:author: Andres Freund, 2ndQuadrant Ltd
+
+== Why do we need timetravel catalog access ==
+
+When doing WAL decoding (see DESIGN.txt for reasons to do so), we need to know
+how the catalog looked at the point a record was inserted into the WAL, because
+without that information we don't know much more about the record other than
+its length.  It's just an arbitrary bunch of bytes without further information.
+Unfortunately, due to the possibility that the table definition might change we
+cannot just access a newer version of the catalog and assume the table
+definition continues to be the same.
+
+If only the type information were required, it might be enough to annotate the
+wal records with a bit more information (table oid, table name, column name,
+column type) --- but as we want to be able to convert the output to more useful
+formats such as text, we additionally need to be able to call output functions.
+Those need a normal environment including the usual caches and normal catalog
+access to lookup operators, functions and other types.
+
+Our solution to this is to add the capability to access the catalog such as it
+was at the time the record was inserted into the WAL. The locking used during
+WAL generation guarantees the catalog is/was in a consistent state at that
+point.  We call this 'time-travel catalog access'.
+
+Interesting cases include:
+
+- enums
+- composite types
+- extension types
+- non-C functions
+- relfilenode to table OID mapping
+
+Due to postgres' non-overwriting storage manager, regular modifications of a
+table's content are theoretically non-destructive. The problem is that there is
+no way to access an arbitrary point in time even if the data for it is there.
+
+This module adds the capability to do so in the very limited set of
+circumstances we need it in for WAL decoding. It does *not* provide a general
+time-travelling facility.
+
+A 'Snapshot' is the data structure used in postgres to describe which tuples
+are visible and which are not. We need to build a Snapshot which can be used to
+access the catalog the way it looked when the wal record was inserted.
+
+Restrictions:
+
+- Only works for catalog tables or tables explicitly marked as such.
+- Snapshot modifications are somewhat expensive
+- it cannot build initial visibility information for every point in time, it
+  needs specific circumstances to start.
+
+== How are time-travel snapshots built ==
+
+'Hot Standby' added infrastructure to build snapshots from WAL during recovery in
+the 9.0 release. Most of that can be reused for our purposes.
+
+We cannot reuse all of the hot standby infrastructure because:
+
+- we are not in recovery
+- we need to look at interim states *inside* a transaction
+- we need the capability to have multiple different snapshots around at the same time
+
+Normally the catalog is accessed using SnapshotNow which can legally be
+replaced by SnapshotMVCC that has been taken at the start of a scan. So catalog
+timetravel contains infrastructure to make SnapshotNow catalog access use
+appropriate MVCC snapshots. They aren't generated with GetSnapshotData()
+though, but reassembled from WAL contents.
+
+We collect our data in a normal struct SnapshotData, repurposing some fields
+creatively:
+
+- +Snapshot->xip+ contains all transactions we consider committed
+- +Snapshot->subxip+ contains all transactions belonging to our transaction,
+  including the toplevel one
+- +Snapshot->active_count+ is used as a refcount
+
+The meaning of +xip+ is inverted in comparison with non-timetravel snapshots in
+the sense that members of the array are the committed transactions, not the in
+progress ones. Because usually only a tiny percentage of committed transactions
+will have modified the catalog between xmin and xmax this allows us to keep the
+array small in the usual cases. It also makes subtransaction handling easier
+since we neither need to query pg_subtrans (which we couldn't anyway since it's
+truncated at restart) nor have problems with suboverflowed snapshots.
+
+== Building of initial snapshot ==
+
+We can start building an initial snapshot as soon as we find either an
++XLOG_RUNNING_XACTS+ or an +XLOG_CHECKPOINT_SHUTDOWN+ record because they allow us
+to know how many transactions are running.
+
+We need to know which transactions were running when we start to build a
+snapshot/start decoding as we don't have enough information about them (they
+could have done catalog modifications before we started watching). Also, we
+wouldn't have the complete contents of those transactions, because we started
+reading after they began.  (The latter is also important when building
+snapshots that can be used to build a consistent initial clone.)
+
+There also is the problem that +XLOG_RUNNING_XACTS+ records can be
+'suboverflowed' which means there were more running subtransactions than
+fit into shared memory. In that case we use the same incremental building
+trick hot standby uses which is either
+
+1. wait till further +XLOG_RUNNING_XACTS+ records have a running->oldestRunningXid
+after the initial xl_running_xacts->nextXid
+2. wait for a further +XLOG_RUNNING_XACTS+ that is not overflowed or
+a +XLOG_CHECKPOINT_SHUTDOWN+
+
+When we start building a snapshot we are in the +SNAPBUILD_START+ state. As
+soon as we find any visibility information, even if incomplete, we change to
++SNAPBUILD_INITIAL_POINT+.
+
+When we have collected enough information to decode any transaction starting
+after that point in time we switch over to +SNAPBUILD_FULL_SNAPSHOT+. If those
+transactions commit before the next state is reached, we throw their complete
+contents away.
+
+As soon as all transactions that were running when we switched over to
++SNAPBUILD_FULL_SNAPSHOT+ commit, we change state to +SNAPBUILD_CONSISTENT+.
+Every transaction that commits from now on gets handed to the output plugin.
+When doing the switch to +SNAPBUILD_CONSISTENT+ we optionally export a snapshot
+which makes all transactions that committed up to this point visible.  This
+exported snapshot can be used to run pg_dump; replaying all changes emitted
+by the output plugin on a database restored from such a dump will result in
+a consistent clone.
+
+["ditaa",scaling="0.8"]
+---------------
+
+        +-------------------------+
+   +----|SNAPBUILD_START          |-------------+
+   |    +-------------------------+             |
+   |                 |                          |
+   |                 |                          |
+   |     running_xacts with running xacts       |
+   |                 |                          |
+   |                 |                          |
+   |                 v                          |
+   |    +-------------------------+             v
+   |    |SNAPBUILD_FULL_SNAPSHOT  |------------>|
+   |    +-------------------------+             |
+XLOG_RUNNING_XACTS   |                      saved snapshot
+  with zero xacts    |                 at running_xacts's lsn
+   |                 |                          |
+   |     all running toplevel TXNs finished     |
+   |                 |                          |
+   |                 v                          |
+   |    +-------------------------+             |
+   +--->|SNAPBUILD_CONSISTENT     |<------------+
+        +-------------------------+
+
+---------------
+
+== Snapshot Management ==
+
+Whenever a transaction is detected as having started during decoding in
++SNAPBUILD_FULL_SNAPSHOT+ state, we distribute the currently maintained
+snapshot to it (i.e. call ReorderBufferSetBaseSnapshot). This serves as its
+initial snapshot. Unless there are concurrent catalog changes that snapshot
+will be used for decoding the entire transaction's changes.
+
+Whenever a transaction-with-catalog-changes commits, we iterate over all
+concurrently active transactions and add a new SnapshotNow to each of them
+(ReorderBufferAddSnapshot(current_lsn)). This is required because any row
+written from that point on will have used the changed catalog contents.
+
+When decoding a transaction that made catalog changes itself we tell that
+transaction that (ReorderBufferAddNewCommandId(current_lsn)) which will cause
+the decoding to use the appropriate command id from that point on.
+
+SnapshotNows need to be set up globally so the syscache and other pieces access
+them transparently. This is done using two new tqual.h functions:
+SetupDecodingSnapshots() and RevertFromDecodingSnapshots().
+
+== Catalog/User Table Detection ==
+
+Since we only want to store committed transactions that actually modified the
+catalog we need a way to detect that from WAL:
+
+Right now, we assume that every transaction that commits before we reach
++SNAPBUILD_CONSISTENT+ state has made catalog modifications since we can't rely
+on having seen the entire transaction before that. That's not harmful besides
+incurring some price in memory usage and runtime.
+
+After having reached consistency we recognize catalog modifying transactions
+via HEAP2_NEW_CID and HEAP_INPLACE that are logged by catalog modifying
+actions.
+
+== mixed DDL/DML transaction handling  ==
+
+When a transaction uses DDL and DML in the same transaction things get a bit
+more complicated because we need to handle CommandIds and ComboCids as we need
+to use the correct version of the catalog when decoding the individual tuples.
+
+For that we emit the new HEAP2_NEW_CID records which contain the physical tuple
+location, cmin and cmax when the catalog is modified. If we need to detect
+visibility of a catalog tuple that has been modified in our own transaction -
+which we can detect via xmin/xmax - we look in a hash table using the location
+as key to get correct cmin/cmax values.
+From those values we can also extract the commandid that generated the record.
+
+All this only needs to happen in the transaction performing the DDL.
+
+== Cache Handling ==
+
+As we allow usage of the normal {sys,cat,rel,..}cache we also need to integrate
+cache invalidation. For transactions that only do DDL that's easy as everything
+is already provided by HS. Every time we read a commit record we apply the
+sinval messages contained therein.
+
+For transactions that contain DDL and DML cache invalidation needs to happen
+more frequently because we need to tear down all caches that just got
+modified. To do that we simply take all invalidation messages that got
+collected at the end of the transaction and apply them every time we've decoded
+a single change. At some point this can get optimized by generating new local
+invalidation messages, but that seems too complicated for now.
+
+XXX: talk about syscache handling of relmapped relation.
+
+== xmin Horizon Handling ==
+
+Reusing MVCC for timetravel access has one obvious major problem: VACUUM. Rows
+we still need for decoding cannot be removed but at the same time we cannot
+keep data in the catalog indefinitely.
+
+For that we peg the xmin horizon that's used to decide which rows can be
+removed. We only need to prevent removal of those rows for catalog like
+relations, not for all user tables. For that reason a separate xmin horizon
+RecentGlobalDataXmin got introduced.
+
+Since we need to persist that knowledge across restarts we keep the xmin
+in the logical slots, which are saved in a crash-safe manner. They are restored
+from disk into memory at server startup.
+
+== Restartable Decoding ==
+
+As we want to generate a consistent stream of changes we need to have the
+ability to start from a previously decoded location without waiting possibly
+very long to reach consistency. For that reason we dump the current visibility
+information to disk every time we read an xl_running_xacts record.
+
-- 
1.8.2.rc2.4.g7799588.dirty

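A small standalone illustration of the inverted xip semantics described in
README.SNAPBUILD.txt above. This is only a sketch under simplifying
assumptions, not code from the patch: the struct, enum and function names are
hypothetical, xid wraparound is ignored (plain integer comparison), and xids
below xmin are simply reported as "needs a regular commit status lookup"
rather than resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified stand-in for the SnapshotData fields used here. */
typedef struct SketchSnapshot
{
	TransactionId xmin;			/* xids below this are outside the array's range */
	TransactionId xmax;			/* xids at or above this are not visible */
	const TransactionId *xip;	/* *committed* xids within [xmin, xmax) */
	size_t		xcnt;			/* number of entries in xip */
} SketchSnapshot;

typedef enum SketchXidStatus
{
	SKETCH_XID_BEFORE_XMIN,		/* caller must do a regular commit status lookup */
	SKETCH_XID_COMMITTED,		/* listed in xip, so treated as committed */
	SKETCH_XID_NOT_VISIBLE		/* in range but not listed, or >= xmax */
} SketchXidStatus;

/*
 * Classify an xid against a time-travel style snapshot.  Note the inversion
 * compared to a regular MVCC snapshot: membership in xip means "committed",
 * not "in progress".
 */
static SketchXidStatus
sketch_xid_status(const SketchSnapshot *snap, TransactionId xid)
{
	size_t		i;

	assert(snap != NULL);

	if (xid < snap->xmin)
		return SKETCH_XID_BEFORE_XMIN;
	if (xid >= snap->xmax)
		return SKETCH_XID_NOT_VISIBLE;

	/* linear search; the array is expected to stay small */
	for (i = 0; i < snap->xcnt; i++)
	{
		if (snap->xip[i] == xid)
			return SKETCH_XID_COMMITTED;
	}
	return SKETCH_XID_NOT_VISIBLE;
}
```

Because only transactions listed in the array count as committed, an xid in
range that never touched the catalog needs no entry at all, which is what
keeps the array small in the usual case.
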
0001-Add-support-for-multiple-kinds-of-external-toast-dat.patch (text/x-patch; charset=us-ascii)
From 654e24e9a615dcacea4d9714cf8cdbf6953983d5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 01/17] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that, support 'tags' on external (varattrib_1b_e) varlenas which repurpose the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 100 +++++++++++++++++++++++++++++++----
 src/include/c.h                      |   2 +
 src/include/postgres.h               |  83 +++++++++++++++++++++--------
 3 files changed, 153 insertions(+), 32 deletions(-)

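For readers following the tag change in src/include/postgres.h: the sketch
below illustrates, outside of the PostgreSQL headers, how a tag can be mapped
back to a payload size, and why the value 18 keeps old on-disk pointers
readable (2 header bytes plus a 16 byte pointer struct is the length the old
va_len_1be field stored). All demo_* names are hypothetical stand-ins, the
struct layouts are simplified, and returning 0 for an unknown tag is an
assumption of the sketch, not the behaviour of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct varatt_external (four 4-byte fields). */
typedef struct demo_varatt_external
{
	int32_t		va_rawsize;
	int32_t		va_extsize;
	uint32_t	va_valueid;
	uint32_t	va_toastrelid;
} demo_varatt_external;

/* Simplified stand-in for struct varatt_indirect. */
typedef struct demo_varatt_indirect
{
	void	   *pointer;
} demo_varatt_indirect;

typedef enum demo_vartag
{
	DEMO_VARTAG_INDIRECT = 1,
	DEMO_VARTAG_ONDISK = 18
} demo_vartag;

/* va_header plus va_tag: two bytes precede the payload */
#define DEMO_VARHDRSZ_EXTERNAL 2

/* Map a tag to the size of the payload that follows the two header bytes. */
static size_t
demo_vartag_size(demo_vartag tag)
{
	switch (tag)
	{
		case DEMO_VARTAG_INDIRECT:
			return sizeof(demo_varatt_indirect);
		case DEMO_VARTAG_ONDISK:
			return sizeof(demo_varatt_external);
	}
	return 0;					/* unknown tag (assumption of this sketch) */
}

/* Total size of an external datum: header bytes plus tagged payload. */
static size_t
demo_varsize_external(demo_vartag tag)
{
	return DEMO_VARHDRSZ_EXTERNAL + demo_vartag_size(tag);
}
```

With these definitions the total size of an on-disk pointer equals its tag
value, so a datum written when the byte still held a length reads back as an
on-disk pointer.
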
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..99044d0 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -128,7 +128,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +145,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +200,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +213,13 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+											 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +283,7 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/* va_rawsize is the size of the original datum -- including header */
 		struct varatt_external toast_pointer;
@@ -275,6 +291,13 @@ toast_raw_datum_size(Datum value)
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 		result = toast_pointer.va_rawsize;
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect toast_pointer;
+
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		return toast_raw_datum_size(PointerGetDatum(toast_pointer.pointer));
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/* here, va_rawsize is just the payload size */
@@ -308,7 +331,7 @@ toast_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * Attribute is stored externally - return the extsize whether
@@ -320,6 +343,13 @@ toast_datum_size(Datum value)
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 		result = toast_pointer.va_extsize;
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect toast_pointer;
+
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		return toast_datum_size(PointerGetDatum(toast_pointer.pointer));
+	}
 	else if (VARATT_IS_SHORT(attr))
 	{
 		result = VARSIZE_SHORT(attr);
@@ -387,12 +417,56 @@ toast_delete(Relation rel, HeapTuple oldtup)
 		{
 			Datum		value = toast_values[i];
 
-			if (!toast_isnull[i] && VARATT_IS_EXTERNAL(PointerGetDatum(value)))
+			if (toast_isnull[i])
+				continue;
+			else if (VARATT_IS_EXTERNAL_ONDISK(PointerGetDatum(value)))
 				toast_delete_datum(rel, value);
+			else if (VARATT_IS_EXTERNAL_INDIRECT(PointerGetDatum(value)))
+				elog(ERROR, "cannot delete tuples with indirect toast tuples for now");
 		}
 	}
 }
 
+/* ----------
+ * toast_datum_differs -
+ *
+ *  Determine whether two toasted datums differ, i.e. whether the new value
+ *  has to be stored again.
+ * ----------
+ */
+static bool
+toast_datum_differs(struct varlena *old_value, struct varlena *new_value)
+{
+	Assert(VARATT_IS_EXTERNAL(old_value));
+	Assert(VARATT_IS_EXTERNAL(new_value));
+
+	/* fast path for the common case where we have the toast oid available */
+	if (VARATT_IS_EXTERNAL_ONDISK(old_value) &&
+		VARATT_IS_EXTERNAL_ONDISK(new_value))
+		return memcmp((char *) old_value, (char *) new_value,
+					  VARSIZE_EXTERNAL(old_value)) != 0;
+
+	/*
+	 * compare size of tuples, so we don't uselessly detoast/decompress tuples
+	 * if they can't be the same anyway.
+	 */
+	if (toast_raw_datum_size(PointerGetDatum(old_value)) !=
+		toast_raw_datum_size(PointerGetDatum(new_value)))
+		return true;
+
+	old_value = heap_tuple_untoast_attr(old_value);
+	new_value = heap_tuple_untoast_attr(new_value);
+
+	Assert(!VARATT_IS_EXTERNAL(old_value));
+	Assert(!VARATT_IS_EXTERNAL(new_value));
+	Assert(!VARATT_IS_COMPRESSED(old_value));
+	Assert(!VARATT_IS_COMPRESSED(new_value));
+	Assert(VARSIZE_ANY_EXHDR(old_value) == VARSIZE_ANY_EXHDR(new_value));
+
+	/* compare payload, we're fine with unaligned data */
+	return memcmp(VARDATA_ANY(old_value), VARDATA_ANY(new_value),
+				  VARSIZE_ANY_EXHDR(old_value)) != 0;
+}
 
 /* ----------
  * toast_insert_or_update -
@@ -497,8 +571,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 				VARATT_IS_EXTERNAL(old_value))
 			{
 				if (toast_isnull[i] || !VARATT_IS_EXTERNAL(new_value) ||
-					memcmp((char *) old_value, (char *) new_value,
-						   VARSIZE_EXTERNAL(old_value)) != 0)
+					toast_datum_differs(old_value, new_value))
 				{
 					/*
 					 * The old external stored value isn't needed any more
@@ -1258,6 +1331,8 @@ toast_save_datum(Relation rel, Datum value,
 	int32		data_todo;
 	Pointer		dval = DatumGetPointer(value);
 
+	Assert(!VARATT_IS_EXTERNAL(value));
+
 	/*
 	 * Open the toast relation and its index.  We can use the index to check
 	 * uniqueness of the OID we assign to the toasted item, even though it has
@@ -1341,7 +1416,7 @@ toast_save_datum(Relation rel, Datum value,
 		{
 			struct varatt_external old_toast_pointer;
 
-			Assert(VARATT_IS_EXTERNAL(oldexternal));
+			Assert(VARATT_IS_EXTERNAL_ONDISK(oldexternal));
 			/* Must copy to access aligned fields */
 			VARATT_EXTERNAL_GET_POINTER(old_toast_pointer, oldexternal);
 			if (old_toast_pointer.va_toastrelid == rel->rd_toastoid)
@@ -1456,7 +1531,7 @@ toast_save_datum(Relation rel, Datum value,
 	 * Create the TOAST pointer value that we'll return
 	 */
 	result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
-	SET_VARSIZE_EXTERNAL(result, TOAST_POINTER_SIZE);
+	SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK);
 	memcpy(VARDATA_EXTERNAL(result), &toast_pointer, sizeof(toast_pointer));
 
 	return PointerGetDatum(result);
@@ -1483,6 +1558,8 @@ toast_delete_datum(Relation rel, Datum value)
 	if (!VARATT_IS_EXTERNAL(attr))
 		return;
 
+	Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
 	/* Must copy to access aligned fields */
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
@@ -1608,6 +1685,9 @@ toast_fetch_datum(struct varlena * attr)
 	char	   *chunkdata;
 	int32		chunksize;
 
+	if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+		elog(ERROR, "shouldn't be called this way");
+
 	/* Must copy to access aligned fields */
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
@@ -1775,7 +1855,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	int32		chcpystrt;
 	int32		chcpyend;
 
-	Assert(VARATT_IS_EXTERNAL(attr));
+	Assert(VARATT_IS_EXTERNAL_ONDISK(attr));
 
 	/* Must copy to access aligned fields */
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
diff --git a/src/include/c.h b/src/include/c.h
index f2c9e12..7193af6 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -573,6 +573,8 @@ typedef NameData *Name;
 #define AssertMacro(condition)	((void)true)
 #define AssertArg(condition)
 #define AssertState(condition)
+#define TrapMacro(condition, errorType)	(true)
+
 #elif defined(FRONTEND)
 
 #include <assert.h>
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 30e1dee..d982e93 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -54,23 +54,52 @@
  */
 
 /*
- * struct varatt_external is a "TOAST pointer", that is, the information
- * needed to fetch a stored-out-of-line Datum.	The data is compressed
- * if and only if va_extsize < va_rawsize - VARHDRSZ.  This struct must not
- * contain any padding, because we sometimes compare pointers using memcmp.
+ * struct varatt_external is a "TOAST pointer", that is, the information needed
+ * to fetch a Datum stored out-of-line on disk. The data is
+ * compressed if and only if va_extsize < va_rawsize - VARHDRSZ.  This struct
+ * must not contain any padding, because we sometimes compare pointers using
+ * memcmp.
  *
  * Note that this information is stored unaligned within actual tuples, so
  * you need to memcpy from the tuple into a local struct variable before
  * you can look at these fields!  (The reason we use memcmp is to avoid
  * having to do that just to detect equality of two TOAST pointers...)
  */
-struct varatt_external
+typedef struct varatt_external
 {
 	int32		va_rawsize;		/* Original data size (includes header) */
 	int32		va_extsize;		/* External saved size (doesn't) */
 	Oid			va_valueid;		/* Unique ID of value within TOAST table */
 	Oid			va_toastrelid;	/* RelID of TOAST table containing it */
-};
+} varatt_external;
+
+/*
+ * Out-of-line Datum that's stored in memory, in contrast to varatt_external
+ * pointers which point to data in an external toast relation.
+ *
+ * Note that just as with varatt_external, this is stored unaligned within the
+ * tuple.
+ */
+typedef struct varatt_indirect
+{
+	struct varlena *pointer;	/* Pointer to in-memory varlena */
+} varatt_indirect;
+
+
+/*
+ * Type of external toast datum stored. The peculiar value for VARTAG_ONDISK
+ * comes from the requirement for on-disk compatibility with the older
+ * definitions of varattrib_1b_e where va_tag was named va_len_1be...
+ */
+typedef enum vartag_external {
+	VARTAG_INDIRECT = 1,
+	VARTAG_ONDISK = 18
+} vartag_external;
+
+#define VARTAG_SIZE(tag) \
+	((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) :		\
+	 (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
+	 TrapMacro(true, "unknown vartag"))
 
 /*
  * These structs describe the header of a varlena object that may have been
@@ -102,11 +131,12 @@ typedef struct
 	char		va_data[1];		/* Data begins here */
 } varattrib_1b;
 
+/* inline portion of a short varlena pointing to an external resource */
 typedef struct
 {
 	uint8		va_header;		/* Always 0x80 or 0x01 */
-	uint8		va_len_1be;		/* Physical length of datum */
-	char		va_data[1];		/* Data (for now always a TOAST pointer) */
+	uint8		va_tag;			/* Type of datum */
+	char		va_data[1];		/* Data (of the type indicated by va_tag) */
 } varattrib_1b_e;
 
 /*
@@ -130,6 +160,9 @@ typedef struct
  * first byte.	Also, it is not possible for a 1-byte length word to be zero;
  * this lets us disambiguate alignment padding bytes from the start of an
  * unaligned datum.  (We now *require* pad bytes to be filled with zero!)
+ *
+ * In TOAST datums the tag field in varattrib_1b_e is used to discern whether
+ * it's an indirection pointer or, more commonly, an on-disk tuple.
  */
 
 /*
@@ -161,8 +194,8 @@ typedef struct
 	(((varattrib_4b *) (PTR))->va_4byte.va_header & 0x3FFFFFFF)
 #define VARSIZE_1B(PTR) \
 	(((varattrib_1b *) (PTR))->va_header & 0x7F)
-#define VARSIZE_1B_E(PTR) \
-	(((varattrib_1b_e *) (PTR))->va_len_1be)
+#define VARTAG_1B_E(PTR) \
+	(((varattrib_1b_e *) (PTR))->va_tag)
 
 #define SET_VARSIZE_4B(PTR,len) \
 	(((varattrib_4b *) (PTR))->va_4byte.va_header = (len) & 0x3FFFFFFF)
@@ -170,9 +203,9 @@ typedef struct
 	(((varattrib_4b *) (PTR))->va_4byte.va_header = ((len) & 0x3FFFFFFF) | 0x40000000)
 #define SET_VARSIZE_1B(PTR,len) \
 	(((varattrib_1b *) (PTR))->va_header = (len) | 0x80)
-#define SET_VARSIZE_1B_E(PTR,len) \
+#define SET_VARTAG_1B_E(PTR,tag) \
 	(((varattrib_1b_e *) (PTR))->va_header = 0x80, \
-	 ((varattrib_1b_e *) (PTR))->va_len_1be = (len))
+	 ((varattrib_1b_e *) (PTR))->va_tag = (tag))
 #else							/* !WORDS_BIGENDIAN */
 
 #define VARATT_IS_4B(PTR) \
@@ -193,8 +226,8 @@ typedef struct
 	((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
 #define VARSIZE_1B(PTR) \
 	((((varattrib_1b *) (PTR))->va_header >> 1) & 0x7F)
-#define VARSIZE_1B_E(PTR) \
-	(((varattrib_1b_e *) (PTR))->va_len_1be)
+#define VARTAG_1B_E(PTR) \
+	(((varattrib_1b_e *) (PTR))->va_tag)
 
 #define SET_VARSIZE_4B(PTR,len) \
 	(((varattrib_4b *) (PTR))->va_4byte.va_header = (((uint32) (len)) << 2))
@@ -202,12 +235,12 @@ typedef struct
 	(((varattrib_4b *) (PTR))->va_4byte.va_header = (((uint32) (len)) << 2) | 0x02)
 #define SET_VARSIZE_1B(PTR,len) \
 	(((varattrib_1b *) (PTR))->va_header = (((uint8) (len)) << 1) | 0x01)
-#define SET_VARSIZE_1B_E(PTR,len) \
+#define SET_VARTAG_1B_E(PTR,tag) \
 	(((varattrib_1b_e *) (PTR))->va_header = 0x01, \
-	 ((varattrib_1b_e *) (PTR))->va_len_1be = (len))
+	 ((varattrib_1b_e *) (PTR))->va_tag = (tag))
 #endif   /* WORDS_BIGENDIAN */
 
-#define VARHDRSZ_SHORT			1
+#define VARHDRSZ_SHORT			offsetof(varattrib_1b, va_data)
 #define VARATT_SHORT_MAX		0x7F
 #define VARATT_CAN_MAKE_SHORT(PTR) \
 	(VARATT_IS_4B_U(PTR) && \
@@ -215,7 +248,7 @@ typedef struct
 #define VARATT_CONVERTED_SHORT_SIZE(PTR) \
 	(VARSIZE(PTR) - VARHDRSZ + VARHDRSZ_SHORT)
 
-#define VARHDRSZ_EXTERNAL		2
+#define VARHDRSZ_EXTERNAL		offsetof(varattrib_1b_e, va_data)
 
 #define VARDATA_4B(PTR)		(((varattrib_4b *) (PTR))->va_4byte.va_data)
 #define VARDATA_4B_C(PTR)	(((varattrib_4b *) (PTR))->va_compressed.va_data)
@@ -249,26 +282,32 @@ typedef struct
 #define VARSIZE_SHORT(PTR)					VARSIZE_1B(PTR)
 #define VARDATA_SHORT(PTR)					VARDATA_1B(PTR)
 
-#define VARSIZE_EXTERNAL(PTR)				VARSIZE_1B_E(PTR)
+#define VARTAG_EXTERNAL(PTR)				VARTAG_1B_E(PTR)
+#define VARSIZE_EXTERNAL(PTR)				(VARHDRSZ_EXTERNAL + VARTAG_SIZE(VARTAG_EXTERNAL(PTR)))
 #define VARDATA_EXTERNAL(PTR)				VARDATA_1B_E(PTR)
 
 #define VARATT_IS_COMPRESSED(PTR)			VARATT_IS_4B_C(PTR)
 #define VARATT_IS_EXTERNAL(PTR)				VARATT_IS_1B_E(PTR)
+#define VARATT_IS_EXTERNAL_ONDISK(PTR) \
+	(VARATT_IS_EXTERNAL(PTR) && VARTAG_EXTERNAL(PTR) == VARTAG_ONDISK)
+#define VARATT_IS_EXTERNAL_INDIRECT(PTR) \
+	(VARATT_IS_EXTERNAL(PTR) && VARTAG_EXTERNAL(PTR) == VARTAG_INDIRECT)
 #define VARATT_IS_SHORT(PTR)				VARATT_IS_1B(PTR)
 #define VARATT_IS_EXTENDED(PTR)				(!VARATT_IS_4B_U(PTR))
 
 #define SET_VARSIZE(PTR, len)				SET_VARSIZE_4B(PTR, len)
 #define SET_VARSIZE_SHORT(PTR, len)			SET_VARSIZE_1B(PTR, len)
 #define SET_VARSIZE_COMPRESSED(PTR, len)	SET_VARSIZE_4B_C(PTR, len)
-#define SET_VARSIZE_EXTERNAL(PTR, len)		SET_VARSIZE_1B_E(PTR, len)
+
+#define SET_VARTAG_EXTERNAL(PTR, tag)		SET_VARTAG_1B_E(PTR, tag)
 
 #define VARSIZE_ANY(PTR) \
-	(VARATT_IS_1B_E(PTR) ? VARSIZE_1B_E(PTR) : \
+	(VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR) : \
 	 (VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR) : \
 	  VARSIZE_4B(PTR)))
 
 #define VARSIZE_ANY_EXHDR(PTR) \
-	(VARATT_IS_1B_E(PTR) ? VARSIZE_1B_E(PTR)-VARHDRSZ_EXTERNAL : \
+	(VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR)-VARHDRSZ_EXTERNAL : \
 	 (VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR)-VARHDRSZ_SHORT : \
 	  VARSIZE_4B(PTR)-VARHDRSZ))
 
-- 
1.8.2.rc2.4.g7799588.dirty
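
To illustrate why VARTAG_ONDISK has the "peculiar" value 18 in the patch above, here is a minimal standalone sketch of the tag-to-size dispatch. All demo_* names are hypothetical stand-ins, not the real postgres.h definitions; the sketch only assumes that the on-disk payload consists of four 4-byte fields and that the external header is two bytes, as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for varatt_external: four 4-byte fields. */
typedef struct demo_ondisk
{
	int32_t		rawsize;
	int32_t		extsize;
	uint32_t	valueid;
	uint32_t	toastrelid;
} demo_ondisk;

/* Hypothetical stand-in for varatt_indirect: one in-memory pointer. */
typedef struct demo_indirect
{
	void	   *pointer;
} demo_indirect;

typedef enum demo_tag
{
	DEMO_TAG_INDIRECT = 1,
	DEMO_TAG_ONDISK = 18
} demo_tag;

/* Hypothetical stand-in for varattrib_1b_e: header byte, tag byte, payload. */
typedef struct demo_header
{
	uint8_t		va_header;
	uint8_t		va_tag;
	char		va_data[1];
} demo_header;

#define DEMO_HDRSZ_EXTERNAL offsetof(demo_header, va_data)

/* Payload size implied by the tag; mirrors the idea behind VARTAG_SIZE. */
static size_t
demo_tag_size(demo_tag tag)
{
	switch (tag)
	{
		case DEMO_TAG_INDIRECT:
			return sizeof(demo_indirect);
		case DEMO_TAG_ONDISK:
			return sizeof(demo_ondisk);
	}
	assert(!"unknown tag");
	return 0;
}

/* Total size = fixed header + tag-dependent payload. */
static size_t
demo_external_size(const demo_header *hdr)
{
	return DEMO_HDRSZ_EXTERNAL + demo_tag_size((demo_tag) hdr->va_tag);
}
```

With this layout an on-disk pointer is 2 + 16 = 18 bytes in total, which is the value the old va_len_1be byte held; reusing 18 as the tag keeps existing on-disk datums readable.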

0002-wal_decoding-Add-pg_xlog_wait_remote_-apply-receive-.patch (text/x-patch; charset=us-ascii)
From d86b884c00fbb0eb52523b322c6d4cb83e0e351f Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 02/17] wal_decoding: Add pg_xlog_wait_remote_{apply,receive}
 functions

We want to use these in isolationtester tests, but they're more
generally useful for "inter-node synchronisation".
---
 src/backend/replication/walsender.c | 73 +++++++++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h       |  5 +++
 src/include/replication/walsender.h |  2 +
 3 files changed, 80 insertions(+)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 717cbfd..9f5f766 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2083,3 +2083,76 @@ GetOldestWALSendPointer(void)
 }
 
 #endif
+
+static XLogRecPtr
+text_to_xlogrecptr(text *str)
+{
+	uint32 hi, lo;
+	char *pos = text_to_cstring(str);
+
+	if (sscanf(pos, "%X/%X", &hi, &lo) != 2)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("could not parse transaction log location \"%s\"",
+                        pos)));
+
+	return ((uint64) hi) << 32 | lo;
+}
+
+static void
+wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
+{
+	int i;
+	bool done;
+
+	do {
+		done = true;
+
+		for (i = 0; i < max_wal_senders; i++)
+		{
+			volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+			SpinLockAcquire(&walsnd->mutex);
+
+			if (walsnd->pid != 0 && (pid == 0 || pid == walsnd->pid))
+			{
+				XLogRecPtr rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
+				if (rptr < ptr)
+					done = false;
+			}
+
+			SpinLockRelease(&walsnd->mutex);
+
+			if (!done)
+				break;
+		}
+
+		if (!done)
+			pg_usleep(10*1000);
+	}
+	while (!done);
+}
+
+Datum
+pg_xlog_wait_remote_apply(PG_FUNCTION_ARGS)
+{
+	text *pos = PG_GETARG_TEXT_P(0);
+	int32 pid = PG_GETARG_INT32(1);
+
+	XLogRecPtr startpos = text_to_xlogrecptr(pos);
+	wait_for_remote_lsn(pid, startpos, true);
+
+	PG_RETURN_VOID();
+}
+
+Datum
+pg_xlog_wait_remote_receive(PG_FUNCTION_ARGS)
+{
+	text *pos = PG_GETARG_TEXT_P(0);
+	int32 pid = PG_GETARG_INT32(1);
+
+	XLogRecPtr startpos = text_to_xlogrecptr(pos);
+	wait_for_remote_lsn(pid, startpos, false);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b5be075..6d3d702 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4722,6 +4722,11 @@ DATA(insert OID = 3473 (  spg_range_quad_leaf_consistent	PGNSP PGUID 12 1 0 0 0
 DESCR("SP-GiST support for quad tree over range");
 
 
+DATA(insert OID = 3781 (  pg_xlog_wait_remote_apply PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_apply _null_ _null_ _null_ ));
+DESCR("wait for an lsn to be applied by a remote node");
+DATA(insert OID = 3782 (  pg_xlog_wait_remote_receive PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_receive _null_ _null_ _null_ ));
+DESCR("wait for an lsn to be received by a remote node");
+
 /* event triggers */
 DATA(insert OID = 3566 (  pg_event_trigger_dropped_objects		PGNSP PGUID 12 10 100 0 0 f f f f t t s 0 0 2249 "" "{26,26,23,25,25,25,25}" "{o,o,o,o,o,o,o}" "{classid, objid, objsubid, object_type, schema_name, object_name, object_identity}" _null_ pg_event_trigger_dropped_objects _null_ _null_ _null_ ));
 DESCR("list objects dropped by the current command");
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddf..84a418a 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -35,6 +35,8 @@ extern void WalSndWakeup(void);
 extern void WalSndRqstFileReload(void);
 
 extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
+extern Datum pg_xlog_wait_remote_apply(PG_FUNCTION_ARGS);
+extern Datum pg_xlog_wait_remote_receive(PG_FUNCTION_ARGS);
 
 /*
  * Remember that we want to wakeup walsenders later
-- 
1.8.2.rc2.4.g7799588.dirty
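
As a rough standalone illustration of what text_to_xlogrecptr does in the patch above, the following sketch parses the textual "hi/lo" form into a 64-bit position. demo_parse_lsn is a hypothetical name; it reports failure through its return value instead of ereport(), since it runs outside the backend.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN written as two hexadecimal halves separated by a slash,
 * e.g. "0/17D5908". Returns false if the string does not match.
 */
static bool
demo_parse_lsn(const char *str, uint64_t *out)
{
	unsigned int hi;
	unsigned int lo;

	if (sscanf(str, "%X/%X", &hi, &lo) != 2)
		return false;

	/* High half goes into the upper 32 bits, low half into the lower. */
	*out = ((uint64_t) hi << 32) | lo;
	return true;
}
```

Like the original, this accepts trailing characters after the second number; a stricter parser would have to check for them explicitly.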

0003-wal_decoding-Add-a-new-RELFILENODE-syscache-to-fetch.patch (text/x-patch; charset=us-ascii)
From 6ee904e27e4e01c4e46f671fc807ece5da40ff28 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 03/17] wal_decoding: Add a new RELFILENODE syscache to fetch a
 pg_class entry via (reltablespace, relfilenode)

This cache is theoretically problematic because, formally, indexes used by
syscaches need to be unique, and this one is not. This is "just" because
0/InvalidOid is stored in pg_class.relfilenode for nailed/shared catalog
relations. This syscache will never be queried for InvalidOid relfilenodes,
however, so it seems to be safe even if it bends the rules somewhat.

It might be nicer to add infrastructure to do this properly, like using a
partial index. It's not clear what the best way to do this is, though, and the
benefit may well not be worth the overhead.

Needs a CATVERSION bump.
---
 src/backend/utils/cache/syscache.c | 11 +++++++++++
 src/include/catalog/indexing.h     |  2 ++
 src/include/utils/syscache.h       |  1 +
 3 files changed, 14 insertions(+)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index ecb0f96..e83b5f1 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -591,6 +591,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		64
 	},
+	{RelationRelationId,		/* RELFILENODE */
+		ClassTblspcRelfilenodeIndexId,
+		2,
+		{
+			Anum_pg_class_reltablespace,
+			Anum_pg_class_relfilenode,
+			0,
+			0
+		},
+		1024
+	},
 	{RelationRelationId,		/* RELNAMENSP */
 		ClassNameNspIndexId,
 		2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 19268fb..4860e98 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_o
 #define ClassOidIndexId  2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId  2663
+DECLARE_INDEX(pg_class_tblspc_relfilenode_index, 3455, on pg_class using btree(reltablespace oid_ops, relfilenode oid_ops));
+#define ClassTblspcRelfilenodeIndexId  3455
 
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId 3164
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d1d8abe..2a14905 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -75,6 +75,7 @@ enum SysCacheIdentifier
 	PROCNAMEARGSNSP,
 	PROCOID,
 	RANGETYPE,
+	RELFILENODE,
 	RELNAMENSP,
 	RELOID,
 	RULERELNAME,
-- 
1.8.2.rc2.4.g7799588.dirty

0004-wal_decoding-Add-RelationMapFilenodeToOid-function-t.patch (text/x-patch; charset=us-ascii)
From b0ea75b0e4e594b645ba7e779b6f630c3628b5f7 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 04/17] wal_decoding: Add RelationMapFilenodeToOid function to
 relmapper.c

This function maps (reltablespace, relfilenode) to the table oid and thus acts
as a reverse of RelationMapOidToFilenode.
---
 src/backend/utils/cache/relmapper.c | 53 +++++++++++++++++++++++++++++++++++++
 src/include/utils/relmapper.h       |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 2c7d9f3..039aa29 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -180,6 +180,59 @@ RelationMapOidToFilenode(Oid relationId, bool shared)
 	return InvalidOid;
 }
 
+/* RelationMapFilenodeToOid
+ *
+ * Do the reverse of the normal direction of mapping done in
+ * RelationMapOidToFilenode.
+ *
+ * This is not supposed to be used during normal running but rather for
+ * information purposes when looking at the filesystem or the xlog.
+ *
+ * Returns InvalidOid if the OID is not known, which can easily happen if the
+ * filenode is not of a relation that is nailed or shared or if it simply
+ * doesn't exist anywhere.
+ */
+Oid
+RelationMapFilenodeToOid(Oid filenode, bool shared)
+{
+	const RelMapFile *map;
+	int32		i;
+
+	/* If there are active updates, believe those over the main maps */
+	if (shared)
+	{
+		map = &active_shared_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &shared_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+	else
+	{
+		map = &active_local_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &local_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * RelationMapUpdateMap
  *
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 8f0b438..071bc98 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -36,6 +36,8 @@ typedef struct xl_relmap_update
 
 extern Oid	RelationMapOidToFilenode(Oid relationId, bool shared);
 
+extern Oid	RelationMapFilenodeToOid(Oid filenode, bool shared);
+
 extern void RelationMapUpdateMap(Oid relationId, Oid fileNode, bool shared,
 					 bool immediate);
 
-- 
1.8.2.rc2.4.g7799588.dirty
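
The reverse lookup in the patch above consults the pending updates before the main map. A minimal standalone sketch of that ordering follows; all demo_* names and the fixed-size array are hypothetical simplifications of RelMapFile, not the real relmapper.c structures.

```c
#include <stdint.h>

typedef uint32_t demo_oid;

#define DEMO_INVALID_OID ((demo_oid) 0)
#define DEMO_MAX_MAPPINGS 8

typedef struct demo_mapping
{
	demo_oid	mapoid;			/* OID of the catalog relation */
	demo_oid	mapfilenode;	/* its current filenode */
} demo_mapping;

typedef struct demo_map
{
	int			num_mappings;
	demo_mapping mappings[DEMO_MAX_MAPPINGS];
} demo_map;

/* Linear scan of one map for a filenode; returns the OID or InvalidOid. */
static demo_oid
demo_map_lookup(const demo_map *map, demo_oid filenode)
{
	int			i;

	for (i = 0; i < map->num_mappings; i++)
	{
		if (map->mappings[i].mapfilenode == filenode)
			return map->mappings[i].mapoid;
	}
	return DEMO_INVALID_OID;
}

/* Believe active updates over the main map, as the patch does. */
static demo_oid
demo_filenode_to_oid(const demo_map *updates, const demo_map *main_map,
					 demo_oid filenode)
{
	demo_oid	oid = demo_map_lookup(updates, filenode);

	if (oid != DEMO_INVALID_OID)
		return oid;
	return demo_map_lookup(main_map, filenode);
}
```

Factoring the scan into a helper is a choice made for this sketch only; the patch itself spells out the loop four times, once per map.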

0005-wal_decoding-Add-pg_relation_by_filenode-to-lookup-u.patch (text/x-patch; charset=us-ascii)
From f77a55bdf01c6997428bbf7e1bedac771998a95c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 05/17] wal_decoding: Add pg_relation_by_filenode to look up
 a relation by (tablespace, filenode)

This requires the RELFILENODE syscache and the RelationMapFilenodeToOid
function added in the previous two commits.
---
 doc/src/sgml/func.sgml         | 23 ++++++++++++++-
 src/backend/utils/adt/dbsize.c | 63 ++++++++++++++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h  |  2 ++
 src/include/utils/builtins.h   |  2 ++
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4c5af4b..a8f83e2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15726,7 +15726,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 
    <para>
     The functions shown in <xref linkend="functions-admin-dblocation"> assist
-    in identifying the specific disk files associated with database objects.
+    in identifying the specific disk files associated with database objects or doing the reverse.
    </para>
 
    <indexterm>
@@ -15735,6 +15735,9 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
    <indexterm>
     <primary>pg_relation_filepath</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_relation_by_filenode</primary>
+   </indexterm>
 
    <table id="functions-admin-dblocation">
     <title>Database Object Location Functions</title>
@@ -15763,6 +15766,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
         File path name of the specified relation
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_relation_by_filenode(<parameter>tablespace</parameter> <type>oid</type>, <parameter>filenode</parameter> <type>oid</type>)</function></literal>
+        </entry>
+       <entry><type>regclass</type></entry>
+       <entry>
+        Find the relation associated with a tablespace and filenode
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -15786,6 +15798,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     the relation.
    </para>
 
+   <para>
+    <function>pg_relation_by_filenode</> is the reverse of
+    <function>pg_relation_filenode</>. Given a <quote>tablespace</> OID and
+    a <quote>filenode</> it returns the associated relation. The default
+    tablespace for user tables can be replaced with 0. Check the
+    documentation of <function>pg_relation_filenode</> for an explanation of why
+    this cannot always be easily answered by querying <structname>pg_class</>.
+   </para>
+
   </sect2>
 
   <sect2 id="functions-admin-genfile">
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..ce5f49e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -746,6 +746,69 @@ pg_relation_filenode(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Get the relation via (reltablespace, relfilenode)
+ *
+ * This is expected to be used when somebody wants to match an individual file
+ * on the filesystem back to its table. That's not trivially possible via
+ * pg_class because that doesn't contain the relfilenodes of shared and nailed
+ * tables.
+ *
+ * We don't fail but return NULL if we cannot find a mapping.
+ *
+ * Instead of knowing DEFAULTTABLESPACE_OID you can pass 0.
+ */
+Datum
+pg_relation_by_filenode(PG_FUNCTION_ARGS)
+{
+	Oid			reltablespace = PG_GETARG_OID(0);
+	Oid			relfilenode = PG_GETARG_OID(1);
+	Oid			lookup_tablespace;
+	Oid         heaprel = InvalidOid;
+	HeapTuple	tuple;
+
+	if (reltablespace == 0)
+		reltablespace = DEFAULTTABLESPACE_OID;
+
+	/* in global tablespace, has to be a shared table */
+	if (reltablespace == GLOBALTABLESPACE_OID)
+	{
+		heaprel = RelationMapFilenodeToOid(relfilenode, true);
+	}
+	else
+	{
+		/*
+		 * relations in the default tablespace are stored with InvalidOid as
+		 * pg_class."reltablespace".
+		 */
+		if (reltablespace == DEFAULTTABLESPACE_OID)
+			lookup_tablespace = InvalidOid;
+		else
+			lookup_tablespace = reltablespace;
+
+
+		tuple = SearchSysCache2(RELFILENODE,
+								lookup_tablespace,
+								relfilenode);
+		/* ok, found it */
+		if (HeapTupleIsValid(tuple))
+		{
+			heaprel = HeapTupleHeaderGetOid(tuple->t_data);
+			ReleaseSysCache(tuple);
+		}
+		/* has to be nonexistent or a nailed table, but not shared */
+		else
+		{
+			heaprel = RelationMapFilenodeToOid(relfilenode, false);
+		}
+	}
+
+	if (!OidIsValid(heaprel))
+		PG_RETURN_NULL();
+	else
+		PG_RETURN_OID(heaprel);
+}
+
+/*
  * Get the pathname (relative to $PGDATA) of a relation
  *
  * See comments for pg_relation_filenode.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6d3d702..8d268dd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3446,6 +3446,8 @@ DATA(insert OID = 2998 ( pg_indexes_size		PGNSP PGUID 12 1 0 0 0 f f f f t f v 1
 DESCR("disk space usage for all indexes attached to the specified table");
 DATA(insert OID = 2999 ( pg_relation_filenode	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 26 "2205" _null_ _null_ _null_ _null_ pg_relation_filenode _null_ _null_ _null_ ));
 DESCR("filenode identifier of relation");
+DATA(insert OID = 3454 ( pg_relation_by_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 2205 "26 26" _null_ _null_ _null_ _null_ pg_relation_by_filenode _null_ _null_ _null_ ));
+DESCR("relation identified by tablespace and filenode");
 DATA(insert OID = 3034 ( pg_relation_filepath	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "2205" _null_ _null_ _null_ _null_ pg_relation_filepath _null_ _null_ _null_ ));
 DESCR("file path of relation");
 
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 667c58b..ddbedea 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -459,8 +459,10 @@ extern Datum pg_size_pretty(PG_FUNCTION_ARGS);
 extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS);
 extern Datum pg_table_size(PG_FUNCTION_ARGS);
 extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
+extern Datum pg_relation_by_filenode(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filepath(PG_FUNCTION_ARGS);
+extern Datum pg_relation_is_scannable(PG_FUNCTION_ARGS);
 
 /* genfile.c */
 extern bytea *read_binary_file(const char *filename,
-- 
1.8.2.rc2.4.g7799588.dirty
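
The tablespace handling in pg_relation_by_filenode above is easy to misread, so here is a small standalone sketch of just the routing decision. The demo_* names are hypothetical; the two constants mirror the values of DEFAULTTABLESPACE_OID (1663) and GLOBALTABLESPACE_OID (1664), and the actual syscache and relmapper lookups are left out.

```c
#include <stdint.h>

typedef uint32_t demo_oid;

#define DEMO_INVALID_OID			((demo_oid) 0)
#define DEMO_DEFAULTTABLESPACE_OID	((demo_oid) 1663)
#define DEMO_GLOBALTABLESPACE_OID	((demo_oid) 1664)

typedef enum demo_route
{
	DEMO_ROUTE_SHARED_MAP,		/* shared catalogs: relmapper only */
	DEMO_ROUTE_SYSCACHE			/* syscache first, local relmapper as fallback */
} demo_route;

typedef struct demo_plan
{
	demo_route	route;
	demo_oid	lookup_tablespace;	/* key to use for the syscache probe */
} demo_plan;

static demo_plan
demo_plan_lookup(demo_oid reltablespace)
{
	demo_plan	plan;

	/* Callers may pass 0 instead of knowing the default tablespace OID. */
	if (reltablespace == DEMO_INVALID_OID)
		reltablespace = DEMO_DEFAULTTABLESPACE_OID;

	if (reltablespace == DEMO_GLOBALTABLESPACE_OID)
	{
		plan.route = DEMO_ROUTE_SHARED_MAP;
		plan.lookup_tablespace = reltablespace;
		return plan;
	}

	/* pg_class stores 0 for relations in the default tablespace. */
	plan.route = DEMO_ROUTE_SYSCACHE;
	plan.lookup_tablespace = (reltablespace == DEMO_DEFAULTTABLESPACE_OID)
		? DEMO_INVALID_OID
		: reltablespace;
	return plan;
}
```

Passing 0 and passing 1663 therefore end up probing with the same key, while any other tablespace OID is used unchanged.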

0006-wal_decoding-Introduce-InvalidCommandId-and-declare-.patch (text/x-patch; charset=us-ascii)
From cb12f56b401bba484ad82f14079450cd83dfe673 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 06/17] wal_decoding: Introduce InvalidCommandId and declare
 that to be the new maximum for CommandCounterIncrement

This is useful to be able to represent a CommandId that's invalid. There was no
such value before.

This decreases the possible number of commands per transaction by one, which
seems unproblematic. It's also not a problem for pg_upgrade because cmin/cmax
are never looked at outside the context of their own transaction (save for
timetravel access, but that's new anyway).
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h                   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("cannot have more than 2^32-1 commands in a transaction")));
+					 errmsg("cannot have more than 2^32-2 commands in a transaction")));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 7193af6..e4940a9 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.2.rc2.4.g7799588.dirty
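
A minimal standalone sketch of the sentinel check introduced above, assuming a 32-bit unsigned counter as in c.h. The demo_* names are hypothetical, and the error is reported through the return value rather than ereport().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_command_id;

#define DEMO_FIRST_COMMAND_ID	((demo_command_id) 0)
#define DEMO_INVALID_COMMAND_ID	(~(demo_command_id) 0)

/*
 * Advance the counter. If the new value would be the reserved "invalid"
 * sentinel, undo the increment and report failure instead.
 */
static bool
demo_command_counter_increment(demo_command_id *current)
{
	*current += 1;
	if (*current == DEMO_INVALID_COMMAND_ID)
	{
		*current -= 1;
		return false;
	}
	return true;
}
```

The previous code detected overflow only after the counter had wrapped around to FirstCommandId; with the sentinel, the check fires one step earlier and the all-ones value is never handed out.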

0007-wal_decoding-Adjust-all-Satisfies-routines-to-take-a.patch (text/x-patch; charset=us-ascii)
From 01b26c322b3f02beea0bfb42ab783c70e4a9c970 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 07/17] wal_decoding: Adjust all *Satisfies routines to take a
 HeapTuple instead of a HeapTupleHeader

For the regular satisfies routines this is needed in preparation for logical
decoding. I changed the non-regular ones for consistency as well.

The naming between htup, tuple and similar is rather confused; I could not find
any consistent naming anywhere.

This is preparatory work for the logical decoding feature, which needs to be
able to get to a valid relfilenode when checking the visibility of a
tuple.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/heap/heapam.c          | 13 ++++----
 src/backend/access/heap/pruneheap.c       | 17 +++++++++--
 src/backend/catalog/index.c               |  2 +-
 src/backend/commands/analyze.c            |  3 +-
 src/backend/commands/cluster.c            |  2 +-
 src/backend/commands/vacuumlazy.c         | 11 ++++---
 src/backend/executor/nodeBitmapHeapscan.c |  1 +
 src/backend/storage/lmgr/predicate.c      |  2 +-
 src/backend/utils/time/tqual.c            | 50 +++++++++++++++++++++++++------
 src/include/utils/snapshot.h              |  4 +--
 src/include/utils/tqual.h                 | 20 ++++++-------
 12 files changed, 90 insertions(+), 37 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 075d781..8d8e78e 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -131,7 +131,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		htsu = HeapTupleSatisfiesUpdate(tuple->t_data,
+		htsu = HeapTupleSatisfiesUpdate(tuple,
 										GetCurrentCommandId(false),
 										scan->rs_cbuf);
 		xmax = HeapTupleHeaderGetRawXmax(tuple->t_data);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e88dd30..fdf0ccd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -384,6 +384,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
+			loctup.t_tableOid = RelationGetRelid(scan->rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -1698,7 +1699,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = relation->rd_id;
+		heapTuple->t_tableOid = RelationGetRelid(relation);
 		heapTuple->t_self = *tid;
 
 		/*
@@ -1746,7 +1747,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 * transactions.
 		 */
 		if (all_dead && *all_dead &&
-			!HeapTupleIsSurelyDead(heapTuple->t_data, RecentGlobalXmin))
+			!HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
 			*all_dead = false;
 
 		/*
@@ -1876,6 +1877,7 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
+		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
@@ -2574,12 +2576,13 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
+	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
 
 l1:
-	result = HeapTupleSatisfiesUpdate(tp.t_data, cid, buffer);
+	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
 
 	if (result == HeapTupleInvisible)
 	{
@@ -3053,7 +3056,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 l2:
 	checked_lockers = false;
 	locker_remains = false;
-	result = HeapTupleSatisfiesUpdate(oldtup.t_data, cid, buffer);
+	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
 
 	/* see below about the "no wait" case */
 	Assert(result != HeapTupleBeingUpdated || wait);
@@ -3924,7 +3927,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	tuple->t_tableOid = RelationGetRelid(relation);
 
 l3:
-	result = HeapTupleSatisfiesUpdate(tuple->t_data, cid, *buffer);
+	result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
 
 	if (result == HeapTupleInvisible)
 	{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2ab723d..3b68705 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -339,6 +339,9 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	OffsetNumber chainitems[MaxHeapTuplesPerPage];
 	int			nchain = 0,
 				i;
+	HeapTupleData tup;
+
+	tup.t_tableOid = RelationGetRelid(relation);
 
 	rootlp = PageGetItemId(dp, rootoffnum);
 
@@ -348,6 +351,12 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	if (ItemIdIsNormal(rootlp))
 	{
 		htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
+
+		tup.t_data = htup;
+		tup.t_len = ItemIdGetLength(rootlp);
+		tup.t_tableOid = RelationGetRelid(relation);
+		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), rootoffnum);
+
 		if (HeapTupleHeaderIsHeapOnly(htup))
 		{
 			/*
@@ -368,7 +377,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			 * either here or while following a chain below.  Whichever path
 			 * gets there first will mark the tuple unused.
 			 */
-			if (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer)
+			if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
 				== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
 			{
 				heap_prune_record_unused(prstate, rootoffnum);
@@ -431,6 +440,10 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
 
+		tup.t_data = htup;
+		tup.t_len = ItemIdGetLength(lp);
+		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
+
 		/*
 		 * Check the tuple XMIN against prior XMAX, if any
 		 */
@@ -448,7 +461,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 */
 		tupdead = recent_dead = false;
 
-		switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer))
+		switch (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer))
 		{
 			case HEAPTUPLE_DEAD:
 				tupdead = true;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..ba5c84b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2271,7 +2271,7 @@ IndexBuildHeapScan(Relation heapRelation,
 			 */
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-			switch (HeapTupleSatisfiesVacuum(heapTuple->t_data, OldestXmin,
+			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
 											 scan->rs_cbuf))
 			{
 				case HEAPTUPLE_DEAD:
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d6d20fd..9845b0b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1138,10 +1138,11 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
 
+			targtuple.t_tableOid = RelationGetRelid(onerel);
 			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 			targtuple.t_len = ItemIdGetLength(itemid);
 
-			switch (HeapTupleSatisfiesVacuum(targtuple.t_data,
+			switch (HeapTupleSatisfiesVacuum(&targtuple,
 											 OldestXmin,
 											 targbuffer))
 			{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..5064081 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -958,7 +958,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-		switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_DEAD:
 				/* Definitely dead */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 078b822..2ea0590 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -151,7 +151,7 @@ static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
-static bool heap_page_is_all_visible(Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid);
 
 
@@ -756,10 +756,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
+			tuple.t_tableOid = RelationGetRelid(onerel);
 
 			tupgone = false;
 
-			switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 			{
 				case HEAPTUPLE_DEAD:
 
@@ -1168,7 +1169,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 	 * check if the page has become all-visible.
 	 */
 	if (!visibilitymap_test(onerel, blkno, vmbuffer) &&
-		heap_page_is_all_visible(buffer, &visibility_cutoff_xid))
+		heap_page_is_all_visible(onerel, buffer, &visibility_cutoff_xid))
 	{
 		Assert(BufferIsValid(*vmbuffer));
 		PageSetAllVisible(page);
@@ -1676,7 +1677,7 @@ vac_cmp_itemptr(const void *left, const void *right)
  * xmin amongst the visible tuples.
  */
 static bool
-heap_page_is_all_visible(Buffer buf, TransactionId *visibility_cutoff_xid)
+heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
 {
 	Page		page = BufferGetPage(buf);
 	OffsetNumber offnum,
@@ -1718,6 +1719,8 @@ heap_page_is_all_visible(Buffer buf, TransactionId *visibility_cutoff_xid)
 		Assert(ItemIdIsNormal(itemid));
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+		tuple.t_len = ItemIdGetLength(itemid);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
 		switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
 		{
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d2b2721..9534439 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -258,6 +258,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 		scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 		scan->rs_ctup.t_len = ItemIdGetLength(lp);
+		scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
 		ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
 
 		pgstat_count_heap_fetch(scan->rs_rd);
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index b012df1..d656d62 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3895,7 +3895,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
 	 * tuple is visible to us, while HeapTupleSatisfiesVacuum checks what else
 	 * is going on with it.
 	 */
-	htsvResult = HeapTupleSatisfiesVacuum(tuple->t_data, TransactionXmin, buffer);
+	htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
 	switch (htsvResult)
 	{
 		case HEAPTUPLE_LIVE:
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ab4020a..3254a2d 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -163,8 +163,12 @@ HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
  *			 Xmax is not committed)))			that has not been committed
  */
 bool
-HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -351,8 +355,12 @@ HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  *
  */
 bool
-HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -526,7 +534,7 @@ HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  *		Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
  */
 bool
-HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesAny(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
 	return true;
 }
@@ -546,9 +554,13 @@ HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  * table.
  */
 bool
-HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 						Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -627,9 +639,13 @@ HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
  *	distinguish that case must test for it themselves.)
  */
 HTSU_Result
-HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
+HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 						 Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -849,9 +865,13 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  * for snapshot->xmax and the tuple's xmax.
  */
 bool
-HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 						Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
@@ -1040,9 +1060,13 @@ HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
  * can't see it.)
  */
 bool
-HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 					   Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -1233,9 +1257,13 @@ HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  * even if we see that the deleting transaction has committed.
  */
 HTSV_Result
-HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
+HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 						 Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	/*
 	 * Has inserting transaction committed?
 	 *
@@ -1464,8 +1492,12 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
  *	just whether or not the tuple is surely dead).
  */
 bool
-HeapTupleIsSurelyDead(HeapTupleHeader tuple, TransactionId OldestXmin)
+HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
 	 * the tuple is definitely dead.  If it's marked neither committed nor
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index e747191..ed3f586 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -27,8 +27,8 @@ typedef struct SnapshotData *Snapshot;
  * The specific semantics of a snapshot are encoded by the "satisfies"
  * function.
  */
-typedef bool (*SnapshotSatisfiesFunc) (HeapTupleHeader tuple,
-										   Snapshot snapshot, Buffer buffer);
+typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
+									   Snapshot snapshot, Buffer buffer);
 
 typedef struct SnapshotData
 {
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 465231c..800e366 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -52,7 +52,7 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
  *	if so, the indicated buffer is marked dirty.
  */
 #define HeapTupleSatisfiesVisibility(tuple, snapshot, buffer) \
-	((*(snapshot)->satisfies) ((tuple)->t_data, snapshot, buffer))
+	((*(snapshot)->satisfies) (tuple, snapshot, buffer))
 
 /* Result codes for HeapTupleSatisfiesVacuum */
 typedef enum
@@ -65,25 +65,25 @@ typedef enum
 } HTSV_Result;
 
 /* These are the "satisfies" test routines for the various snapshot types */
-extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesMVCC(HeapTuple htup,
 					   Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesNow(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesNow(HeapTuple htup,
 					  Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesSelf(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesSelf(HeapTuple htup,
 					   Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesAny(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesAny(HeapTuple htup,
 					  Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesToast(HeapTuple htup,
 						Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesDirty(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesDirty(HeapTuple htup,
 						Snapshot snapshot, Buffer buffer);
 
 /* Special "satisfies" routines with different APIs */
-extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,
+extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple htup,
 						 CommandId curcid, Buffer buffer);
-extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
+extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup,
 						 TransactionId OldestXmin, Buffer buffer);
-extern bool HeapTupleIsSurelyDead(HeapTupleHeader tuple,
+extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 					  TransactionId OldestXmin);
 
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
1.8.2.rc2.4.g7799588.dirty
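
The calling convention that patch 0007 introduces can be illustrated with a small standalone sketch. The types below are simplified stand-ins, not the real PostgreSQL definitions, and `tuple_xmin_committed()` and `check_tuple()` are hypothetical names used only for illustration. The point is that the check now receives the whole tuple struct, so the caller has to fill in `t_self` and `t_tableOid` as well as `t_data` and `t_len` before calling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for illustration; NOT the real PostgreSQL types. */
typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)
#define HEAP_XMIN_COMMITTED 0x0100

typedef struct { uint16_t t_infomask; } HeapTupleHeaderData;
typedef struct { uint32_t block; uint16_t offset; } ItemPointerData;

typedef struct
{
	uint32_t	t_len;
	ItemPointerData t_self;
	Oid			t_tableOid;
	HeapTupleHeaderData *t_data;
} HeapTupleData;

/* Offset number 0 is treated as "invalid" in this sketch. */
static bool
item_pointer_is_valid(const ItemPointerData *p)
{
	return p != NULL && p->offset != 0;
}

/*
 * Hypothetical visibility-style check in the post-patch shape: it takes
 * the whole tuple, derives the header itself, and asserts that the
 * caller filled in the identifying fields.
 */
static bool
tuple_xmin_committed(const HeapTupleData *htup)
{
	const HeapTupleHeaderData *tuple = htup->t_data;

	assert(item_pointer_is_valid(&htup->t_self));
	assert(htup->t_tableOid != InvalidOid);

	return (tuple->t_infomask & HEAP_XMIN_COMMITTED) != 0;
}

/* Caller side: populate every field before asking about the tuple. */
static bool
check_tuple(HeapTupleHeaderData *hdr, uint32_t len, Oid relid,
			uint32_t blk, uint16_t off)
{
	HeapTupleData tup;

	tup.t_data = hdr;
	tup.t_len = len;
	tup.t_tableOid = relid;
	tup.t_self.block = blk;
	tup.t_self.offset = off;

	return tuple_xmin_committed(&tup);
}
```

A caller that forgets to set the table OID or the item pointer trips the assertions in a debug build, which is the same safety net the patch adds to each Satisfies routine.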

0008-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch (text/x-patch; charset=us-ascii)
From 19bb80af95eee295361dc8882e7032e6c3505898 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 08/17] wal_decoding: Allow walsender's to connect to a
 specific database

Currently the decision whether to connect to a database or not is made by
checking whether the passed "dbname" parameter is "replication". Unfortunately
this makes it impossible to connect to a database named replication...

This is useful for future walsender commands which need database interaction.
---
 src/backend/postmaster/postmaster.c                |  7 ++++--
 .../libpqwalreceiver/libpqwalreceiver.c            |  4 ++--
 src/backend/replication/walsender.c                | 27 ++++++++++++++++++----
 src/backend/utils/init/postinit.c                  |  5 ++++
 src/bin/pg_basebackup/pg_basebackup.c              |  4 ++--
 src/bin/pg_basebackup/pg_receivexlog.c             |  4 ++--
 src/bin/pg_basebackup/receivelog.c                 |  4 ++--
 7 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 87e6062..86f0686 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1955,10 +1955,13 @@ retry1:
 	if (strlen(port->user_name) >= NAMEDATALEN)
 		port->user_name[NAMEDATALEN - 1] = '\0';
 
-	/* Walsender is not related to a particular database */
-	if (am_walsender)
+	/* Generic Walsender is not related to a particular database */
+	if (am_walsender && strcmp(port->database_name, "replication") == 0)
 		port->database_name[0] = '\0';
 
+	if (am_walsender)
+		elog(WARNING, "connecting to %s", port->database_name);
+
 	/*
 	 * Done putting stuff in TopMemoryContext.
 	 */
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 						"the primary server: %s",
 						PQerrorMessage(streamConn))));
 	}
-	if (PQnfields(res) != 3 || PQntuples(res) != 1)
+	if (PQnfields(res) != 4 || PQntuples(res) != 1)
 	{
 		int			ntuples = PQntuples(res);
 		int			nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 		PQclear(res);
 		ereport(ERROR,
 				(errmsg("invalid response from primary server"),
-				 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+				 errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
 						   ntuples, nfields)));
 	}
 	primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9f5f766..a421ec5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,6 +46,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -242,10 +243,12 @@ IdentifySystem(void)
 	char		tli[11];
 	char		xpos[MAXFNAMELEN];
 	XLogRecPtr	logptr;
+	char*        dbname = NULL;
 
 	/*
-	 * Reply with a result set with one row, three columns. First col is
-	 * system ID, second is timeline ID, and third is current xlog location.
+	 * Reply with a result set with one row, four columns. First col is system
+	 * ID, second is timeline ID, third is current xlog location and the fourth
+	 * contains the database name if we are connected to one.
 	 */
 
 	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -264,9 +267,14 @@ IdentifySystem(void)
 
 	snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
 
+	if (MyDatabaseId != InvalidOid)
+		dbname = get_database_name(MyDatabaseId);
+	else
+		dbname = "(none)";
+
 	/* Send a RowDescription message */
 	pq_beginmessage(&buf, 'T');
-	pq_sendint(&buf, 3, 2);		/* 3 fields */
+	pq_sendint(&buf, 4, 2);		/* 4 fields */
 
 	/* first field */
 	pq_sendstring(&buf, "systemid");	/* col name */
@@ -294,17 +302,28 @@ IdentifySystem(void)
 	pq_sendint(&buf, -1, 2);
 	pq_sendint(&buf, 0, 4);
 	pq_sendint(&buf, 0, 2);
+
+	/* fourth field */
+	pq_sendstring(&buf, "dbname");
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
+	pq_sendint(&buf, TEXTOID, 4);
+	pq_sendint(&buf, -1, 2);
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
 	pq_endmessage(&buf);
 
 	/* Send a DataRow message */
 	pq_beginmessage(&buf, 'D');
-	pq_sendint(&buf, 3, 2);		/* # of columns */
+	pq_sendint(&buf, 4, 2);		/* # of columns */
 	pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
 	pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
 	pq_sendint(&buf, strlen(tli), 4);	/* col2 len */
 	pq_sendbytes(&buf, (char *) tli, strlen(tli));
 	pq_sendint(&buf, strlen(xpos), 4);	/* col3 len */
 	pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+	pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
+	pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
 
 	pq_endmessage(&buf);
 }
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0abff1..ca803cb 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -730,7 +730,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 			ereport(FATAL,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("must be superuser or replication role to start walsender")));
+	}
 
+	if (am_walsender &&
+	    (in_dbname == NULL || in_dbname[0] == '\0') &&
+	    dboid == InvalidOid)
+	{
 		/* process any options passed in the startup packet */
 		if (MyProcPort != NULL)
 			process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 56657a4..93ee489 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 1850787..5fdae7d 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 7ce8112..4a2eb78 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -532,11 +532,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			PQclear(res);
 			return false;
 		}
-		if (PQnfields(res) != 3 || PQntuples(res) != 1)
+		if (PQnfields(res) != 4 || PQntuples(res) != 1)
 		{
 			fprintf(stderr,
 					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-					progname, PQntuples(res), PQnfields(res), 1, 3);
+					progname, PQntuples(res), PQnfields(res), 1, 4);
 			PQclear(res);
 			return false;
 		}
-- 
1.8.2.rc2.4.g7799588.dirty
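
The database selection rule in patch 0008 can be sketched in isolation. This is a simplified model, not the patch's code: `resolve_walsender_database()` and `identify_system_dbname()` are hypothetical helpers, and the second one keys off an empty name, whereas the patch itself checks `MyDatabaseId` and looks the name up with `get_database_name()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper modelled on the postmaster.c hunk: a walsender
 * that asks for "replication" is a generic walsender and is not tied
 * to any database; any other name is kept and connected to.
 */
static void
resolve_walsender_database(bool am_walsender, char *database_name)
{
	if (am_walsender && strcmp(database_name, "replication") == 0)
		database_name[0] = '\0';
}

/*
 * Hypothetical helper for the fourth IDENTIFY_SYSTEM column: report
 * the database name when there is one, "(none)" otherwise.
 */
static const char *
identify_system_dbname(const char *database_name)
{
	return database_name[0] != '\0' ? database_name : "(none)";
}
```

Under this model a walsender started with dbname `postgres` would report `postgres` in the new column, while one started with dbname `replication` would report `(none)`. Clients that validate the reply have to expect four fields instead of three, which is what the `PQnfields(res) != 4` changes above do.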

0009-wal_decoding-Add-alreadyLocked-parameter-to-GetOldes.patch (text/x-patch; charset=us-ascii)
From 2c9d0b952cce025d4daa70b85b5a6456463f88b0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 09/17] wal_decoding: Add alreadyLocked parameter to
 GetOldestXminNoLock

This is useful because it allows computing the current OldestXmin while
already holding the procarray lock, which enables a backend to set its own
xmin horizon safely.
---
 src/backend/access/transam/xlog.c     |  4 ++--
 src/backend/catalog/index.c           |  3 ++-
 src/backend/commands/analyze.c        |  2 +-
 src/backend/commands/vacuum.c         |  4 ++--
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/ipc/procarray.c   | 16 ++++++++--------
 src/include/storage/procarray.h       |  2 +-
 7 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 654c9c1..ac51193 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7165,7 +7165,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(true, false));
+		TruncateSUBTRANS(GetOldestXmin(true, false, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -7522,7 +7522,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(true, false));
+		TruncateSUBTRANS(GetOldestXmin(true, false, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ba5c84b..bfad8b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2198,7 +2198,8 @@ IndexBuildHeapScan(Relation heapRelation,
 	{
 		snapshot = SnapshotAny;
 		/* okay to ignore lazy VACUUMs here */
-		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
+		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true,
+								   false);
 	}
 
 	scan = heap_beginscan_strat(heapRelation,	/* relation */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9845b0b..7968319 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, false);
 
 	/* Prepare for sampling block numbers */
 	BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..924a12e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -399,7 +399,7 @@ vacuum_set_xid_limits(int freeze_min_age,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(sharedRel, true);
+	*oldestXmin = GetOldestXmin(sharedRel, true, false);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
@@ -720,7 +720,7 @@ vac_update_datfrozenxid(void)
 	 * committed pg_class entries for new tables; see AddNewRelationTuple().
 	 * So we cannot produce a wrong minimum by starting with this.
 	 */
-	newFrozenXid = GetOldestXmin(true, true);
+	newFrozenXid = GetOldestXmin(true, true, false);
 
 	/*
 	 * Similarly, initialize the MultiXact "min" with the value that would be
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a30464b..4c74d1b 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(true, false);
+		xmin = GetOldestXmin(true, false, false);
 	else
 		xmin = InvalidTransactionId;
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b5f66fb..993efac 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1100,7 +1100,7 @@ TransactionIdIsActive(TransactionId xid)
  * GetOldestXmin() move backwards, with no consequences for data integrity.
  */
 TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
@@ -1109,7 +1109,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	if (!alreadyLocked)
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1164,7 +1165,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		if (!alreadyLocked)
+			LWLockRelease(ProcArrayLock);
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1172,10 +1174,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	}
 	else
 	{
-		/*
-		 * No other information needed, so release the lock immediately.
-		 */
-		LWLockRelease(ProcArrayLock);
+		if (!alreadyLocked)
+			LWLockRelease(ProcArrayLock);
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1249,7 +1249,7 @@ GetMaxSnapshotSubxidCount(void)
  *			older than this are known not running any more.
  *		RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
  *			running transactions, except those running LAZY VACUUM).  This is
- *			the same computation done by GetOldestXmin(true, true).
+ *			the same computation done by GetOldestXmin(true, true, ...).
  *
  * Note: this function should probably not be called with an argument that's
  * not statically allocated (see xip allocation below).
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..fe0bad7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -49,7 +49,7 @@ extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked);
 extern TransactionId GetOldestActiveTransactionId(void);
 
 extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
-- 
1.8.2.rc2.4.g7799588.dirty
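
The `alreadyLocked` pattern from patch 0009 can be shown with a toy lock. Everything here is an assumption made for illustration: the lock is a plain counter standing in for ProcArrayLock, the xmin array is made up, and XIDs are compared with `<` rather than the wraparound-aware comparison PostgreSQL uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy non-reentrant lock; a stand-in for ProcArrayLock. */
static int	lock_depth = 0;

static void toy_lock_acquire(void) { assert(lock_depth == 0); lock_depth++; }
static void toy_lock_release(void) { assert(lock_depth == 1); lock_depth--; }

/* Made-up xmin values advertised by three backends. */
static uint32_t xmins[] = {120, 97, 104};

/*
 * Compute the oldest xmin. The lock is taken and released here only
 * when the caller does not already hold it.
 */
static uint32_t
get_oldest_xmin(bool already_locked)
{
	uint32_t	result;
	size_t		i;

	if (!already_locked)
		toy_lock_acquire();

	assert(lock_depth == 1);	/* held either way at this point */

	result = xmins[0];
	for (i = 1; i < sizeof(xmins) / sizeof(xmins[0]); i++)
	{
		if (xmins[i] < result)
			result = xmins[i];
	}

	if (!already_locked)
		toy_lock_release();

	return result;
}

static uint32_t my_xmin = 0;

/*
 * Compute the horizon and publish it under one lock acquisition, so
 * nothing can change in between.
 */
static void
set_own_xmin_horizon(void)
{
	toy_lock_acquire();
	my_xmin = get_oldest_xmin(true);
	toy_lock_release();
}
```

Passing `true` without actually holding the lock would be a bug; in this sketch the `assert(lock_depth == 1)` catches it.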

#3Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#2)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

> 0007: Adjust Satisfies* interface: required, mechanical,
>
> Version v5-01 attached

I'm still working on a review and hope to post something more
substantive by this weekend, but when applying patches in numeric
order, this one did not compile cleanly.

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o allpaths.o allpaths.c -MMD -MP -MF .deps/allpaths.Po
vacuumlazy.c: In function ‘heap_page_is_all_visible’:
vacuumlazy.c:1725:3: warning: passing argument 1 of ‘HeapTupleSatisfiesVacuum’ from incompatible pointer type [enabled by default]
In file included from vacuumlazy.c:61:0:
../../../src/include/utils/tqual.h:84:20: note: expected ‘HeapTuple’ but argument is of type ‘HeapTupleHeader’

Could you post a new version of that?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#3)
17 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

Hi Kevin!

On 2013-06-20 15:57:07 -0700, Kevin Grittner wrote:

> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > 0007: Adjust Satisfies* interface: required, mechanical,
> >
> > Version v5-01 attached
>
> I'm still working on a review and hope to post something more
> substantive by this weekend

Cool!

> , but when applying patches in numeric
> order, this one did not compile cleanly.
>
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o allpaths.o allpaths.c -MMD -MP -MF .deps/allpaths.Po
> vacuumlazy.c: In function ‘heap_page_is_all_visible’:
> vacuumlazy.c:1725:3: warning: passing argument 1 of ‘HeapTupleSatisfiesVacuum’ from incompatible pointer type [enabled by default]
> In file included from vacuumlazy.c:61:0:
> ../../../src/include/utils/tqual.h:84:20: note: expected ‘HeapTuple’ but argument is of type ‘HeapTupleHeader’
>
> Could you post a new version of that?

Hrmpf. There was one hunk in 0013 instead of 0007.

I made sure that every commit again applies and compiles cleanly. git
rebase -i --exec to the rescue.

Found two other issues:
* recptr not assigned in 0010
* unsafe use of non-volatile variable across longjmp() in 0013
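
For readers unfamiliar with the second class of bug, here is a minimal standalone sketch of it; it is not the code from 0013, and the function names are made up. In C, a local variable that is modified between `setjmp()` and `longjmp()` has an indeterminate value after the jump unless it is declared `volatile`. PostgreSQL's PG_TRY/PG_CATCH blocks are built on `sigsetjmp()`/`siglongjmp()`, so the same rule applies to locals that are changed inside PG_TRY and read afterwards.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover_point;

/* Simulates an error that unwinds to the recovery point. */
static void
fail_midway(void)
{
	longjmp(recover_point, 1);
}

/*
 * Returns how many steps completed before the simulated failure.
 * "steps" is changed after setjmp() and read after longjmp(), so it
 * must be volatile; without the qualifier its value after the jump
 * would be indeterminate.
 */
static int
count_steps_before_failure(void)
{
	volatile int steps = 0;

	if (setjmp(recover_point) == 0)
	{
		steps = 1;
		steps = 2;
		fail_midway();
		steps = 3;				/* never reached */
	}

	return steps;
}
```

With the `volatile` qualifier the function reliably reports two completed steps; without it the compiler is free to keep `steps` in a register whose contents are lost or stale after the jump.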

Pushed and attached.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Add-support-for-multiple-kinds-of-external-toast-dat.patch.gz (application/x-patch-gzip)
0002-wal_decoding-Add-pg_xlog_wait_remote_-apply-receive-.patch.gz (application/x-patch-gzip)
0003-wal_decoding-Add-a-new-RELFILENODE-syscache-to-fetch.patch.gz (application/x-patch-gzip)
0004-wal_decoding-Add-RelationMapFilenodeToOid-function-t.patch.gz (application/x-patch-gzip)
0005-wal_decoding-Add-pg_relation_by_filenode-to-lookup-u.patch.gz (application/x-patch-gzip)
0006-wal_decoding-Introduce-InvalidCommandId-and-declare-.patch.gz (application/x-patch-gzip)
0007-wal_decoding-Adjust-all-Satisfies-routines-to-take-a.patch.gz (application/x-patch-gzip)
0008-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch.gz (application/x-patch-gzip)
0009-wal_decoding-Add-alreadyLocked-parameter-to-GetOldes.patch.gz (application/x-patch-gzip)
0010-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch.gz (application/x-patch-gzip)
0011-wal_decoding-copydir-make-fsync_fname-public.patch.gz (application/x-patch-gzip)
0012-wal_decoding-Add-information-about-a-tables-primary-.patch.gz (application/x-patch-gzip)
0013-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch.gz (application/x-patch-gzip)
��n��L���~���������;���UYy������E����{�:���;Q�
U�Y�.ukKU���������������\�'� r�7$b.�_�kgf��*~�xc�%����O5�;�#�D������~<HG�n(�s:����^�H��+�#��\C�:t7���4Q��$Q�`<Y��J�H�&���J]�`f}p=�o�<r�a�=�y�>(q������w��k!��@��DC����4�p��,����!��u��0�BO^�e��3t3�"Iw��s����z|g�-~MR�sQ/XQaME!���w��{�R���0XU��<G�j==<�Z'�����Z������v���eP��[;��v�ZI����0o/^4�6��A�(�D��"W�*_E{��Z@����=���-��kK{Q�|C,r������W�@m��*��%�����L��!�9
��Z}��X�c%l������|-������b(�4����}�;����?��"���/�>nlY�?+����	Fg5C���!�����`�!�Z7�0'��������e�=����J����TR,-H�G����O8���8G �r�������e<�f���2�Xu_t�7�<�����>b����5��nL���Wn[�W���6{���i����0�5g��Q^��,�7]w��t�}|K.6�~BR��F���G���M���(xx��l5�h>���L��n����x�I�/�kQ��R_/aS�#����N$�����r�[a]z�}w��.l�&a����P�3�f�?�8W�h����N���suy�����3R �[��6bo�����k:d�0����g�����A���3@���hr��5���HmJ�:���i�D�1�e�����T��$�l;�N��A�0C1�<>Gm`%q���)���f����m�<����VnL%_�
y��O������RDe�d���A������3���
^�3�G�i��f'����r���D������1���:#.�/�N_�y�����N��d5�#a���fE|��p��%��P�=g\@��l3u%��"�g'���.����jICD6^3P�6QG{�G���l����.���f0� �_2!`3X�������d�y����;K��y��o84�[���5��P��������l�('u���[ ��D_>}�j��h���gO�7���2H��R�R�>8P����P1���g_��$���g���_����1�,��%����3f��^�9��0��z��lU�k�����?H�1t����H%'zJ�*���7�D8���o`��+��"�#�����SP����5��(�df"�@�l3�~O_<���/_>Q������fl^u�d�k�������pnSc�#' ��a}O��O�T���U��vf�}6Tv�j�Ub����Z�>s���3�p�K�X�d�V��������r�������|��>���66�/6�B��.
3"���>���	��T~�~ n���C �.���/gt����=�X/��B�_8��o�(0T�T�K2�(f
���L��]/��,�xW���t���rThy���;&�jR�uC��8��X���������k����}�X7m#�tL��=�����k�N">�_�:o����Z�9���n�n&��~�M���J<W\�E#�#��j[?_q���5D����L�C�B�7��MH'&=G.�J}X[* I��"r�b47�w�����p��/�UV(��](�w�^,�QlO��w	;�����������
���2��d�tG�[WwV��A�Tl�4�h���P��.��R��u8�3|(��y�b'<��9������;�y�p�5������<.9A�D>�<�g����K���G������I�m��:�5�?�����]�����a�
�+�Y��|���xq�j�Q
c�At�`��4�Lq���W�3���B�2�Q�Z,����7���]����9������E������[�M�LDe����q�CS�[C�*Bs%c8��(NP�D��Y���'�2~E_��XK�s�������z���/�.(����i�P\DDG
��d�t,�z�������4�Q��4�����q��}��k�@���w������!��WwG��Z���E�C����{)���wD�|��������ns�d�9��A��<"'�Ucb�Gm��{6�qM�Ya��lY��
������A�t��=�}`����\��>���M.0$�����2�?b�Z��_�w���=z6R$w�-h���XW}�-\b���|a�]��� 1��a���w���_�$9W��Q~z��	��_wx�&��r����-�d��#�vS�`"-jp����c7A���w�����1�.V$��1��89E���ar�K�f�H��`�b*N�-���pi{@p(gX�,��Z�Y���+"���nB�&��7w%��M�X�����X
>�#�i|������H� �����/T ��^/T�����D�R��u���^$��U����,�#�d�m�6[��� �&�x��{����{�J����L������~�q�����k��Q��9�!��T���5�_S�y4�wd-��.J����_�������\2p-H��U�~0+�<��������W��J�ko�z����/����JQ�{Qj��n�����6�F��<%`�\5���ia���8���-���.!�;��x�	
4x�������-�p/\�����X�eP�'5����d
g7q%����a��+J�+��6��C����	��&Y���(��;
x2�����W�����
'4SBi�����
*�!v�#Y��1�������o���9�[�
����({`kb���� �L�kx�Y�z$�r���lz/4�A�e�qT�	_�;	X�]J��.��H���L��Z�#���+7KK_�eM���-���/�i�������9N��HP�nzD�g�H�l)���N�
����}���,Sv���^��r��n�:"�vL�3p�u<����}:|&����S����x^�<������/G2~�A�h�^l>����ti&��+������������?���?���>���������N/�������Gl�^�a��e�������JJ�
T#X%��%����7�$*_d��
��n\����.�W���@�����Z����b�$���d<����x�����N���m�"��UWFy>����O^��/y�^<������S�b���s��AS��W��
DY�PT9%>E�����WJ�/�m"�L������Jc`*/KV�MP�;E�jD+�c�h���(��,B_aE�oE��x_��
Z
a�,P�.�	��dH6T���xM�)���(��i�I�3���nA�;�6;�x�C$���w��������Uf�{�c����+P���
����;��>������B���pB����W��XEp���U����?���[���^p
i�t�y��
����}[��z���/x��
������u�#B-m��j*_^�-�5��0s�M
��'�a��.����<�����(/���|���m��0�@�^�Q�g��(c>�P�d�Y)Y��ut��@�&N����Kj��A������pn����=�o>����O�<�[4	s�PK�����vn8F��=�
X���2��U*m��b������zZ��3�c��{oh;������)��y����������������>��������ti-@se'��F�[e�B���(|�t67��f��E�����|9
�z���%�2���	�K�
����2���IF��^D�	�*� �{A��YN������3eI����O���,��\EZ��<�d�������/6�>m6C�'���k���t	�Y!`�
!��7	��V�3X9L���Z	���1P�\-���Lo������N���\��K4��j�u��F|��W�.�E�JO�K #u�M�[�����K�F,��pcM����0�\�����)wX�Q�e�<9�h�|i��,��������6�b����#�S��$�6M���s���KX�y�����;��G`���o(��V�' G�4gSJ��@J8F���PN���������/����/e�"PP�E�����.:��n\$3�E��R�*L��^<�T��a����D�t��>z�QDE��$�f�-���D�k�Q$"���g�p��;�m��WA�r��}m�X�t�Y�
O�q��oC�M=2>MK�v�����H�W�
��].�l;B��W 4�<�f*����+{��Lo�?0K��!�r���J��p�����C�_gK)�.���E��������O�����O�n=��p��f�S���^0@+k�� �0���<��e;����o�K][�{`�t���}�?��w������������\�+��1��xwNv��k�/HI"'����z�::k��������hyEDV����y5��>������YK,'���z8�A�l���d�����^��6��'������T����������>�k���������m�����4 MV;�o��t����k�vF����(�'}+2��/�/��r~n������d|,i"�7������h2	������d���/
� �,K/�%Qa��$f��n})Sg���`x�:����9�I�h�!��������N!��7���8�:��x|r�K�J�;�?�����~��;�[9&WIm�Z�?�`_����J�02���]���r`�d�����6RhYR7�����f�(.#�i��������:/'���MR���$�h=&p?�Al2���_��D��+�����^�\��� �A��8��'I6��;�����2.�a���Z��6Bo8@M$���������]���8���#/,�^P��M��u1 =y�����g��([������:;�f��)4A$J{��z��m%2���2b`�9�J(���jb�����`�(�B���F7����+�((��^����)��,���3'�����d2�����2��."���L���������������8z	B������<}�YeY����7�����}����}���KR�l��������5� V1yj�;����$�#U��<Bf*��M�4��.�����fR��Ub��8�+�R�	�����h-����������V3�x�<�~0I:WR �������9%=�A_#l�&i,�'G�����oq'���:3���/q��?z����
Gk�2��GK������2y.���iF���1��5�������c���|H��y�F���}%!�"dD{�7d�=�-�@R�r^�����!V*�'�����+L������s�c�c��,��]���1�/t�����q��<�QNZ�D�D6������',�,�9���@����T�H�(K�i�E�
Z,]%���l�i�a�hy>!��L�b��D�@�!��`�p���G@����$�}��N����,xi��Ricn�����}.]_��O�#���z�%0������@J��/RU^���2��gIeTF���g��q�?���<�m���2[*�rt�P�Y�mFp���dP��d:��N��k<z:	�����+�Ao��j����w��^tK^�����n�����x��<:�����������y42j����_3J9 1����'��t��lC��msO����a#&��;c�c� qb�,�$�2�����&M.\��-iD��;�5��x���������`N���y��#1��.��H~3m��aJ������1H� �$��y
����u�r�=�����R-�����|��\�}��d����t�)'��X�����)JP(�R�	I�]W(�c���p.���0��%�Yg��&/��j�`�~j(��ED�0������������p�Y���0� �\,��H@���0�TI����i�d��5��j
�5M��Wo[�w����
w~A��i�0��*��e�[��e����^h�%�D���������rR<���v�j����������b
�S�X�lC�_�?�!E`�8�8�\�0�{~Hy�{6��H�2�b��F�����0���zhr�!_���O�]_r�����x����%����D��r�;�;f�D�����a�g��>�A`(�Y�ggC8U��l�������u?�i`�j`��)��n�Q�4o��N����7�h"%�3���9��qOC����[�ox�6���k�����|��<�t��7*�����CKp�x�"0�I�E�5��i�#�K�����]#���^�H,������rd1q=%t>�V[��C����b���I�(��VA���,�o�(�q�,8�q��#g1W����Z=�/u��,c,e�x�1�����3�$H����1��Pg���6������3��2f�:�;|�m�b9�h0���C��M���_��
3���#�������1{O<~�q&��	t����z��%�A��� G����l�Jg�G��Q1up�N8����G��N4I:�Y@��j�f\Z����tI��c���o�������V�K�\g���^�>���b�������L6�]t���f���yG�(9�ln1��+��u��9kE�o}�`�Q�k;*R�s�L7
j����&�]P1i���s����=�6G��#�$$��(!m��
����������4��p�������������+�q�EL���c�v��c�DM'��l�e���7c�L��ic�A��qE��i���f�7�{<����a��r�d��l=����;����|�����f�y������X�"@��B,���d���\������XX�GP����B�����+�������(:���������9�g5��
�d���t�~������O��:�5�L�}f�������g��z��y���������C8
@��i:Ni~�m��k��F�L#��/q�+�]S��5I���e
q=R�o6_t^>}�l�A\��
�����^��9�x���[Io���������W��r���E'�o:2Fg�B��=�����}�$F?�$mX�6����t��I�zT�	���-R������X2kD�Q?�A��LF�w���l��l2��S+��Pg������y��}��E���[&PE�B$%?~JF�I���I:oO8L���f�[�%bk��>����5��u;���@�����;�e����'��	�����R>�P�C������������z���l�,�!�?Z�zP�4�����?A(���lO����;���������'��Vl%��x2��ej� 'S����#P�+����,��c�-�V�"�]�ED����U���s'|�������n��"�8m�e��R�����W�l��n��#,�pL�F(t�I���}C�����/`c
�w�KMn��/$�TTa�������l��3����r����3g=s�k�
5��������om�����b<�G2��6�������/E���II
!�`;�.{����������h<�N;�w=RR�A��58��H�u$0"4������Q7?)�\�	T����$$�H��b�w���;@��9�"tq\�.'�BU�����l	��X���y�$��>n6��=�H�
��p-3Y���{5re~���g�@���2��������P�C���}������e%�%������[��|�yhW�S��?�@:�O�GP.�_DJaV&E���e�����W%�W�z���)�I�����A��������0�v�������9�8;W`��aU���m�����JJ����������g����/��WY=�
VV����d(����x�"��eG���/��*���d<�g����;;S���&n�m��^�ms��6�9@�W�����/��]������FX��7m��/���M��~/�����~�"�.�$X�p�.g�q��!��V������P�3L�l9g��?��]C	$�P����������Cq(d�[���M�F���h���/��:�nJ^��;9�D�'�!�&lS��[W:2v!c��m����
S���j�pko�8[�<�����B
�������4	�u��s��e��������
���A�~���&>P3-�	j��,� �������?�y�y�h�]�=t�M'�9ND
���p�>�����
�����&�;a��� �%BE�)mu����f���l>N�*���J�)X����b�@%m�Lv�����x�z���{
����z��5}�p���9�E�2��q���:����	�NMG�����u�k_�AjF���7�h�=��vo�����O=����8����pB|�3pR|s��o�	E��1������|�)��gY�E�Q��oH��U�v���|��?y���^�&_�eq�����hl���H[k��?��?��������)���R��^��l����f��z�$������M�����fv��oL�|�������S��b:��7/)�]h�Ay�w����z���t�����%�	���#�
���|7�7�B;Cz������3����'/�w��!gg<}�������HDLo�#��=���
�$PlCg-��"o#�C|��!W�yh!�v|�#����L%�����Ic�%�k�Gg�W�_����@3X�%|uv�F��<�\��YQ��X2���Uu�1b�+P&
�jO�r������	; �����E�?����v���1\�:�h���g��m>n��3v��-G{(�
G�4��x8���I��K�N�TDx@e(���������V�S�B��T.���+������$c��E?����p���fQ�Z����`A�����J��EtWmh�b�
�\Y��<wKJ�L����~
w\�e���������,������`���/b���9G��	CG��\�}e�����x�a���|K�G����owOn�E`�u&7j���������.G!�I�KR�7B*
�F�8MU�ZE�_dj�5����\��*���&����K�����$&��
;�{��~��M��^
wg#�f�K������?��Y��y��pH{]����`�w�<�`�0\^�I��.��"F�<��?D����BOo��9���(�&��'7X)�����<w8*�gE�!8����Opu�#��D
/@���������C�/qo�e����8�����(:"`�A�9�U��b����%F#��pzyE5-��Y�����dr(k�j��~�W-���]���8e��<8���z�h1,���39���ESr���4��_�~��=�D�f��C<:�vj��:amYt
�p�Mh%P������{��������.�\������6���,;�RP��K�*�lK.y����a���m��-��1����m����Y�M�p"^%QV6�u����[vR5�KJ��lhb�[�{��w�����_['c �����m������G��g���>��}<|%9���v�c)���O��J��	���5�I�G�~��F��uJ��i��oM����.�� M��(��������}�n�����(Eg"�m����S�9w��������Nb�'�����=���N�2�{��l|�>���JF��&�&QK�G�9
��)(op��e�������3
�/c������+������M����N�����������wK��4��r��>b�������+d�c�*O(�'K���E1�O��#S�9���0k�v�������4��
������{����u��p)_H<,X������%Fo�m�h:N��7��"87{��V~�E���h���`�����,��9�}������~��]�u��=PF��r�h~2C��n3�\D�>���[h�8���3y�\�h~��R8��Y�@gkA���$Y8��R���h!���=z�~_VP��D��2��8o>r�,�@��GN��d�KK������[H���\�M��&�A,���m��%OR1B�R)�����%O@��m����v�)�m���M�S>8�s����������'�(3�rG#nI):{���cQ��+�B��[�zy�����c��[��\����CsN�)^A���O��*����y{t2�� K���.�*�\En�B{������f8N/G��;�f�����S�-��f'������y6w�jV��+�����7y��9>�z�<�Z������}��d0�v�B����(����*2{78Kt�(������)'NL�
�O%g��/R0��.�p��lb*�j�#��_���sm�$*����<�����X�V�������X��~J0?�^S�����O����|r���>��g	o�"q�S-� ��b���"��;8=�@�/���&������~AX���$z;�e�|�,�ED�0�����e7��1T�� �Qe���Q�����B]~�J��`���k�������)����_�,x�����QQY
l-��j{��gV�.#��~����<k4�������Q�&X��^F�w'��,P��p��\j%�_UM�|]6Sw�w�������K����&30&��m3����ZE���.,x�v��e�I~�d�
�|�����{CGN_1O:��9�u�� �����ut��:i�������i����X�I��@?�N����a�E�$��K{	�j8���T$�G�g;�&ai�}8>h�����I^�x�I������c6(I���&��t'��E{MZ����a���I�TZ(��z�F�v�����Cs�U{�N����!���=H��d	��`��m-vv�6���%����zT�H`U����l��;7�����([��Id�����3uL��N�H�l�['�:�$[��������IU25���v7�����M�Ut��h2z?4|-���5�e��WR���'SN	�B�;����wfy��kF'�n�$Sr���l��b�OI �zo���dx���M��|�vZ��$tt�G��&�C%1*��}���8o�w���qk3���%���?���3�b'ga�&)P��W������^�	B:��.��(r�������	f!q�-������!$�
��������)�����zH�z�3$��B����]���^_��u��y����d��%N1��xc��c���o�d����m�/���qW��������2�j|������1wQ������?���W�!��#$�5�p����=���{���dBC:;���W��TX�
����zm��V7"�^��V��q��8��
�lN<�[t���� �������:Q����/�%%�����_���R����������rU����
>&6�BE9r��oo�����l���9�����E�.����o14H3�a�f�Lq-`4
�V����z3�@�wP4���
��~���
��t4�a�g��S�J��NZl��#�q���34'�,�]�]�6����OG'g;�?�B �*@:'TT7|��Agv��"#8���#��w�%t�^�~\���	}Z���wN��N�"7f8��M���J��20�4G ������V����#�}�k~3s��6_�)��j����njG���26w�W��=�����T�nz���/�B�l��qS@kJh*)w��AB2wW"%�@���xc`��x3�����<dI�!�zX! �������|U42�wna�o����'��~������,�X���3�����{1�yY����{FoUlb��t*�ha����I�m���sWpg���t��B�3;���$�~&+�'���#6�}E��PP�����k����u�ll�<�S|�m�
]�x��O���-�\��|�s���C���%���GT�`�����X���#L�X��<��B��l�m^�qf��:'�~�u�=����d���n�e.�J�
H��x<�oK�_�U����t\��1���Vp7y8X����g���1�ev�M��sO9�
'G�=���`fx�Y����G8�\��2b~��b�gI��~�?�6n67�n��l�x�WQ�{�8�\�-��Uh�I�������L(Gu����zS��oi ^�0�&������@{4!�������f6F������L<w�e�"RxgBu&��Dc�o%�QwQf3	v�0O�	)�����0�GuH��I����e]�7���l����CX�����S�8�810����~��V�/�s�c�z�.�K$yy����(F��1s�������P��Pe���E��Wt�@�f�"�a|#h��y���%����O�rd�Z��%�~���k#��W#��#[����6�Z����8?,t��o������R��h��v(Xi�u�
����^�{'�U��yw'���������r� *�Rq�7}8%~][QyC�GP~�$���1�?�N;.+���=�)�1��� R���10��:S�����TR�t��H/�����A{e0TQA<0�*o+-��u]e�U�����~x5���Y�{V/M�D�o���t������s�����a��y8;0,������F��i��A|���/����R6�0_��8]��s4g��Ya�4���e���C����~x��4�o�	bW������HD�\�E1 ��c35#����p�����	3��^ip,;�W �v��
V6s�j����?�p����J�<g"����3q�&NQ�y�u-�|��yz�����Q��p����a���	�;3�tj�~�z< 5:�|zf
�^�Y���9.��\�e3���G�N�J��"���O��?����L�v=�iy�&������~u���r��AG_-���d��d/"m"��f$�-��
��_��-������[+���a�]����)��.�(��Sb��(�~�yBq�b�����y�������]4��&���������p�.��a��R�(olg����K��=��,|��@�C������	)��E:��0�v��H�?}60c��#D*F�`��g8�	�B�sw/P�l.H����xXdgG��rm#��^���"
��2DdlT��R���2����>�
�f�xl>�@�F/p����D�����0.z���f2��C��Q�MB�����|��^�c
Q>@�������/g�R��B?*���������t�w�|i��!��}����-�:��t��5�T�m3����8�)B']��r��� F
�\#m2�s{{0���/t���s�?���4?;Tw~��MC�aQ"Pge��_t����#�/x�r�@����s�
�Q���*C��db{����b���������jF@��T��B�pi�����3��
�f��W�t%���z#
�,�j���r��`C�*0��`�9���S2��W�a�#(����k�Bb%e!����.L�b��u-O����?������VFLQt�Q�����E��M����1l��G���xawu{�=_$���������'�-��2��Z���6�������a����r3��{���j��v��Q�k�m��.�����}�N�����A����A!m1�R\�=�c��]��2��AW�y��!0������,Z*�m�{�E�,��Gw�n�8�a#:��n}B�����������u��]`����,?�Q>N^qj�{O�����d���k��,�\0��$���ufD���6����SsQp�Uk�%�Z�%�E�5Y��L��I_�[�Ut*&���5"�G"�$����|�.���K:2�:�v�]��.�
X#b�G��;��vBEB�b��HK9���Dc�x���&��I���&����O�N�>%�z��
��O77�^���P`z|�5����Go����Y������'������j���������%	]�����b+u7�u�vT�#Fa��O���{��B�hF
���)�����z���N��6��y�?M0/�EL��"�O������G��xhHZ(H����j8N�1�S����s]�#f^
���.2)g����C�b����������r/x|�r��7�P��h���^V�,�<���-p ��s�i���;F�.0������;A����o>���s�in^�C��t2�t
_��(7f�_O��1#;��|�m����}�t�j�����(z4A�z
�����+��������r)�S�^��'5^tl��v��;,��
)d�e��2�?k���5w$	������V�!�G��Q���D-T:n���7%�G�(|��t�����>�M<�G����pB�n��}U.�j���
QG
:H��]Bj�#��l�
�kO��[K+^��j<�UvJ��X��4���g#�0�'���[c9���}9��b
"��(^���LN���v���myA�_���Mv,?�k{���u�&������x: ���+��	v��M����k85&gV#�_����o�����i����8���h�f+k��	��YmP��xv�l����1���t��:�P�����SD_�����;�/��/����1!Z:�{!�^��Cd���94z]0�����I��7����f2K)�,�"M8�'����i����b�T��?���r\�@��la{���&�U	FUF)��N�f\��5&a�����9��k�Y�7����Iv`�o�"t t�P���s��
�*��|�
�}Y�i`\����q�"������st�����������BsI����Y
��OCX���`>��<��
�x������]���>�i�Q�Kk�NrW�.S�g+�rB'��yS���T��T;�"��������9]�����!���_PQ�R�9Id|A�f�SWPE��l����JT{j9Z�3d�yE��Bc�f�[><�s���Z��[{�^Q&z*EX��m��3�s��9�Yp��8��N���u�Hq5��WG�+Hw�z�Ay~Ih�"<$��Sy��PQd83), n�
q1�����V�l�"N6��@N���>���
�z�s��'Y�����Z�'-V�A�����lb��a&��
�}�"DV��I��K2=D����p�>�W<N�H�krG���P�8�����Hb�P�����E�����x��A�H�s��3nI8$f3	��W8��.��������l�����(�k	�:(F�fS�+' �
T�z�+���9k%^zW�����O9��f2�(_����O�o8�����Y�1��K�Wg��pfV��[)�9o3L�t>����N/N���b����a�Ea�?�b��i"�2��Zc���t������t�'\Q/���s���Q\�:�2,T�+p��d?��M[B��h��!y����	a�`o$�q���D���J�����Ql���=���V�K����.�'"�hVN����V��WF�+J�c�s������9��"'&���c��.�A��F�"
��g]"$��!�=�q��\2B\	r�c�-���;�1:K����c�����������)'���	]�?R��M�b�PX�N���y+���A��i����(���3����������b����s�����,���upF��p{���2�l���+DQ�_j5;N����X��������_*qZ>�lE ���pZi��a��e�0_Na��t0u|o�TA���z v��8M2�1����zFC�fp����1��c����n����ue�\/05��W
�:a�-aP�/�w>�x�s���L��f.j��.���l�Fi�&�[����.���w�g#�d9"�[&�I�w�wM����m��xh���J��z���QW�V��W����GK<~9D5o���TN!B9<
TG���<~�y��e�h6=�����{F�mh^P�^�W�s���]�)l��t4�	2���[f���q��I�.�����t�	����Oe�l��!n�S�N����egi?�R��O��'?}����{��>�����y�A��
��Ik_`�m��^��Lx���R�,t^U��p 
�^����d����OG=Ju� tfwF
�p�1���	�$�;���E��\�8n]��^:�5��Y��'������;��������(�e��uFV@����/q�r�J���A�"@���E�*��^�S�3�M��G�zz��VXK�������F����R�3���j�(:��[;g�����/O�����~��s�7��UNO�5��~��>�f�$���(/���:�&���+��3�-�+ z����o���#
`�;�� e���+��tK���q��M��H���\6����4�$sb��'��Xx�_z��y�I�K���:�[������Q�	l�FD�C�FgTg���[O���0K�}|��mH���8&��{������8E�a�E������Ow*0�yV�G��I���L5E���2M��GQ��%#|�����(
O�\0Kz* }���T-A����22j��j�����"B
��\o���]n��CDCx�%�Rt�
�U��[c
�m�h��'��F������k��Z�����!�8?�Z%���<�g��U�l[��%+�w�����K�����������j���?b�����p����������B���^:�\��B(^_�����j��r�����+j���PT�-�I�3���k9����	 �mMC�<�z�>�+H�����z	v�D���8���N�GQ������X�G���F=����q���rQS�'��M��
R�������;���J#�����(l��h���Y��m&��L4b��N����wZ��sHd
.x���5�4�pp3��9�p�sh��{9��rI�=�B������Su���;]G%S�m�u�%�siV|�E*�M&
���F�f�O��ba<.�r��g^
XQf��hL]�����&Pwa�S���Z�w!�4�PN�tD+�x�k������S`])GM�k�N��^W��:_�����f.��|�Y���s���bM8U���sp�[r" ���@b�$�x����<&�W�}�) �G�9�X����gW�Iwx=hTD��65��[���q3s�(�M'����������z������6B����<�<E�m�|��r����<���q���-�<��z\�t����:f�}Ks��N��6���*[�0E8�K+[-��b��*�;��'PM�o���}��&7��(����g�X�'\����d��npJ������H)mH��`_�������$�Sp�x)��2���Z`g8�bn�W,DJr 8���Q��5M�m������L�ku)y<1*�	q��b��D8yT���u�g�y%�E�Tk�b���x4J��Tk����%��0z��/��G��
�4X���2|���R��o�V�T�@+7%��-E�����Tea���@�&2#i�M�G���.#�������A��K�?}���r/g�����eM���s�+����{�ft8���/�P�����#o���-e�J��'���[�1�#��^I�S�#����c�8K����?�d~v{Q�5���n��`�s������^���,b��!p7�����<���d"ex�jg�����G������b��spp�{�:k����|88k��?<���Ve���Y���',CR�3�Q����#��5yO�y�b�F-|�)��$���v���x9X^����4�`X/��FI<���� �=�w]��J�������X4��cn�^�C����3���N�*��Rl�js��p}�A��B�@�4r�;�&JO��P���r�C��jv�FPV�U���;C�\�6���3������M�N���������V��6a�.S�����+"-�.�������{�6�&�C�X�1,X8���`b��$(m�&�Y��,���s�6����=��"���	��a��U�Gm�& _�4.�iea�2����������f���T�;JU%r\A�����V��}�5���"%w`
r�i�3o���=���``����R5������^QDw���^�(����r�
�aP(5�I��1�$���ww9��[�7>9��������x���g������C�#[$)�|0Wf��/?�_>���`�������?;;����;��(����P��R<�
|f���)�r#�C~�����x�V�����B*���X�r�,�+���d��K��p[�Y/-P{s��I������d�j���x��C�!���7��6r��eb�S�a3\�kbUbQ�N���!���E"�N�tg�O{1�Z��9����
4�`p>���W�(�(e��?�\0��=�g�tFlO��E�s{)m���	/E��[���������pa	���i	���G�QR"�"J8�Ml����rGH��a�7C!��8��1�j���r��\����{Zg��+�L��������L|e���D�>��������������@+��=4��@��a����7&�b��7X���0��
�������1�OnW����,�'� ��}���8K;xO;��lq>��|)�����R8dp��E���B�}�903|?#�BC�]�qO?�b �X�����.�=����Y��L8�F��w�$��B����������^�=��0����f�Q
��'uSxO��st�zaM+���H�d�g_5�['e�"`�8[t��@��43U�S����������c>hi#���.�a|5�;EkN��Ke��r|E�d�w�����c`W�,���M��?,8u�e���]V�`@�'��7��70��%���{9�Yi�"���-������#���T\}<�c���A�<��9%$�{��v��>*U���$�(w�h���PB�����~=��z��Eh��pV��X�A2��?�
�3�!�.5�|�L��Ejq���q+Kx�CyQ�8w$x�
���hU��m%�M#8h����r����A�@�Z�7�
�Qn���dT�j� �$����/<����`d�?��OGpd��|����i���M��e��\��"4!4jw{j��x���h�,�aZR��	;�Q~�4"}3���/�1��hL�dM�����E�A��(��zt��Fj�J��t|tg$g�
)^��M��[2�W��5�R���fZ�� ����'&-����{� �W�Uu�hf�YX	(�c���8�F����T�����o����|��;h
���{I����]�����������7qp��_���R`����>����(*H���������1�r���W�|���d��Ir2>��q���{���1���n!�*�c:0t�B�Bz�6�u��h����6�����G�G�V����#����'_+��IS���@��NG��N��}!��,�6��3��B$�a�mlal\o��'�Qn��R���R/� K�&���8�����a���*�����#O��\�
�)�?���:9�e-�i8���Jp;	!���A�$y��T��&3�V~���I��t"h��=�q���&g��mG`��� ����v���C��d� ay����6kmW���V9�M�"&*.����p�5����X;@�"z��XW�����D+%��I��u[��Nt������Es8N/���$����E�p���o��F9A���Ai�s�y�L��
�@�ug��MY=*�	&6Dw���|4��UFr4���zFBv-���a�9I��x����$�\��������l ��x4�^�����"8I��lYA�����4&��s����Fs��Hi����I�}������[��)G�'rpn`��._����
��j��[U�h��"�PnS�|��v}��9t�C]W ��o����
!�F�_��l��L��:}+5��:��������������5�8/H�o��S���s��@/�g����V��v��	*c�%N��U�"�K��-�L�]�4M��M.@}�	B�E�%�����=��?�7Gg� p�F��K��L�.�b�����^�#������]��OzH^��W?�w�'���`F����4������%�8�xxvx�K94���:h���N�z���`�-'�Q,��hI��v�q#�f�>?���P?0��<�p||tr����F{�����P]bH�4����#�(�)y=K����W�u�}��C��N���#\$����c����BS=XY�L'�vd%���c+��HMl2�7��.vu�"$�q:���-����,��z��� %��N$���o7��-��u����K����-�gj~���8:#��9��
B��6��g��qhH,PI�V������pw����>�����������`�}Y���s2M��U���5y�;���)���Obg	����5Lz\������8C��7���@��.lH'����y�6�J�l��P�5�s%	M(�Y��*�$��$TAp�J[���qw:
E�(��&\�$�Z��%�Y�[I���U�������vf��f�x��$Y�����K^<�o�^�$�&��BO���LiM�g�>-���2��LG�	]S���*���V���� J��������u�%\�I���I���)��\�H������N�H��_��'O^�i�H�v���;����-���4��D�|wo��=�J��]�k�[���^o������w��%/1�L�4����Upn�$6��3W��E��TV��sh<�?��i�{x�Xx�K����A<���W�^�KX�_�6ZrU�2H���r&�E����$�������B�b���WZ�����/�J+���%������+��p��/����/K^f��AaV����i����d����fqA�]n�:�������q����^��d�9�_�wG+��q����K\�������
Q�x�:9[�p��s�Z����E���w��A���L@��J��%4�����AY���A���^�h
cw9
E��l6��E���:(;�����'}@)���@�~��.�%�<S�Qqw���W���q���0]��@t��Z���.u:���t������h���:������z�|������_P=i���N�o?�{?v�9���&���!���P��y)�(��=GA�a>��h��A�}�s��������������H!e�����������n\!���n���x�A�r�������������]A���������P��)��n/�Pa5o�0[�����)f&#ux�F��0F%����o��'(�h�u�H�*��!0���[!����6�l/�a8��I�'��d�Gn4��le4�y63WV�3�D.����}���c��+sdu�����NLI����v�=t��z$�C��p*.����s�K\�Rh����
$�aVQ�Edo�}�@W��wd��Iar��^[9#z��b��!<�0r&����6�.�������n
Z����������������[���N~-��.�d
#����?2$����t����D����4��������T~�%�mG�o����`���sR3$]u:u������v���3��}t	��S������Q���?�"���9�O��3MR�b:�L�
���N���~����b����Df�{dJbJ�qN�vh�7����QR,����U����
�'�EF�_�M�
��{��eCM���("��O6^>���Oe�Hl����,����\��0���r�w���%A���C�����`�[�eEI�Yonl=���_�[Q�3Z���������h4N������Z	�vb!��W�������'h�t�v�^���q������o��
������nI�B�Q�-��!���+<J�6�����s��I;������7��Y��~d�#�R��-LY0t
�&�����	Cp�(`I��'f��������G6=wI��f���T��4E�d|I�3�.}/��H�=�r�!�����
D����i���N��[�����t�fW������V�&�L1p��,eO�Q�!n����������j�#%�S���U��O��)q���x���VC��n���c�d��/�
��r�^�@�G$0�8M����2b���{~���q��vAi��d2�K�TCQ�������n��
}����q%�;F��@���Y��Z�����������/D�Ol�@5c�w��`#~����x����vN�D�����Co��Gd�h#�HT����O{'���6���������W��D�k���I���������w��i�w��G�;��?��'�8U�&���[Z'!��u�/�4�;;#t|�E��\�}r�Uc�����}(�&������$�
��?����{v���6�vg�+:fA/��$o�������\FTt��.N��h�K`��=j3Nyt���.;�m��~����;�m1��
���Kb>�a���\
=:��.P4����9x��[�&�Q�u�'�Y�k�����#i-�������6{�%��)K�p�t�����L�-w��_�w�\����o����/����4�4������c�]��{��������{$���j�M9���#����~�0-3�XT6����]���k�������n�^���Gl�����-7B�_�^F&Q�r��*+��bYE�::����t�c���
�!����\K,uVb������x��Y���/�/4w$�`+�(�>�F���E4���`�4����u�!tAAok>%�����}�L�ew:@�J��/Fo_��.�Ds��.�P�&�������ML��sa>���������.{D�".��fP���c���#����m��2��]���
�=9�>��
�^�"��6���1����
��^M��V�
o7�'����.�S��m�'��s�����b��l�����-~n����.lX�kN�N���x��>�'�����G����m�������f�#�}���h�}�9��"��_H/�q�b����|=����}�4�L+P���=�?�m�7wh{��LVW�3���0/��+7>��c��Cy����6n��[3��Y#r�YW�B��p?]:��<��WJ�6(��`�spk.�c�K��k�kp����]6���L	�v�ST]���U(�Od��J�����B�v���2jv����!����ex5oqV��.�:4gL�1hz�Ho~g�
WX�t�
l�����4A�^������T���>��`�
�g����8{
'P�wc����|������^[��Ww�Z}�r�Vu��x���x��MJcb)%��������-��a�|�wAuD�w~�����%�2��x+�-�
����R��>�O3�j��az���o��@}8m�wN����v�?�q�F��'p��t*I.J��W���EH�����f���l�/d�fkR���fT^	u�9��UF$_�]�e�g��Y-�������_�!��<�,r��7���D
F
H��6��v��r����LR���a��m-=��9�)6��|X3��������N�#����.�f�W8J���*��+�:G��=z"��#y�����NF�������s2�����LSx!���QBny�I�"�:Y��L�D�y�%�z��+�rd��XL����S�)o���X������X��]�<�T�_G�>��t���WM�3g��@�J:i�����[�eHY4/k�oJq�V�_M"D����5"�x0Q�F_N�����Kt���Y5�}1�CX��m����#+!�b�]� �\ ���jLq�9�����z��.����1���=���4�uqL�h����G�B��x��6��dL�Mjvd<�\�F��\Gpue�%rr�Y���������G@`�-��t�/r���^�J�8����"���Y������7(~vU~��0i���&�AX<%�,������[q&������|���Z��-J���FG���� (��(-Y��&���m�D	6��t;:�.�_��[��G'�u�tG��?������MN0�����R�%�)�����)�
���=3��h,��W����S[U�9�?���u4vs7�s6�--�I9�5A:��!r�N�[mX������s!*.*����j���k�y�$�V5�#~���������s������3&a	�`��{����8?������4�����T�����$/�?��<�Dy�Q�=����S�*-x[������/jOx����9"�&|��A�W���������:��$M��x�p���ZS���mA+3<C�g�<^�g�s���ipV��H��,����s<��5�7�<
�e��?���_�����g��X0����F�-%��{}�=WN��Ih�`������-��1a�g1~����Tul#��d\�Nz9��J{(Ej:���i*���>�#���z�����uL,q�3k��pE�RiuK����E�I:�9��M�1�w�'�8�(��KM�F�"�cr��@)�2s#���&G�$8'��!��J����D���;��>p��4����Q��(��y�j�H���c��uM�l��jD�J���%�"�T�4s����+��0���+1��R�O_�#�0�OB�X���L�K`-I*������IY+�c����i��@��H�c����0��&��-a�;)�������P��C�h��i?i��;����(+���)��Z�a�Ha�Km;R�d>������|^��������#��F���!A-�M��M-��tS�@����,�g������~c���!cv�h�������de�D��!d`��T��e��isA&���uQ�W�w�����|OJWs�,�_���62��'��DL8��O(�~���>��B
\�H�(h����c9\��w�X���$$�lafh��X���n���&�����(��}����Zag)��@����kuw54��h��Z��V
��$7o�p�O���1A�`����;�sd��nL;��N���:c�>��i�H������x�w���!����/ ,��._�n ��`*���a`�9�u4��1'��������u�x��m�2���|����Sa��}�j.��Q���Xn��(�w&kD����1I�������d*����Z0:�����;���\��B��g��5��
C��vJ�]F���K����:��AM���i�!<���<����=�t�1`�)��M�1y�v����1��+���,�z%c�9���1����{�gj2^�2�jv��5f���������4������0o�+�O�>��v�������h�hjc�4	�-�*^oF���QQ��Qw7�[N�}�@�	�����
���1�FKs�-[�v�?���pCo�AC��n��hu��S���gG�'�|v�g�Y�Y�����3z_�����o����_�]��2����G|-,�����������ik����@<��en�S�p{O��A����i���t@
�^��|����@�NH���N�jWXM2�n�-8�?�q2��e�s�f�4r`@k���N
<lW)�c��]�:���'N��U��y����Dw�(�	��,c���y��Y&���+����/#Xk�un$J��f�wg4\6w����x��6�Vs�B#"I� ��������-X�H�Ln��������{��J�y!j����_E�I7G�s��w"��f�rT� �/D�����R�^%�3�E�;��j���HVp���pT:-^�)�.@��a��Sg�{��?@����R��
���o��=�Ak�q���5���a�����eYI�����9u��R��
%S$�Kn�\�0^�J��+��v��g�7iw��t
�����o��C����ab�4���X����u�7f��Q�(!�F��A�	���	�)'������F�$N�
���))�w@�z#Uc*L1���g�M���
&^� �R(K�f���<�5����q������}�����k��
ko���X�)z��,��������8
-6����C��2�J������6����@s�6D�U����9��(r����/����{�6i1����*��q�%NA^8$rD�59�;Z�0\F��k�*-��K�m�/�����_��s��Zf02�������(�A�1L���g|��l
�q�s5�{��I�!�F�a4\!�1�Y��-�"U�hH���������%�d��\>�
�A��4�j8��N����!�8���[V�6��&Sd�$��>���|Q���S�G�7�f���+>�m�=]Et��,�;���z�������w�
�8qz�+�	H?����f8m��M��c�������.f��G�x
]k����fbJ�{��)^1�J��
W#�=�A=������=2��{=gS��p1��?�W���n���)
Q�v�Q���k����e������t�W���jSFp����4H�&EV��^tw/����mwS�
��4��d��i����@�u~<<'kb}���wE�������0%�''�-n�t������(Yb��
�������%�6��E'���V�if���]��+Bc�T��%��Xr]*}�/W�?�d�F4��1�\�a��{8�9�b�5�xm����A4�� a��)�`�5Y$���p��7�=���
#_�j������i�3~3��]�[=}g�oXi$?L��F����^�r���z ~x��5���&���qx9��X-���*�@�|��s���z��8K��?�'��F���.e��xM,�������7>\��8���0o���-a���PA$tV��Y��T��v��#�3�G���E�����G���s@��q�w`�9�L�6hyT��\X�o�$��<�
�a��dK��:��������EM"��E��k���-��\�:643����L��v�z�WEet��d����tG��l�R.b����9�;�L��`]#%R[%R�@�R��zC"�H���V��%A�<�9����uf6/�)�����_ww�����ax�	N��8�C��A�	�zT���m����G��r�����f9zf1��U��i���I��
r����]�����x2��'kzV���/�D�"Z,;���.��r�N���3LC��;�!Z�B��Q�ndz^��'���dcy��; \L{bb���&�2�`j�������|SR��}��X��))����%������8n��|����At�Xuu��"���������N���0k�be#��vEQ��]\����b�M�g+�b��,XT����$���Z��W%�*X��{&����������+�1�������.��D����M�����CiK�@��Z���/��������^:&�4{����1qn���__`��8�)�@����x�%��#����q�,A���h�~���mrzI�0��
%�%��Q�W��n�\��D`���
�O�)����|�����?��\(D�����IV�Y��v����.U�8'S{�y��V�i�y0
�az�gHe��)�F�B����I���[��G(
g���c�����h3������$�����\I�@��y�,-��J��r���F��c�;�1'1o�7���'�����;��%�\��3�NO����4��saU���|�p�����-B�|���a�p3{��#%��0�Y \�/iLf�?��x����Q�=]Z�)�Aq�k���G;i$������Ug���������F����(��,)j.����$�����ie4���i>z.}=B�w�<�B5��F��J	=QH{����jwve-��^�Y��,#��;�zW�����K��
g�y�$3�����X�0w�;�����H��-��h2x�H{��&d��3t��'�@\tB��6��Od���D���nX�9&�8?�&:sf��aE
��o�p��S���f�y�o��6��;�3����!�x�����
���Y`�4��a��G7>4�"7�^^phDJ1|X-��4v�p����I�O������N�i�p�^�ya�^������}�h[��IqN3��������/E��^����X�`�s
A�0���gE��Y���E%F��@�;�.����}��f���v5PH�#
��g��w��Q������,U@ ���l����Fl��9c�L.�A�7K��+��}���l�Z�J��S�N����.���vA���1�G9�w_���q�e6��l��������"�� � ��>�J(��������i	t:�H�'�c�����7�7���[/��D�k��)R@G�6�]7���t0o�^�Q��j ��urrt�y�
����~�$W����;�����F�~g����������6|��G�����Sr��/���p���u��:5*������7���n���}N]6�����>m�����nK��|������r���]�_�p7��cs��n��r�x4��������G3^�yC]$��C�1��� ��������{��s�*c$r��������X&c��sjB�T�1���/��2����~�8�c�|�������Jf�M�^+�H����BC��|n,���.����W��������U*�%Qa�!���y����=M/��"L���y����w�a�x;�W�������FX�1��������{��k�sk��Zi9����8�G�������H_��(�V��Vqx�`S2f���r��|�#5@N�`n�=����,�XMEI��C���������X��:-��O�t��������5j�,���2�OA73+� bz{w���O �smb���EW�R��������:BW�t���|ni���)�����}`	������{�K�J�t�1�[�L��]h!�6�m��Z���7�����1������3�`UPU������5�4=����x�,9ISg�i�	;�>��o���:�h�
8L���E���y�s);-���qG������8��n0����Nrt=@g��)Im���;G^>i�}8�m�A�l����o[  ��G�?�:���A{Z����>�d�6r���"��C�{�]��y���g7����W�l��N�]-�,&:p_t|����s���cw `eC��>�I��L���t�l�m�J|�sv�p�@�g?�}�����#N�/6z�L�l�l/����;1w5�CD&r��Th�	
���E��
�t��a�B���?����O��:z�.<����9i����������%N-��$�^�A�%�2������`��s0����.���>����J�tc�$�����6<��0R|�g��!���]y��'��@BwI��D���m^��+�m�U�'j^�8@�%�[`����u"q��pP22��U�%1���r�������h�������j_/JR�3" o�T��T��$��_�9"\����8���N0w�����\����`kq5=���5������k�r��VB�JX4+��<���[�jO��- �����7����0Qa"�dxmVn����4�=V.������� �W��7�U��m�{�����]��])R���X�=�����:��������nK�/����~����%��|Q1h�]R��'Dz��<GI�Lq���{v
,��d;�q�tnmJ�;������O�u�.�s���S>���j.	#�F����?�
)���'$��{�[�������g:��e��:������{`�1�=�'����~��{
���]���<D@����r�\��+�q7�����0��[���\�RLH��XZ)a<�tc��w���FD��Xu����q*�
�:!%��Nz��AM��Q<�d�9/�h��'#�1�:��&!��������9�d�m����6J��U��*_:�
tC�Wx���/����~�?�}�2��>�'*l!`Mg��w`����s2���TYY?K�~DJ�#�cssHl���9���]aY�0����9b���r5��������j���I��)����-�O�J��$p*����4z�_*���8�����ma��Fo�A��3A���6����r�C;����zO���t��o�r���_=����`�f�^T��&Ns�(�w���d{M��y[�����L����3��}/�>�$���K4EP��N����y8R�������q��3D1r�q6���Yt�9���j�E>/������G�h8���m;Y�(	���Jg�����R�*��}�G��;"��\
T�
��Zt���8=/�X��1n�W��Y������.�ry�ATO�C!���$/����_c\��Arm���B]�����2$��T�
�k�yj5_��q�I�g\��l�&�~���RG�<�X�*�����M�a���Z�w�'l��������;i�!Pv.��N��Y�_�Q���nM��X�kv�wb#uf2��EFB�Rr�f��.��~���O���s&����1��5OX
�gs#O�O�P����
	-�,��T���//�<�Uk�kh#�_K���i7�_��wTX�9�����]��T�X�����}&�R������5�X�4�{[i��K��2A���<{��"�,9X-!���Z���>/%���%�}w�6x��I
���n����	fe�1N��GqAp��m���l}�v�:�,�����V��I
�����~D���2���D������l���%�1��b�k�n���U�(����]��h�"P0O:+=S>U�r�9��b��i�����G�@=[+.�����o]r*46������KU\�����o�W�F�2wh��w���M���2;��x��;P<zN���E2Y�`���>^v��Q2�?rhH%;%��Wsp��b��xhKF��bb,�Y�P�ZDX���
{��S0��N�?Q�����*�hc;=�{9��.�?P[u�,�#�#�IrI����I�'�=d?v�+�]B6�A	�� 5���I�qeRz*�qo��0���[�-ci��|=����X��Bkv�e����du�7�mZ-��a�k�5���Rk,�[C��!t_���^x#zz
�^��EN^bF8^NipzJ�lt��=i��E�oO��F��������o�wN��v��?�p��H��G���[�4�]���j���6�
������������f��m��A�����/�~����/�����������4�U�}`p��"�)�qn��R�L��U�����,�&"����q�x��c,�^�&t�c<�JR�p��h�Es�������#���jl%����M���������(�>6�������q�-+�l�*DVQ1sI3�l�|�>���/vD�����"���)�������?2�eT>�>Y�����f��|CF?x�,�c3�i���������7}�dVt��@��Gz��8=��L�!���>��4�6$��z��:N����
�p�7G�l��n$yt�?��(u�����e)�g�/��������6�/w�;3<tm9��WT�Q��'����������nf��~f.!�Gn~R��lKy���2��|�}�V!`�P�*V
m|���i�?�0F��N=��-2>K���d.�j\���&[U4�E��PB��bI���u��*�$��N3� ����Cw�q��(dO�k��@1Gpj��������#�<yDd�*#�9CH�M�E�u�.��S������@�����6������1����qx]d��{~�n`�.���v��o��<�7�Pom���pm�9��n��%��1���!�����\�G8
&MR.�`0h�{r4^�����c(�}�+wY����2��%�K������
0��C��JkD�����N�.SX����(�j�������?�����4�� @�|�7���Lc�\��^(.�����a+C�Q�\��cI��d��I����}f��%Yh�K���q:������~+KUN"�����3�j-\H)�p�X��#���j�n����O��7_:����q��'�tS����VS����3�s��so=GJ�>���}�d�|�����]�ihY0���a�v	g�LWMs�F)��)���F�����_.G@!�La���E����n�x��pYl�
Z�h��������Q�yT�p_��p�Kk.`c//�����`��Y|�'`�����|8L��H0�*W����Cx��>����
+�����0�T��B������v7%& ^����a�V!�5^����O�`%w�+h/��T���bw�&�E�],�TbML+�z!�S���1�;������nMMPms�N�&]�,���yu[T|$?�@18�cHO��W�`@��rtJ�L������`SH�Q�C��)5����q6��'~J1�h�{��'����FT�w}��}���R��������c��S��b}�*�����vmhP��
%�y\/�����%ht�|3���t�K�y:ml�c"<�9���W?3�tp�����������t�s�:��j����sr(N/��m��6�����h6��C�K9CX�����}�-M�����&�`��P����.�+
&��nr���������Y��IN�[g��#:j���c�Q<���K_^���#���A�����7��pL�H����zT����3�A���S�:B�/W�t�?[�BH�r�{���>�i���?�7R�=��h��`���h�[���
�o
i�|Baxp(��NO[{��b�j���/q�(	�	���b��q�B���,0B�����n<�C�.i��f[4���k���m��(���y�����*��	�����7�'��RaxQE
4,��<�i ���ER�����X� �q���"��|���0x����6����������ihf��)��3�D�(�`�p���H�;FxE
���$/��4b�����I�m�+
�S`����{`I�5�>!�y/�:K]����&>�'c�({I�f|J�I��f�>���r�X�o������x�j=�N�h�aG����~�v^&����
1l��C&8����QqO�m��`�N[�-~����l�J��,&�3�nnM"9�%?�!�������O G�������H��|=���L�&�����;��;���g��V��D�7S<��1�#6{i#V���C0k[F�N�z�w!�C1��?u�G��
sAb�h�S����2�����d`�)������t3��Fi���r(\���4�x~g���#����z������e�D�w'��M�D��y�{ �8�.�0G��y���n�������ttr�VF7(�r���;%geLu��A���C���jYz������k�
�1$7����Ts�TC+����Z���gDxG�=�����`�q�`�R�J'�Q��B,���y7g<*���3�q��r]?���l+����5\������qt�����I�����������H����I�0����o�D�[��P(�cF�V�y�D%pN�C5��b�Nr�/�p�
`��_����L 3�>���y������x��Q5��5&�
��X���@������ %����X���8�;��-.��jH*|3�m�l��ZKn���P\�q;e��P� �x�C�3�)����P��
�pe�"��|���g�cbk�E�z�a�&Q�`��x6���7� A������Q	B�M,��8�����I��w��V�PB��n�����Hz�
�.�lr��Y~��
�����u;,�z��E���{� ��/�$�������k�8S����v������)�L��a1@�v�g��sx���o��6�`���/%e��0�m��#��y�+�qBUz��Y�'�t%a��xo�<Q��M��5s�����y�N�+A�g����%��i��4�����L��P�t����T'�$T��L�L:�r���@\��G3��T����������� �9$[�:��A�CL� A�nn��r�j�c�0�I���-`�z�����������9�0g:t3Z�������5����[^���F�<�b�n�E�4R��$Y�Ys6�,��Uhd���kf��B��!�`&{o����|ie��	�����[~��v�����V�a2���4�U�[���O2��.����$�����D�@H'w`�n���H������dOt_���6	�0��Dk)+��6����T]�����7�"�Y[j���U-��e��3�2����{:��
)6m��Y<��~<��(3?'��,0���K��7�y�7�)�-������5"g���0�������Sv��v
��wu���7�g������h)��#���W�v�	������]gFT�},��c�g�f(c�N��@��R�O=�0��,����l�0'N��� �c\�(&:�w��d4�D��d����'���xy��l���������c�$���g���wG�$��M���	�|E�yLA4.So<�.>z���T�k����j��_�C����������U=�������2��2�zM���1�U(158Nn��r�G�E;OlE�:���������y�����.�>��FC&J#0�j�$Ec����������*]�����3����hn�	��o1g��l���it�
���B�[���F]�@E�������i#���oB���A�����������9�n#S��\g�W(��Ne* ��.'B�R~�f.��B���g�Y�S!hf�E��Yz�s�f�����<�s��#M��<��M��l��d�������������P~���I����t�C�F�	��rq�:�e_�U�("�X�Bmz��Q�ze�]��0{eL���9~�2Dm~2�a:��0���.R������%r��$]^a���C�~{H&�R����L�~�$��p}y(y�����	L��k�#z�Y�WLp�����l�� S$'L�e���o9�����|7��	���'C���x"�)��#iz���f��8�'@�f�:��L2��@�X�l���#"+��8����������b��>���3��1�|�v
�U/�lxg#����:�����Pb�2E��.
��cb�l9q���Rd���fM���N�>b��:��,5���|1IS�C�K�$���F#�:;Y�)-��[��gyh��r%�����F\�*�N\�h#z���]0+6E
0M��#��������Og;oI3�l�F�S�<�+(��������A��zh�������I�x��{���n8�|8����L�3C<�qF�F��;=f	�I���7��]s��2�����:�j��%�\��{
Q�eWNAmv���az(9��
�$�y����xc�2UNbN�eeT�_��{�4I����/2
�BB/s�ZM($��?�n]�&�������!M`'���Y�rD��`^t����b�/d�$pd�Z�\)�j����q��'
�H������E��T�#�~$IbX��L1\�6��41R�s�(H�B��\v��#^���uTD�Aw��i��u�?]G��s�����|�������e�H����<Y���7���l�L�����b��n�eU�������?�'Z�hlD������7�����\���iS�a��P���9��CG�z�����TWn��2P&Ls�Gg�S+J}L�z�B9a�=��Gli�����B_8�D������k
|�apD�1������x���z�/�	b#s8�93	��6������"�B���B)�
�&�kq����4o&�V�K`kB'A�K����4�I2�9=E��^�G l�/�x8@�+���+���(�����T5L�A)-��S�dL��<�+�"�)���'�8����>�U�3�@Q��	L��-/�W	��a��8!�1��Nq\����x�&�0�����������d��s����
9T����Zv��S�<�w0�p���[q:�/.I��w'1��$������^W�kB��U�D����9q;�qPf�:��j�%�PC{��j��h<�;�{pm,��(u	9����0|}�@����Y�x\SZz3��G��a2��1p�Xu�R�UA?�4�	fE�J��8����<P%�����\!��tM�$#���Y{e[��{��{p�n8��vC	��6�U�Dr�d>����B������YW��������������p�L��d`x8��qW����	D��1�n�Fv0�7R�1������'��Bm���sZ��HTh�~���.���#�-v�������E9*e����uW�U�h;)�$�4�i?��cKO��8�������S�uk
��p��aqw$i'�$��/$U(�\4��i���t���8�m��Y��RiMm$fQp�3�`C�a����1yI��i+7�}!����i'��_ww�!R��8����:g8+��������#.s�#a[���SK5�?"]���&��N�:�
�@�S�	a����cG�yP-�"1���#��Q0���IRj��y�d�8��t��;X�{��mRm&8W�%	���tS\�/hO���0��e����H�VG�X�%�Q8cRX��i&f�f�F��S�N��+���:j�_����,%���Q����6�1��:����Hu���O:��X�u*���;�;+���F;���\��COm�K`�P3����\�#���e�XS$�[d��rw�],�4M����� �!�,��u&2��rpE��9��e`�1�;�NgL��YeS���z6Hb����2�3��`�fK���>9�A�y�5�\���r����K������]��b��v!s:�be�4�o+B>v�]����P���$/d5�eC`�S��66�����
L8���?�{�s��f����GX���d�ux��nw�D)����wU���o��D+#G��Z�^�"�:�q4����SW�{>��C�z�d�����7����B�q�r~�l�~����M�L��lE8����}�oG�t'2�*
�A^���q�y�>���w������a���	�Ly�����e������_B
�t_RZ����J�w�����K�0sa���H�e/��$��:4�m�<{@�_'{�%_�nr>��d�F�)��A��n�]!���v�i_`�uI�����8];���8�-y�Jn�[�z��s������#��u<�T��t��b��3�)��OvIS�)Il���q��q'�pMcb1>��.���M{I�[��P�����X��iI�u��uQ3��\�F������S)���Rk#��c��2�$z�*���
���G	���yF2�f���PoS�O9+H�����d�.�i�}��MjR�VP|C=p�7�Y3�s��_4f,loT	�xp�R<�Wnzm�L�����'��]da�F����F��A�O�~D ���d!l���d�6.i�
D�s�,*��_v=f��aM%6
�N�2I|�^���;f���.�H���� ����<�b���
��&`p�{p!�W�����C�H2��K<fC`/\�*�m�����L>*��}��~����D���g�B��������&�g�f;G�3��Z�I�&5�>zJ?�,*=��VN2�lM�X�S��!�[lT\�`�+���Y�z�.F2���_�{tx�z"$�M�������8z�d��j��"!�~��C�g��S���<�o�"5x�-j3������Dr�,����]<�8����V{��D���������$@�'�2��
��6�Y��eq��P�4��G���2��PD*����|����9�.--�'�6��/�1�E7fxsa `���.�U�9-2k$+^{g�J�Qq����U�9���p<52��p�D�d��+@�
~I�z���Rjp�e{�M��ga��S�h3�[�<PT3�8��!��@q�k%��d�������iq�g)
=�Lu:��C��$�3At�K���E��:��<f�8����L��#�#���)\j��9".�����F���I4��T�-���|.����x�P�9�I
H��p����+6�-�,G-2�"���G�]���~�Fa��(g����g��O��w��qt*���r"��������[���	�N	�#M��t�v�@�5�����)���1�!�w�����GbD�W��3������S���S���Y�&':����s����1SF���t/}4N����%=���ZM*C
���R��qW=���)��z������g��px��%�x����k�4yW;Q�A��fP3�W+�g^�f����0wE
JT=@�B<CL|Gn��}E��D%
k���8mT�C������������xg����BO���d�I��,k,k���c�w��Qx�L���������[�PFWF0��+��������� ���#�e�9I.�����t�����up>m��D���i�Jzie�
e�L�!���@��(�.���/���W9��[�qY�.��:��io�)5BD�?��S6�9��-�X�Be�A����$^���}N�o-����?8�`D_��.�5���+t�7P�aEo�@��\��
�w�o��j���xk�Se-K&�t�6DA9����#�m���B	�N�	c05�zOL3<`���-���D�w5�w4�nk��A�0!������� 	�������I�pNI!aX{��:������Gow���xg��>��3a�)��x�N�9B�~���!9i����`�5�z4-��?�h�3�Q�&1�����K�e���R��}XA����+cV��^����������_'3T6��L��������m����U;i��?h��t�B�����8����R��jHo?�+���L�,��0r����h�	s�}������E�������`;�i'f
(��0	����wiAZ%�|l��J�[hgI�����;�b����;���~��B�����������8xyU�Q��-��G��u��I���^S�a�dd��s�����mFdN&=��<��d"_�2���p�U���J~"�Ewi���hc�e���bte���������|r0��@������������p������=�d�.S�N�3���,+�5?#!�-R���	�2�-����6�t��*�_k��l����s~�tZ���
._�������ls���)(��[���}N���u0k�,+*|�=��B�G��~�z�PJ�_�~���9F����������nz����[�x������f�~1d���q4���)/(�|��m��c����8ea��X6P�s9BUq�j�u�k�B�����<�f�s(��a�x�t0N�)
/=�[�����]���8y����Q������0N��K&0��B��������@��E���,��>E�J?,��
����a��{^Y�6Ao�0p�;��7~L�t��
�� �w�>��J���n��
[���������wK6�t�037R��!���ZL�fV��b{;T��y�*�����C�\KL������-T�T���`�U���AJ��+82���������{��Nz\�U|&&�t�O��X�qL���{6��&��@��]�'G2�~�a�����i�%E2L��u��������?��Ka(l/��d{s���i�6n��@]q��n��������(Q���l���j��F>�k�yL�����H��$/�S`q����;`<(��jY�;��,��{�E�������x��Q�+��oZz������r^���~������y�sw��s��kk�y��k���Of����
`����5�Sp��;i��N)9�4��%�D��U��v�9�/������8�m'�ca����q��e)��0����Al�,\���L"�}����l��������\'(��&���8s
�W�|���&�SR�H�d�V&�Ge�H�G���2���T�M{�Y������ac9xm�����bN���G�>��s��7`��{�;I t�G!M��
��u�]����#G�������80r��v���!��P���Y�'�&�� x���]!�)�p
!7G���
p1��.g32{��+,p��	�"�vjbP��!J�����8H����gN�T&�����p]������P'�:�yd�}�x�\������?r9��v-�{��@��1nC��\\��|����>�b��'��4� .���7P��^�u��8���W:D,���(	-�.=%/�
�N��^���������}�3/F8*�@��+�6�J}h�Y�@>��XfU�G@���(���*�}��%��Ct�Ez���7��y����Fn�J
��NC]��nE��Gq�vn��q,��
U%�C��S� W�^Vg7a����)��J�`^"�����z���0��W�v��?x�X��vQ�5<��������|y�iNMg������i>�H��T����c
�
����F�x��Ze���0����e ��h&�
�=������������rC�3c��&� m�!3���)�����"f5�����E�)���`-sz���2_������}K��y/�����d&6�T�K'F���>q���pA�?5�N�����r���8�&bQ�E�wdBWFo8Qq#\p�Dir�G	��u������jd^�B����
�^�4�1���/�8"w��'����W<���Q�[	�c�O�!�Q1K���x�Z��L1�
����!����A��?�H�ZAG9/%����c
J=,6zu��e�(A�sU�0b��h
��jCHG@�����?`x>,h����/~�������G�������\�:��D����u{*�ad�& �[�OWj�m��"�4���}�GJ�!0OrP����[s���q*��)���
�;��f����%�
�\��H��?7�!�����W��lX4nc��-��B�732��U��^���39�Pk9p3b��+N�DsF`�	;�)p���}�f���������lx@����p��I���C�"������L�G8.&>N���m�+_�!&�p��@-���Q���I��:�������u����8���X���'x��Y�]�2mw(��b��p��!�0V�����lCrrd����Y��+qFQ@��t��h�Vfz�)}���B!���Y���� �����Sh$]I�tH�+��K~�.<��)$�fe�P�Hn�gE�w"���&���(�]��� ���B8�����ejmkvy���Dga"��i<56�Y#�n-
���!C�4�;�$��4�s�e���Y�����q	��5�HM4o���[C�]����h,b�l��k��|��n�����[��p i��6��\�@}�/���_��J�z�]dY����?��-����g����T���x�#�A�_~�$[��yu��o��0?|���OQ�sHj����K�W�;,}�nl�Cc�C����i��D�S���('���R�@+9�h��^E���Z�?]�u����O���-�������7/�����|�46�N 64#
���1&$Z�����1����^��a��d����0���n�OK&����<B}�M\�!a���4#��8s��Y�|9��{M�m���d�d�"�>�y���:�B����GTq���~b#�..���)�4�wD��m�%d��
�m.<P���L���YWq?���.>tNG4�J��\o0���w�y�� �����@Q�t��d%�X�u����6����+��b���3/���PzS�3���P�NB�
�rY4G{�i\R�b�5z���d���S��@f-��$����u���6�v{����\m���N��A8.O�jj#|hc{����=�k9�=���?�{^P�Oq���s�r�%�x�X	Q�}V�CG�����
Ka71?�7�(5K���#GW;~���}r`Br�A'�%s��q�����5��z��V��0��7x�6��^��h����9�+��X	�B�U��.c������+�#�EU�y��d�=��/n�7��U��

�����o}�����	����\Hd��R���G�;�[�\��=w'��lA#�F� �
��{=D�� iUj�����ok�}�j�v=��W��������%]0��Y�{kr�����/1!;�y�v�� ���jg|��/!}����6;�g'���@6n�����
,>�}�����ne0��{o�6�>����J2������U�O����Q���m�Q�Veue��04�����8��tWk�k�{��ft�����@����P�"+e�O��K�IP��_p`6��k��l�F�5[g����v�6@��g/�k�-��%F����<�A�0�+y��jf�a��X~e�������������k���S�&�wC�����S���=��k�:��||�%�ov	9�����1�����^�D"�o��������Bo�\�b���nz"�'����i�4�`�#����s����%�����������1�21��Q��
����wy�]�d^���)Nc01/��t���d91�Y�wM��V��x5�#�uT���d\T�������B���� �O�m��l)���jz�N��������C@2�������j���6>���~Qi�+$=�C�S�6Q����G��Ci��wCAI���s�#f��(CYN9�����x�]���It����O�g�kD�)�����Q
�$z�&��������
��9F� �G�����\�g�>�w�^���vq|���0d�Y<�hU'V>������������]r4�-��V�M��y������������Y7�H�sw�B5�T��_��#
��sr������$��{�?�j���t6�:D�r�U�\�4k���}m�fl�)#���$��8�A�L(f��i��EDk��2J��^@f�Zo�8�]���"�[(����	�v��D�
x���F1
O�/%���}�F~D�
��}�]��q�bo�9JeM�[�T�d��0
@���2h��������2�#���9�_�u�Y�X��X�9�bF�^"����/��p�Y� ��_@{FlS��5^%|��CJF��7�����M}D��H=cX������[}���4��l�L�[��(�����+��$��+f�goha�9;/��C��"�������K�T��g�L����1e�5���0���_t�`
������^iy��Q���w��Y�5m5����������&
�u�-;H�9���9\�2W������`����t�����*
�O�K�����v�0�,��!� /�JQ�L�.�MT�6�������*\��	�WSu��c-o'�k��0���r����N���{kkA���`���<��hJ�B��E�$+���C))P��\q��=�(�LL�Vp��t�q<��6���r�����i�Q��3��2�����P��0���C����L�I�����h$�"��d��C�<en�/7�����k$gfr�������������#����b�[l'������LD��Y�Dt���u�#�<��d?���r��rs���^���&�QE���$Q7���^�����5�r�@ITN:�;�����:|�����^J{V!�YJ�FLf�!H����o6�B�P���3�,Md)��'��"���o2_�xmR�#>�t������l�y2����-�u��Q����[}��u��r�A����Q\��Nrg(P��|�bSsyae��U<��GT���{�C)�Hq�D���A��F'���?/�Ol;�1�����u-H��e,��X%�������;��.G��p������s�|�l��>�^��c��hL��/�c�h�9u<T}|��4|j7�^�~������fC4C�\�~B�k7�U��`v�$�AK5�
�{����t��R�S�<*UM�F��J�Dqg~��\?C3x���C1��]�i�j�]�����e��}�q�9�p���W��mt�������Pf�����b���d:(����l
/J^��1��_�G7-�w]������\j�9G�� 5�^����q�qs�s|�=�{"�n7H|
���-L��n�,�ET���!����9�����a����i`��'6z�;3xC�8N�>�W��C���Zh�����!�V�������Mmh��[��+�a����0�a�qv����A���N���<�;Z9P���"��\�W��I���%Nz�]5I�q����4f)�5��5c�;�<</
�����/~����\O&b����h^��B4��W��%�3G��D�V	E����&��������X
����I�xb���d%�g���[4��<,C��z>`��A��~:8i��;j��9�"�k,z���c�{����!^��o����^���5�sC�H]����U��s���XXZ�����L��l������-�X/.I2c�5%j�.tP�-q���[9+0��+Q=�r+����|El"y}0R��G�=i\��j�6�$��(���h�#=������9	IV�N���]�����:!3K�������tj�i���'��<Z����	�})���d�#z��tZwi���}J��W��7�����D�_����>��%�G�V���E���4x7e_��MK����MD��d�5���`+���&��s:y�G�/�����^L=p=��d��\�������w�g��U����D�vv����6Nlz�������@��������bd($v���cB�y���xp���2��u|:�X.�=v}F����I ��]�,6<����d����H���*w���Y������.�v�4	��4���]St��|�,�\7��9|��ssef��s����)���<m��^o�>��+����������*�$���_m��x���B�YC*�l_��y�Ciu�qv������'��$y��J(������+?q!���Y
:���A�C"T?������l,���lOs�j�#�y{����Sr�E�vnF�K�T�
]Fc��8v��������)8�\1���$��9-��h\�]�@�����jS��DsI�����a�i���D\�1ko)�w�c0����}�@�$��b�e�V��Q�Jv@>��}7A�j�PPO��7��gD�0�L�q��cs�����*8��c�?�B�U*���(h9���>�o��!���9����]��y���q�Y������x�w��_��q��
i��7RI�|�
^2�����z�xO��bV�C1�"��y:@)Z��y-�,��`���E����P��l�Zy���E��h���u;����q��Ij��'���$�d�-�rlb���3�}����:��X
�E��C����ML�<�������,��Ff�%�	�e�v��k�7�hb:�0nl���]=21���/D�^�#Eh�@[�x�:\j	E���7�by2B�xX�T�`��o1Q�^[7+�a�,4TR%[���16��K�����;�-�c�:Z�pC�G=r�����U��%�\��4�P�\K5	��L7�I/_&�����Y��^�9Er����"���9psp��&�YM9����FV��Bn��v~e"��YXY�<�qk�%�>��mU���,.�|e�l}�6�Kp�O['g%+���,���������Y��q��x'�����������2��!��^|�r#OHn��W�7P��6?C�Q������3��^��U2�.��	�F_��$N|��5��	���6�Y'�"�`�'%��V������*����I}�Q~T���l���l�[���� �n#���4��K*V���b�r��xb��	!(6'�D�0�����@@���������b��N@�K��|��arM�����U��,E���v��R����+�&���}�Dw�G|SRl�?vG"<�.�Kt���� �P��k�M9NCNNCU
t$$��w�%Fd~�����[�X���)\ZOx(%�0���^��NJ���t�bru�;��*�#������{�@�Y�)��zb1
�����8�f����A�z��%&6rw3g��k���k@~�.���)9E��v�	?��r;����y/�+��iNL6�Q����a��f������1^����	�^5�>��$N6F�.|��y����q�/������'��O���������?}8�;�xXT�.:��y�Q�hJ��1���@k�K�]�~�=.�q3�Q�@?��8�DEh�"oA������h����$-!�'L�����q�6x�H
�,�V�g��������+<�G�������8�P������Y����!Ft����=f���9���W�;��?a�(����=��R^z+u*�<��������Y8�D�k4�CKY\�X�;8�|�l�7Uh�P2�p(nY�C��'������%ab��^2$��vT`ES�������h�����s�,rk�7�`���_�/�v�������_�&2�J�9�.���+@��u���<0(9�p�y��)�P�K��7����C�	�(�F������mv��+������dsa:.��^��]��zY6K���u1F���z!�"HT���B�n2��H�I({SW�����%�L�}"�&��R��
���Z?O��B��3�9H����p|�#g�2��h�!�`��$�D]��v-�y��c�����8L
L%�_N\���E6�F�[���x�yf��N���:�������F	� W���+s9������ �u�M��D!Fl��]v�r��{���VAS)����@���Fh:�m]
D�r$��T�����^����%�����t�����LPj[�!;���vz8(��Q(g���Ht��1�BV����=�<��7�a����m��?qf6^X�r��
srw��D��S�(��]��l���n������M%���O��������d�L�dD����G�]	U��#�}���,G�%q�B�K�tH�;
��v�G�z�4�$[G�2����Dn��f[2�s%�N������
����{�
N�t@:�4m;��W����	��>��_���TBT�+�	�{@������G����b��o��5������N���S��"��h���%�C���d�K+�;�l�����^�������L�A"&���������n�Lg��r��zS���F��{����]j��Z�- DP�W�	4���bj����=R
P_}�8l
����M�?��K�`�y���0s���e�/��ez�OZ�;'��}�;`cW�����p������7��J���dr��G����Pr�{��9�J��l��o�
~���y{T���)f���-��q���u��&����W��=�'�`5��4��~n��o��y�D�*�;m�X��]�x�����,�;��������[�g9� _P��h�&�������b1n��y@<q�s����h@\v5�t1��8a�p�!Q�S�~�	�w6�{�G�������/��0v��q�5���	g�`p ZEI�0���ZT�9#�+��5��6=�QqT�����-�����Q]�qMnG	��G�=��U�o--�����:��#�����.�e�a�����LIN@�)�����)~�nK;���b���7���|��O��(�!q�����w�����;����Mga�����(�y��T�	�a/<DH:1�{���s����-���RE3����s������������<�[p�#����i}���k
,M�#4\��~�����;g?�j�^p�XEH�H����3t���+6���zDL�,ifUT�|��/p�R��r1YC�i�"��
\(�������l=���3�E�_8(	������i��t1�F��xY<���)8�6�|'p��-7 )jf��`���2�{E9z�O��
���l�s��<����x���'������C3�lC�\��n'��
|�r������v�S��������BB�)��(�;�>X�j#z� J���
��L�i8�1�D|_jP���
�����&[$-	Zi������s(�
�Mr��&�'�|SIe)-��$����$�%�k��DKM�����s6F|&��!���:�}�o*A�[Z�R2�@'�]�w�s�	(z��\'B?

Nk�U�%B��J�������Q�O}�)����+�'�~����NF,�Eh���ct�D9�xp{��mZ\�d�I�V<�@D����2	4���I`���`H	b��$�gQ/�,��=�	i5���V�a����#"T.�O-z����L��`&I�3�����v�m�(��s���M(9�n J"�UH�<���{�)@�e����Z/I/�.��}�c��r�:��72� n��J�hGw�B��&�a1v/rfb#*�rU�S�*~�s��m�\{C���	�(�1��]�-�6�9H����0<��(����m�gZ���A���h�����Zzq+7�ki�;-TCSS}����{*+"�H[�������$/Y�B������R�?q|�qG�3_j[|!����h#s=��=N�����x7�-��Mn�����(��!x9`8�Z
`|G�����Y�����{@�|<9:<��z�c��������ON��'NO�5��~�A�[���9���Dv yI�W��~��`c58���s�f��7v{�,���B�6�N������
3��{���Z]�"�C3GIpx_���}�����,8�s�� �Q.\�Nx:����IT�F~���^���0���1o���j�t�!����f��5�/�:%�pl/�r��58�I��d�c��gh!�0�mWp���;��4�f`Y�H�g.O�7�[i/��x�~|kW�A@�����zc��h�R�^����7�@K,Z�����H��x�p��P���$��x&q�T���>�$Vx�N4���<\�����'=��}W���� ������X������WA%@�b>�xc�Ua+��<����0|$�[���nM~5�fb�;�s��vp~�F�&��JvF��fS"d-�L��i�T'�T��-L9c��%�F��������'��%�1�t�>�+��'�S�FM#�}ETr��=��@���sY��^z���#T�����3��0C�6l���N�����2�����0����@��D� ��{�O<�xDf�J����q����+�:�Zp���5�������q�mI@t�o������U7<�I��������QjB:�RY!� 7�L�����p�k��D?���h��`|��E�4�:g�������S�}���������|;�����'����0QFD�D5�)Qa�%A������;$�'X.f!(PVM�x������Bp�	C��a"z���GZ����P�����DK��K�+1��tW�pH�j.S���@7�H����
d��N�aY�N �?�{dd���2P�o�t���H��������$�4t�_���*Sx��>�
+�P�u���H%�Y��8��uSx)��� ���Qt�d�x������g�^���F������O�o
r��f�/F�U�b���{*���<S%���7E��n�[(�)���������Fg�=v���sw"(��z"b�i!]�F0XRU��7������F4��%����R���������S�?���y�NF����z�������7���_;/������5`]�D��!|D1?���}��i��B�F@kV�tv:�zk�s:�7��b����*��FS�L_� w��I*/l$�JR�Ycq��T��������Kd\���u�N�cF/�������������)����Mt�I6_��h6������f������������Z[^]]�������gO��U��<�.z��o�D�%$4��6��.G���k���kJ�`t��v�~�~�&�7�)Jg������������}��'�9��#����{|vmc��?n<�V���suY����v��������yJ=������@�y��9=�99k����ww����P�����6������^]��I�,b{q����Q���@;:����`�m�����������)������F���ic��b��-�R�G���t�|�$`+���SL5�FR��8b��4�]���mT�D�N�W)���8sW�Oz�7]�����:�c���^����OML�+2�����Cm������ �B	
+�rK�����I/��f�y�/��q-������2�9X K�ig��j67�psmn���%%_-G�������9��'��#���E����+���RZ-. <����������3�t�R���
�*N2��ut�����9��WS�j�(�a��50��GJ�xw/a���5���v���,�x'/���4��V��~Tw>X{C������2������FZc.Q�l���'y���KY����x���?�����Q��<�ao���w�!�����"���������Z:�nE_���[������h��H�\3T�S�
��R-X����Y�w�#�-����]V%�"u��L�Wg��X���"���ucO���|��oU��p�,k�IV&�,Vz:As����G
���2f�[��d��wP��$_����M�0�z��$��~��C��x�u��H�y�+<5�\��E��N��.f+$�f��ovV�����,�T����'��g�����'����n[���-L��I�h�
�	Fn4�V���N�^���
hP�z��
���:��MG�-��%�Mn�I
��@�Ai�JDqYTc���e�������h���q���0>��/
iO^���%���:��z��k���R���-�[G��dt����8�Ho^�����-g��Mt�H;�#���[7�n����p�����N���H��Ex��5��C�����ycs+Z}����q�	�E�(��g9o`)@4ZZ������U���I�s�(=p�>���q���!%D����=g|a-RW
���K�B��p�5��J���p�{d ��wv+�b�~�Ma�A�l���A]%7�����g��l�}y(�r�fz@`����������X�������j���������"���;�~��V����(��G|\yOD�N:��wW_�ll>APM��N��8�q�����M��>�"��-YxI�N�T�~YV�*(���m�UQv��5����FO���;�`�����#�"�|;�|�����r�k���{^�z��_�>��%�_�>��X�hl������xB���}���m������ou9���b{���A���oo~7��+H\��}�k��s�z��<�b�_6�	�O'����td�0�%$�v��Uel_��q���LGV���;�\�k���no���2��_F��~h�{����N��
e
��2����E<�j+,��.L���d������J88�����=~5�����/�Yw���n���:��)�d��la����>�m�S��=�>Y�:�����+`����]���X��3�
cP1����`~k�x�. ����g.���:I���
�+'r����'���f�q�<�>{2����T-6�e�5m>~����G������S����wI���j�\�Qfu�u �8w�&����2�I�@�#K:�3F�l_H���5L�N��7�`l5\\�vs��
?��82_�3�Es�n������R����������|��������+k�����$��d��77r�@?Q�SA�}N��T'��{���:6��4���&D��]�,�u��MmV�etU��+J����L3�����	"�KG_���IM�YBf ����sz6(�7$���%���	���R����p�L���]Z�}|:������������w��2#�s����=��#	'�5�K�!l�>�����mk��A9t��*���D���O����qq��/���)�=��`�����������V����#���h�H\�9�����V���O��$�R��le�������#���pq���j�N�|�6�����H?`�%����f����x3:S��i:����d���O�����4�~pg���N/)���z<N�x���<�_����l���@�@�5��O���G2:��^<���XI-~q��a�������e�J	�P*������<���A��~�Q��s�G�����(�[-v|u�!�����*��Lj�a;����T��M�+~��}�-:���`2�;��Z���@��_~���+)�4��LV6���g�o&|�o���_�^���
�����d�;��*O�p�/	�.��!�.v,Lu��m��"���"p�|�C#��rSF�T3�w��?��s��q�J�!7"������u&�L�����!�6���%�����g��[�k�'���������'
2Y�v6J���[�
a`B���,-u����2����+]�r�D�������Jm\��B|��N	":^�%Yg��\�3V.F�����t kh*��]��D��h����Y:<�(#9l"ul4��:�2a�]b��*���1z'=
d_�*X��ZVn��
�r��'�o��o�Z�����J�\�,�tfs�����������J���pX�F�W�q<��@�s����By�2Z-��z;���h�.����j&50��k����B��0�u�Q=���Gs�2��[[xo�����f����N�L:d�&����6��N&�B���
l�:���q$sS�~�W&")���5�DZ�l���W����-~���p0	*� �H��8�����J���]��-�g���Z���P�-~-%�������c���x�f���a<����F�^8��RPf��pG�a;�� �f�v��z5#d���uv6}����g��R���c��M'(���>�}��o�N�NZ;��\�IV�-�l����z���*����4s�#�q"�c(}�b�q �$�q*��?t��A	��am��d/z�O��%����5�i��K"����;�pr�t�n8�?j^7�U���0�0!�Z"�;�uk�i�F�WC��ov>�OD=���4������
����������9
����m�����<��@�(�������of����4�"8h4=Q�n���'���fisD��������;���u\�#�����L��#c�5���T�I��f��%!d��i�L���	{s<�<)�.P�����!�]P�����w�;�[�C�-tk2��\(lO����!U�x�w�Z�8�?#������{�k-<v/��0�0�:�y���R���0����|�w:�,Z�ld��������;9�!E��&��]1��
G���O��b 0%�t��spp�{�:k����|88k�=�����RxU:G1 ,E�O�������F�
kn���]�&�w4�]��~o�5�=:�`o�l���E�[]�]e�x�y����pK����4�l�[�
l7H��p:��-������&�0M�{��1�]�(B��R�4��)jy���`x���I#+[|����c,����h	wN0Y��U)t������i��'&rSQ3��,���Xe����,j3�`d���K"a�K�����7�f�$2�!~���K�pr�.G"�J����N%�L5�QL!Y���d!�B����g�F/5���������8!IU�=���F���q�kK�����k�ugE��q�-���`����(�����.if�]��L\f�:?&����������d����z����R��������L�\~��p�\��8����dv��V��EKg�0_�v�������?UM,`��3Q�4����n������%{{���I�MX~�Zg����;~s�7>(�3�@`jD�e���W�
nwxTVi���r_���PC�q3`�xb
O��9Q���kT%�W/K��rF�?���Z6svc�I~�����<���7������3V6cE��p�rcl)����v�
�
����e�>S-���1�����u3�O�k~4�j�Cv$�k�c
]ew��Xz�U���qX#�+*hT��������~WR����.V�����~<����U�&{�������(/�XHQ��f�����C$	3��� J-�N�8����P��� �;��l1B��jBd�@����!�BMF$���1'W1�z83�(�0h(rL99�*��QS9EO�[q,�0����c����)#��b�	{�l��l@*����|s�F� ��;1��"mM�D�P;Aw��������(�!�l��K}�q��N�>A *?5K��:�o$������D78+���T��'� �>����������{�?�Zm��?}8.p������pru��5�=&��A]���`V��s���N����]
l|LTZsX�?��TT��SW<��WR�g�J��s�P�.�	\���{�J!_��y�6�-��C ����Q#�J� V�Y�2�����r����Q�yP��c��K0�!�Z����a�i����Jl8� t@�}��1DYv��F�
MqN��B����V0q��F��LEn0~�M`9�FK��lM�n���0�$C�k.�i����5���}��zz��G�.%s�^�4��k��]���W�W�����jjk�Y��&���KY���_I�wK4"�`*#J�+��O����g��i�<E({$���c��
J�F��xi�Gx\�4H~$F��Q:��(rD�p��7�)�=j���f�� U���y4�L+�Lx,��2le�����7x�D�����,������6B�����9��=VC�2�e��/�����}����"K�������'H�v���	a}��v����I��������Y:�K��=�7�*��������?�$y���>�	1������#E>��|�<(j�����ye����)��;e����Sr��P�66>4� M��QX;L������U$lQ��-�#����9����L��dPX:��2LH5^��+���#���{7�����WI���]����[��;i����Io�7c���}Y��F��6��s�h��@p��T�=q^ �o�,�rf9������(�p"�s%�s�u���BK������PhA��&�_S�}2^nJ	�wp�lv#Z�b�3���^&c��a���c�����n���y���w:�)A��T!���r{� K�����A2��
N:��Cs���a��8f��4�S]��x��aV��}������
�R>a~bwF��������&'F�91T_���������)����Rg��=���&h�P����k�Gb��.����F��1�p����8/5���K�����iF���^����ps�k���u���gi)�%����5�nl�5us�QmV�V?_&��mt�=�F��>��Sr7M���T\�b���.8�e�$,kt��n�	4E_(�*��&Qv*�\��.lY7�!���)���P�J]��Q��tnu%O
l�0A�^�^�T������
�t�pC1��\b��������HXN�������Z�?������@��p|vZs�����G�2��%��P��[	L��fC2��)�"�LG�^l����x���=�Bm8�-H�s�����6�	���4�B�!����u��xD�~8N��@F�tI�t���J=���]����fH�\w��(��u	���O3�����!��&�a�k�C~��v�@���\�8���\��N�{]J�hrk��0�y|tz�~�(����9�)��K���h���Y����Yk��A��������}��(G�dv�5�-���t49��1
�S0�&�LNG�S��Y5"; 'O��[����x0�"^����K:e�����v���AL9fU R�3�J�1����m��8BvA/�����h�}�)k$4
�p��<_$�&��Il�Ss[?a��v�d
qx7��qJ��^@�.�|&����S7��m���i'�N�_R)S������}
:y+���k��#��V�m�ln������sH�8U�����U�u�;?+"�73��s�!��)����\��6�����1.�rc��
������������(�ys2i6ir;�������Ue����:�9bmu�-��f/�k+I	=��������-�����A��v���f��������5o+�oW�j0�Wq�B�c��p���Q����!��DS��'�D8|���8�8��c�����-�2����\6�<W��b����{9�M){�O��u^7�Q�=���C�<on:^�^�:+N�U�]���k�V�v=�)��\|#5,�I�\�S��<$Ux����� ����8���]��IX�	x��f�/q�\"����Y�O2�L�J��	��+�����Ur�!K�U,�J�X�)� �!�Z������LQ0 S|u�x[7Ig:I8�3��m��	B@���>{�I8G��#�)<J���-���N��fg5��An��D�l^���}�~�p����������f%�q�7@���10��I�����6E)�@�U��!��ZA�x<���n��M�B�c��<{��~�$�Z#�&��_tf�#$�6�������hl����@�{�$��6"��qQ�.�	�g��@/A���U��y���qi��;HP�K��*���i�����-�b��c)o52���o	|�Z�%�#�>���"�KGY=^��~�xS��,<�a�EgDx�v��pJ9:����Y�$S���3�8&`>m�`~����,�0]��t7��w�����Y�
d�Dv��y`�@���dLw_����SJ���tk���8�%�9�RrYj<���a���I)��mhfV�.#Y���Y^�#]�j_�Z�������r�'e��5�7`T�I�&1�*]��#C�4"8��P�����5gW##�#������i�|H�Q&6���)���7[�������G���Fj��3��O��tJ� ��C��[���\6K&';j9h������
�#�4r�W!x����H"-���`2(����%�j{��5�M�/y���	kp����1%���x���l��u
�/1f�L,�.���KV�e�.����-&�Gs�����n��^�io:v���0���n4?#i�'
�F8�q�l:6t�,�c��x��GK��Dl��bDzByP�@L_D��F��k��g/[/0��&Hz����[�i�|I���F�*�\5��:'��R��He[�g����*��sr��i�z��Eg�K�mMu�9fm��������O{�"+��|�/4��	�?�����\'�5}�y�����ft�%i��f�C_s��(&����(:u
`
=>H��LD�����xI�D�p��
��jA8!�8��r+�!(96G!�ra���Rk��q��W����!�!��6=������]���[c�OHM'E��X����H�0�l�sN���B�3�����G�U���X�&�M�*�[E/��D/y�������dm	/����H�?�� g�!�7I1@n�<�h�^4�
M��{�}P�=��a���;������6�D����5��}(���;�U�����V\2�c�������9����~�sFj��6/�1/6�k0�
��c��a�99/7MH�t�_�x4Be�rt6�c��r4�k��1���V��hY�;�7I���%HpVH']�����K�7��i��3B-�|����V���f�w��9���o��;�>��0�S�_��x�bV����>:��a����.-�q�\��=��
���6���1a���F�[���W|���h7����}A��_i,��y��[�2c�<J������:r��f�y�E���\����6���/��&��b%G�L��8�O-(!;�up��#�j�����0A�f���9>���M�d`k����z?�a�V6��1�h3`��:�6�Ox��%�|����Y�2T@�x=E��t?r�����z�J�)I��R}SH�.R���[��v�������=l�>a!E�����@�K��P?3�~����g��WqA
����J,�I>��N�������YP(����rFQ�N����O�]	��|l����)`'X���K�,�U	
�GYYjh�!{��������X��A={�����p@^��;�����*����yVI;�z�[�����!���IZ�$��(&%�!2�����l_T������X�?JH���%���l�E�>]x�i�-�"�
�U�����]u�����j��������(X=T�k�Q��J
�9�F������F�\�V��-r<�.!���C'7F��k������9G��u��Rz����8��8�@�:��o5��F�B"����*��W��W�/�#�n��U����}n3�%��MR����V�
��x(�Y��I�$ �l1E���r[��������jSDG���5�Y���Q�:+�i����3B�[>��
HlG�����|M��Jm����&�
�F�K4J�^��$\�o�}�_�=��l�!eT�i]08H|%���R��F?D���m�A2x��Z��?f}���MNA�]:��l�+��dG*����4%��N9�$�!�io�2��X��-����m������;��Y���!�B���jXj������<�������n�.gEot��������-lru�>�E���5��R]C�&5C����t�����|f����������f��E���$.Oe����,P�R�*Nf�'�z<�������K���9�$�q>��S�9���V���9et@>�_��b[��!�����������U�xt�c�Z.��s�#��F��
�����n�������Isb��+�0�H�.�����*b�����J��V��`����
�|���=waLq8_��g���%*�zNd���{,Pn%�1��G3n_�fV_�os�93�=�1�� (+��}����K��8�l?���,�iq!7�-��$s����=��zUIw�����YY������"�*�%22�s���Ji���N0�{DKz{��K�����M��l��+'����X�W�t���L�?�'�����h�_�0E��h,�
�.��$�Q����@�Z��Dm�����)�Q���
|,���t���s��O�QE�Gl���x��Pmji�D����!U��V�t�V��t#=fi�������xw.0���"��H�Q�Jd]�pc|M�\�.�<�p<��/�j�A#�0�������&��tzs�\��#���p<6���K���	�'\�F�,W��2(`��1�T���l!J|(�~�	sd���A�X�����O�.i�������:�7��V�7�����g��8n����(�+���u@����6J�?d@Vd�P;������������<:&�&SH�>�r�����z�t ���6��!d>l&-e%��sg�@����D�iFPn�!m�gv]-X���2�Zs�h]t�,�p*}E��qO��s�������F����n�T����f��4���.��cd0jMo�8pOz7w�!f�;�����@B�����$���t��PJZ��F#��&	i��
}1�8�9�$�������y����zb����������d��(���}�R�Z�
d_�S��w�!���-:�|�y~#)a���^vhXc������LG��D?�T�����(�3����K�-z��W��>TlU�
�t���a]��{�2�
���R��%���Z��[�m�l'=�uSo������������~/�7zQ�F�>82���#�_��W��K�/�dGYh:���o
	�����'z)�ET�~O��~��K�>�������n]dQG���j�/���@ E~��myr�����QI��L�e���TV~����6��I�����B�SF���"���xy�`��Sg�����	A�}S��r���$H<������a�,���#�R���
�:��� `\m�p��Z���l���
��{Q�0M�������h<����_�����=�v�bj�2`v�G�g�����.�q��>5�o��`GP!���ZP��H0��������3��������(c��R~���z2��0=��*�gc0��>2XM�S�$����Z�1�X:jE�+I���[4����X^�cUQTS&*-+8Oq_8C��,�;..�!���zM�����d���>?��\��iP����es�G*t��ly2��V-q��W�K���Ml��������rl�"�'Z.�)Vm��P���OE�?�@�iF�)#�w(���NCo�����Ws>�������P�s@�I���=�����~d�����Co@V��f�����yjC>�}a�G`�a{	�P�*��
5��{Z}��k2���(�'�,d�b�,�i�8���i"�m=���K���tT��h�������[$K��()��&Vh3��_z�
��
��^�Ui[E�f��;�RI�#o���t���Oy��!�|�f+�*����`/|�>��krg�l^1zt/9�\��2��3:z��9���6�Ur)I��L����4*E�o�v�~�z���KJQ�R��P�t�1��������Ms5Ia�_���#�X�<���K��f��x���sa���A�c_=H����9�2�;'�9vs��`�� ����XH<�\=a5|1��'�K�FRG:�Dx4s���Q-x��a�V/3s�K9=:p�����N ��5������;X�(����Y��N;�>����$F��!����wt�s���:5���X9�z������L�Y�	N������g�;B����"����u#_-�j\�3�d�>^������*3�r�W=�b�b��%(H�t ���j�=����^�QN���S��T�
V�o6��*�EL�)���`B|j/��w���/���-.`��k�B�)�I�9��?����B����y.�xq���G�4�J���|}�����x�������q����%���k��kvd����/����*H����9T0�&��pL���D�bn���:�2�z��$�������~d�:mC��3t����#?�s9T{q�����r����D��w�;{A������w��-r�����I(������[���}�;@*�aM�~YO<�iJ O�\]�ym�H/�5�h���Y
��d���952���E0�J_=��5K��X�y�U��2q����]�M�rqB����5��-XZ~�/e�6�Cu(C��g
��L� �zt�r����4K������
B2��[�%���h���sl��;�z�;�����i!3F�yo��Y8����|��a?;�vzC�����}>��-���vYN����SE@�a])k�_P3l���;a$�]�%�;F�Ww�p_��������/��\�Tw7
x������[�x6��;�����A��*_=����)G��	������>A84}9�$�p�Y\!c���.6����Q�@u������0�c}��u{2����l7��f{o/���;��&[�?�������Lcqg:��1��dv:~��J����2C�`�a�*���[;�����#b�������wh��q<
��������9����#X]����
�'�8fT�q�wfs�G}� Q�4��"jS�i+i�3c�H?h�Mmkz���H�[^�����|����X��l8�u�s����e�x��J}�h�.�I�
)(3��^�>��<��2
�S]�����	�{����Kg�jR#%A���<!UD�y�3I~q�#�P>,i���\�(\��/3��f�'`��F�/3�����M��$�B�M����"2����+;�qzZgCzq������a!XO`�����Y��c��� ��N;���
����(�T��aZ�xVmY�
k�7�<��)�~H/Rz�y���U�Fv)ZK��"E28A�����������i��5/;�����	������;5o�4)UaO���%�.Y�Re5OO�)+3���{�b�r����h�s� J��gW��%�.5�;(:(�����,��9��*�����=i�N����^������J�0-�D�Z�
+�����1����B����3�������������2;
�aj�a[����'q���c�x����1���{~�e7��)5>��]
�����Y�z��y���S�k�	%_w�DfM�#;�;u����s�m�@��<��'0�b5���)���t�d�T���(�^]��Cj���%|��x8zt/!n���wjn"��s.�����w���q;�n��5vT����_���aln�A���%�*z ������[b9p:~PA����4�P������g���AF�����V���}~�^R?W/}���~���]&�.�p;��Z�#fO�R/[���>1])9�Ohz��|s:�g���+�g�������#�'M�C�'��e8A�:�`b�F��'���Z��"����Ks;��;��yi����S|�C��#��]x\k��r�����2������$)���2��e!�����$�x��.�����
������J�/e�����������%G��������;Pj`!�r(�������/�����E�K�b�JDL���@+rc��m����N(�E��D�`��sP)e��g�w<�T�����U��h� ZT�c�����h������|@i�#�81���=c�e���]�K#-*��B��I����W-"\6=?������������������
��C�����i������**���E����������uK��:����T�Q'���D�J���DM��M �H�((1����#S�)DXQ�u�K��P�Ug&�$4���a�����
���0�t�oRC�,��+�7�ci�
��F�N#/7X��X�a��rT�[�#��
�5�%�Y�#��S�g�.S���=�r=s�~��Ib7������d:��,x��?43�� rf���#���� �����W��G������4�q��|�^��z%z�������{�;{����A���HC�P�q����er����3�n��ih��lNZf��W;gFO�L�X�M�O
��v�P�|Us&Vg�e|����!*�����?���'Y�Y���t�Akt���e�_/r��t2"fI�_�Pt��Cd{>�o��wF@�|q��~�0�#�R����B��?
$���*
�kj��W_�K����M���k&�����@�{�<��ZvM������%�����n����^��!h�%^KI����fC��$B�8��s'=�de������_�U�5��6X�����v������j5�_���v.)q��.���=F����-O�.=4�0~��-��}�	�KpsH��������wG;�Y��8���#��j�]2E���[�9��b��d�M�y~;�5�H���2My���n:������a7(��F���:R���l�	u��gJ�}`��:`�$���.D	���N��=��b�P��G��b�3��(�������������V��F��V��})2L��x�l�a����{
p��cX�[�"�O��w��!���m`������������I9�N��tF��/J�������XD�����3*&��#5y�ubP"q����oRf�O�=/�M��g�����Oy��^����0��DY�vi2�O��r�1d��/�$��i��_�W���^}p���#SH��$�.F?]f�����5��W�?����p�ox<"38�� �)���~'�(-=b�c���15wA���2N�+xc2�=6pYc�'.?�����pB���kK�Y�������[���������0&��L�a�������r��Y��-!t������ J���ef�4g��������k�km:k�g���R�@��|�����RG@�g(a�-��D����4��H���8�������5���3;��W<��c>��k��i�yk�����F�L��8M��3��@m����
���V7��'�vl������s�����_�^���J�.������;��^�P��h�)b.�2�8��������+��2�@!��4�tD�������)Y;�m������U
t�WV���zR��� L�"��q���
�	|��va����<'},�x�LyV��k�rG��q>~0�G�G��/B%Z|�u���t��]��u:�'��-���J��4�x1�B#l`7U��ip�Lrw"1-a��m�4p1�5�����{��E{t�������G��o2��l����i���������[PB�K�K��q��
����b["���&��\�	A$�=
��v7��g���?��J���?���|
�#X,�Z������S�qL�d�)�M�M�Z�{I���!�G��lE����x�`�2c(2�4�1b����L����h${DL�
��z?
p�Z��������F�(I�A��u������Dm����Q��:��S`��K��[QpO,R���X�`�
���E�d;���|�B���T�����Y�W��F�	�U�[�������Ig�}����o����z�<�R�y�3��d�p��S���9��LO��Xtd�'5��F������xz�����8�j��4��h>s0
6�!g^{~$��#�����aLF���hr��T[����X�9��1��}pd�(0�I#FD�W�c���a�>a���*6��5ZX? ��r��P����`H��!��x-�p��p/EZ�e~�dG�S!� 2���^�L�!��;hS�L�w�t0�)��?���>��$c"�o�����������Et�
V�&6x�\.����d��\/^����Y�97��m���>{�����}�i�l��O��UkNC�0q!.tM����	���������1�x�"�9&"��2@�q��k"�m"�DV\�O}v9�d��Fq����4a��i�	����Ne����������n�q��I����H^��p�^q��1=�.	2���*�x�SG�*���E�4|�F_�f�Bh~�AqV�oJ��\r��K��z�/��\�O8�c�����x�%0#�b�����<��;��d�K����I����rn�o�.��X�f�S���"�U�s*g�+6g�"C��ca�����l������alH�$-�j��%�������8�8zG�����n�G��"
(
��Y���ubVRv)e��k��t�v��/;?���)4<(Sr
�Y��jP��V(��sF��x8?D��`EH��`I;,���%��WU
��f����Y�Ff6h�::X�em�V���0��i��J�t��L���9����aw�����yS���l�Uru�9�pz�^ONA���@��}�A����)��\�>��
K�I����$��S��>�\����oC����1�4/>F���eOVU���9�gXBD�4KN��G�_��0�<B�RYEg���3�W��pe'�
>�?mLV��>���8-��7�U����v����c��Y���s���s+tH��Ba�����@�y�3���;z�c�GO5����F�?8�?��,��5������-ff�t\�BW����{`��J�}{���9�DC��2������v��Y���m8���Q�����h����(���� e�an�������������JF�%�/L��-�i~	x�������!u��"Sh�#s#�%�H����<_q6]R�a����UtZBxe70+�NU��m9�U�� 5�nT��g���+���;+��iqX��W sd����sR�q�����I>J���Q�]��R\VmW��2G�;���3���Y��d������T�v:��/_'<Zq��{N�(]F`�UsHq��z�?
�'��^!\n�����fg>/i��8wuk:��B-��s�3q"��
r�%��9��`��g��>�����C&b��Q"��y�`+Q%,����V�b�x��PHX��Od(&<pC+%e�:���g4���G��6��4�x�:���cQD*�l���QoE�1�e�*=U+h�n<`um,�"O:%���`k��q��;�2���)�Y�E�������h�)�o�9��Z���m���
OY�����R��i�,�����{���������a*b]G�)x�/��G�@/(���^��@�i>�*�F%w�h�E��:����_�:���c���R�D�m�O�����a�J�Z�`���&��z�-"J�W��z�I���$���|��t���h���$�Yn�e�0���J��_�~`w������C����~� �c8P�U,v$D�� '�N9t��e���%^ ��G�}A5��I�>b���}����4p���� ��6����������� ��
�7���)uh������a�z�?����C����P/J��l ��o���q+
�Q�NYBb_\^�/�������W�]����s}�<?y��w�L�)���M,���r{���/L�(}�Y{1��v�6]��/JbJ���+�D�A��w:�z�7/��Uc����P�6���`������$�47�/��1��V������)�=�����n�9m��l?6�S����>�n]�x����� e�|��C�	���^�,Hu���s�>Am���)�8����
0014-wal_decoding-test_decoding-Add-a-simple-decoding-mod.patch.gz (application/x-patch-gzip)
0015-wal_decoding-pg_receivellog-Introduce-pg_receivexlog.patch.gz (application/x-patch-gzip)
0016-wal_decoding-test_logical_decoding-Add-extension-for.patch.gz (application/x-patch-gzip)
0017-wal_decoding-design-document-v2.4-and-snapshot-build.patch.gz (application/x-patch-gzip)
2��:�R���&��R�'���x.��8?��r=�8�����{C�6�*�1��!j��?u������j@�d9`�@mE-<��D����w�<�d���f1I�(r�<�b���`�e���L���;��R�2>���bF7�y��p#u��-���?��MO�(Vd�E�
�
����w��E�)�P�WC�oi��*`0e�2vI��H�o��&:z�Y��%�.�1�2���p.Sl �,=lbH1���)��DC����X��u����D�!w���|�r�H�N#VP�9sR�"��Z�!�����/B�4�]����{m���"N�e>n�?��b��U��+wC����:�����h��b��B)����+�I������Mo�G6�d����!c���`��V
u(v�������+V9y��;�0J;Z��Y���q)[��2l��ZF��3��}q����g ��5�����H~���.�~�^�r�sI�K6`���]���v�
���������3�,��x4)Z��c���~|o�����l���H���?D���A\#��!����hL����r���N/�R�t�9W5�
���Gj����j��*�6�r�G{��1i����&lS�����S�m�]�3�YU:56�:���@&�~������*���6��Z�y1�2`9D^)<Stch���q���`�[��a���R+(q��e+��*}	���\���}��������$z�\�b."�>�h�l�Mn9�r��������������9�����^S%*_�,m['��4v��^��hh�W?�#g����<	e�KVf�n��A�rvE8����b�m�N������9��bTK�Z��.dW��9�!|�i��1�EF��V���(G��Bn�����$�^,��Hm�G�S�i�F)�p�D$���uKFA�;RgD9#=���X��������Sk��D���� ���3�y��xZ���}�m/KK#p
?5r��5f���*#=��6����u����X�n�>g�����c�T��Y�����T���B/a�@u0�%s2AT�� ��)"���#uT~@�c9�f�i����J�j��4�7Q��Me�6���e~���������
��
i���F���9���.���.�<�8���{g0�4C:(�����cuU�Hr8�+s#(cF��p�J���_PhO�4d��;{V����qQnE�.V��L"�?dty����k�I���l6o��5��[� P�����W�.{��F^M����{z@@�qL��/O��?l�8:<���-|�S����3o����u��$��z#����|��c�NO5���M�N���� sCC���������q��s�o2(�@�c��*�]�>9��7�:���8,�Y�ral��z([��=]�$�3F�c�
D�z���;
���DX��.&���3O��,+�~%$�*VM�Z,�������do�����O����s��y~�z���\�<^����3�d��e������������B���U8�;9�?|��-�������{��B�m��C�O��������l��</�|�����B�\���m��{�N�>,�`q�)����\�|�FJVA��1�!)��Ww6�,w��� I���o�7�h`�*��������0 �\�����x�>�r�O��m�<��:�Z�A�-��>R"�
=�@W���';�����F=��
�>8zFi4/y(�}�|�;���8����W0~-:��e�!����������fZ�k���D��,�9s�	@���q�!s%y�9��� 1c* �)�R6���t����l�N|&���A<��U�	%�>\���?�~�u������.�������D����#��Q����-����
o����W�v������/���^=^/�X�O�W����}\<6�3�N�l��-���SD���o�+���bm2����a�����-]�Z�h��xT$E�Z�Q>xW�od�.���}Q�E{G�m���z2�?���2Olli���n�����JY^��=���Q��>�����U8j�K�R(��R	������W�)��d����n�
(��K�}�E��R~n&��]�&�n�7(������p}#������1�@x����t)��AI?6x_�N5��`X��/V.��E��%�?f���NQ�� � ak9�3��@�M@[
��b�����t@�rC�v�.5A�>w��%(1���]�5�(i�t�h�/V�
���B�+21��`Zy����+��P�
J�|������L�t8���HV,�b�,�`-�K���S1DL�	~`O��_��o��:�?->��9	��P��qO�|C�Gb�W;��k7(s=�JC����K��������E�e��jpS����&a0D[��F��#rL�2�J=7���DXs;�t���h�����C�V�Y�'X�@�pvr��<�����o5�%�K�
N�, ;PC�1�x�i���0^c�N��}�8�*z�^*�V��[Bc�����V�i=
�A,t��C�8�?Z��V�hg(��������>Y���T��K���3G���������]�z7�������j��j��+��	�lei�W)>��1���k���I����,�T.����;B9@,	�����,�4�cY(��%�7�;�3�fD�H�]���8�����|izo�@*��D.}����<��Z�)������V�
^���!�uc����z���1i(�(�KI�5E�Ych�p���y\2
�d��(z��=��"^��]�&PB
N�4����5�E�{96F�Z	#�w,���0�j�z|A��{�=�8��J����3�7(�
/1Y������w�_������$�#����*J�	y�YfR�qX���C��(�Pbk�C�������e�A�������rs�>�:e��-�G+`��|7M��U70�U�u<��n�eB��k�r��U�r���p��vd���9+��~d��X���7�i,��p|E��)�#��~u���~���.�H�I��>K/�}�hL#��bg��SE;PL.�b���)�Ty��y����H$���cM���Wo�k��a��4R�Bv���28���b�w�r���fc�n�K�����RB(�XW}d"g�k;��_l?yW�Zq���l�x���1F<�T�4�����x�O74��*gb8[E�A5�����Y����"*�p\]����L!��$-�j�R�4�]���OX3\���I@�&@�c(��
E���i<6V�I�+��p,Q=��y)V��O-��j��$��[���7e����b�0�<�We}8�����)�p���.����e*xM���2'�F�:3D�v�,�5s������h���@ITS]viH�-b�B�J�v�s��*��V��Z,E`DE�S������p�$���mH�C�lO���i�7^p�i���:�Ok��<)'�
��Nm��y[o���� S������q�`a�B?��Ig�lo����|�Z��u1���b�����6�D\��^��2TR�-���]M=�5�!o�G�]�#Wc	�qp\��Id[7���
�:��b���-R����bK��m����zO�y�
���C��_�Ji57��O�X>k����DI*�y��
e]����g���
G�����[���">����1S�i2��\�`�"!��}����U���8VI��|n��p.��#�2w��xM7�J�Rg��i�C������-�m�.McMj�x�,qV[������n5Mm8�'�>l1+B���E�f�]���_��Z����W�?��+���x9�D
e�=g>����R'
SM>��SS��-�
�u*{oN(AP,W�7�=1V���s�O�*
����8�V'��!���T�;x�C
B��t�Y\�r2g�b(T�o��#������Y2�f���W�������r0�����2V*�uZ��������|���E�^�rQ����Mq>�u�$��|
"���l�:1����m#�fQ�W��*�s2O5�4;����������}��\� ��g-*U�Q���~0��dp�D���9I�������������,OR8$��a �V��b&���Z_&�U���E���J+�M�;�����s!��.�����<V���1���;2*�U��S��Wu,����R���������w����|�o��,��w���K�n�k��j<���]���
�6E����g�������x����W7���_����?����G���e�S��i}�X*<R*��tk[5�������#��DopV�Q�����K�?���7�)^\�n���!�%cI��e�pZ�;"�h8��S�����S��>�u�0i�1��Ei��-�Nq:����+D8[G���a���
��oO���S"�v��������Yj����H�i�*���������m8
� Y�	G�I�������Z��s���g���R���z�����n.A}#����F�(n����B/����r?��3����������j�\U�jmJo�`��WF}P��e�e��z����)�
=����}�z��ZQ�B�\p|�b�6i�S��������
"�Y&��Q�z���*����
��f�e����
��M�~�*�"//v>k�^���eG���]��'�������8a�m��f�o��Atty�j��3�;�Ed��+$�-���e�0G+]�G������G��ei`��=42i������y������b���NH\��� d+�-
�h���YUS��k��Wp�2�Q6r�����`467%1���=
�Jm�\�3���:b�!��A����Q��}� m�l�����g����v�>�E��W�[��lq�����H�B�^�*����w���������b��1E���~�k� �4�R���/-5L�$�b�����"q�V^�"������p��5f��iz�\z8C��Y&'9���:;�m�\0e�����C��h\&!
���(��zt�]�]��c�i�����Ql�0b?4I�����Ny�'�sX]�
4d��v��R���c����I?7c`��1�"�[6g����]��
�����`���M��r�wU�����N�;gd*!" SVOZE,��K��b1�,h�7����.-|$�9
a\/� ��Y\�	}�E��!������H��N��z������J�@�v�e�u���q�-Hn��7���,W&����y��Fbp�������h�u���~:�C���xs�MT��A<(K�
zR��8��2�p��H��C2���U?���b��.��1~�Nb
t8���zk��L��0Z�gh(��,tZ)q@���K9�d�WoBv2��w�f�>�^*���|�86p�\����KjuA�_7���k|T�mc�f��hW�xo��y$����	A;
����}�~���w� ��a/����I9�T40�K�T2�9�F��|����������Nq4�!>Wy����x��z=�mMl�\��d�K�(���3�;��0���R!�������1�P�9tf���
�oD��d��7���@��q~=E������e��o|����m%x��5������sj��������IR��������_�!a��E�;6��P�%�����;�R��R���	��m����.���p�eE�P(
���x�0��[��H�g�u(�%�+�~�����3p3����	PD#�e��CyB�_����3��gI�Z��3U�T�\\�������$^�sb:wq�xEB'x�}�n+�t��<���:�u.>���?~�}�W��+��v����
#5Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#2)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch
xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program; not required but useful
0016: Add test_logical_decoding extension; not required, but contains
       the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Version v5-01 attached

Confirmed that all 17 patch files now apply cleanly, and that `make
check-world` builds cleanly after each patch in turn.
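
For anyone wanting to repeat that kind of per-patch check, a loop along the following lines could do it. This is only a sketch under assumptions: the helper name `check_each_patch`, the `*.patch` file naming, and the use of `git am` in the example are illustrative and are not necessarily what was run here.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): apply every *.patch file in a
# directory in sorted order and run a check command after each one.
# Usage: check_each_patch PATCH_DIR APPLY_CMD CHECK_CMD
#   APPLY_CMD is invoked with the patch file as its last argument.
#   CHECK_CMD is invoked with no extra arguments.
check_each_patch() {
    cep_dir=$1
    cep_apply=$2
    cep_check=$3
    for cep_patch in "$cep_dir"/*.patch; do
        # An unmatched glob stays literal; skip it.
        [ -e "$cep_patch" ] || continue
        # The commands are deliberately unquoted so "git am" splits into words.
        if ! $cep_apply "$cep_patch"; then
            echo "FAILED to apply: $cep_patch"
            return 1
        fi
        if ! $cep_check; then
            echo "FAILED check after: $cep_patch"
            return 1
        fi
        echo "ok: $cep_patch"
    done
    return 0
}

# Example (assumes a git checkout and the 17 patches saved in ./patches):
#   check_each_patch ./patches "git am" "make check-world"
```

Stopping at the first failure keeps the tree in the state that produced the problem, which makes it easier to tell which patch introduced it.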

Reviewing and testing the final result now.  If that all looks
good, will submit a separate review of each patch.

Simon, do you want to do the final review and commit after I do
each piece?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kevin Grittner
kgrittn@ymail.com
In reply to: Kevin Grittner (#5)
Re: changeset generation v5-01 - Patches & git tree

Kevin Grittner <kgrittn@ymail.com> wrote:

Confirmed that all 17 patch files now apply cleanly, and that `make
check-world` builds cleanly after each patch in turn.

Just to be paranoid, I did one last build with all 17 patch files
applied to 7dfd5cd21c0091e467b16b31a10e20bbedd0a836 using this
line:

make maintainer-clean ; ./configure --prefix=$PWD/Debug --enable-debug --enable-cassert --enable-depend --with-libxml --with-libxslt --with-openssl --with-perl --with-python && make -j4 world

and it died with this:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../src/interfaces/libpq -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pg_receivexlog.o pg_receivexlog.c -MMD -MP -MF .deps/pg_receivexlog.Po
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I. -I. -I../../../src/interfaces/libpq -I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o mainloop.o mainloop.c -MMD -MP -MF .deps/mainloop.Po
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g pg_receivellog.o receivelog.o streamutil.o  -L../../../src/port -lpgport -L../../../src/common -lpgcommon -L../../../src/interfaces/libpq -lpq -L../../../src/port -L../../../src/common -L/usr/lib  -Wl,--as-needed -Wl,-rpath,'/home/kgrittn/pg/master/Debug/lib',--enable-new-dtags  -lpgport -lpgcommon -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt -ldl -lm  -o pg_receivellog
gcc: error: pg_receivellog.o: No such file or directory
make[3]: *** [pg_receivellog] Error 1
make[3]: Leaving directory `/home/kgrittn/pg/master/src/bin/pg_basebackup'
make[2]: *** [all-pg_basebackup-recurse] Error 2
make[2]: *** Waiting for unfinished jobs....

It works with this patch-on-patch:

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a41b73c..18d02f3 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -42,6 +42,7 @@ installdirs:
 uninstall:
    rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
    rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+   rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
 
 clean distclean maintainer-clean:
-   rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+   rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o

It appears to be an omission from file 0015.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Kevin Grittner
kgrittn@ymail.com
In reply to: Kevin Grittner (#6)
Re: changeset generation v5-01 - Patches & git tree

Kevin Grittner <kgrittn@ymail.com> wrote:

 uninstall:

    rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
    rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+   rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'

Oops.  That part is not needed.

 clean distclean maintainer-clean:
-   rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+   rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o

Just that part.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#4)
1 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

Pushed and attached.

The contrib/test_logical_decoding/sql/ddl.sql script is generating
unexpected results.  For both table_with_pkey and
table_with_unique_not_null, updates of the primary key column are
showing:

old-pkey: id[int4]:0

... instead of the expected value of 2 or -2.

See attached.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

regression.diffs (application/octet-stream)
*** /home/kgrittn/pg/master/contrib/test_logical_decoding/expected/ddl.out	2013-06-22 12:00:44.061466858 -0500
--- /home/kgrittn/pg/master/contrib/test_logical_decoding/results/ddl.out	2013-06-23 10:45:40.129276362 -0500
***************
*** 436,445 ****
   table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_pkey": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_pkey": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
   table "table_with_pkey": DELETE: id[int4]:2
--- 436,445 ----
   table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_pkey": UPDATE: old-pkey: id[int4]:0 new-tuple: id[int4]:-2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_pkey": UPDATE: old-pkey: id[int4]:0 new-tuple: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
   table "table_with_pkey": DELETE: id[int4]:2
***************
*** 482,491 ****
   table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
   table "table_with_unique_not_null": DELETE: id[int4]:2
--- 482,491 ----
   table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:0 new-tuple: id[int4]:-2 data[int4]:3
   COMMIT
   BEGIN
!  table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:0 new-tuple: id[int4]:2 data[int4]:3
   COMMIT
   BEGIN
   table "table_with_unique_not_null": DELETE: id[int4]:2

======================================================================

#9Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#6)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

Kevin Grittner <kgrittn@ymail.com> wrote:

Confirmed that all 17 patch files now apply cleanly, and that `make
check-world` builds cleanly after each patch in turn.

Just to be paranoid, I did one last build with all 17 patch files
applied to 7dfd5cd21c0091e467b16b31a10e20bbedd0a836 using this
line:

make maintainer-clean ; ./configure --prefix=$PWD/Debug --enable-debug --enable-cassert --enable-depend --with-libxml --with-libxslt --with-openssl --with-perl --with-python && make -j4 world

and it died with this:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../src/interfaces/libpq -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pg_receivexlog.o pg_receivexlog.c -MMD -MP -MF .deps/pg_receivexlog.Po
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I. -I. -I../../../src/interfaces/libpq -I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o mainloop.o mainloop.c -MMD -MP -MF .deps/mainloop.Po
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g pg_receivellog.o receivelog.o streamutil.o  -L../../../src/port -lpgport -L../../../src/common -lpgcommon -L../../../src/interfaces/libpq -lpq -L../../../src/port -L../../../src/common -L/usr/lib  -Wl,--as-needed -Wl,-rpath,'/home/kgrittn/pg/master/Debug/lib',--enable-new-dtags  -lpgport -lpgcommon -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt -ldl -lm  -o pg_receivellog
gcc: error: pg_receivellog.o: No such file or directory
make[3]: *** [pg_receivellog] Error 1
make[3]: Leaving directory `/home/kgrittn/pg/master/src/bin/pg_basebackup'
make[2]: *** [all-pg_basebackup-recurse] Error 2
make[2]: *** Waiting for unfinished jobs....

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I couldn't
reproduce it reliably though, even after doing some dozen rebuilds or so.

It works with this patch-on-patch:

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a41b73c..18d02f3 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -42,6 +42,7 @@ installdirs:
 uninstall:
    rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
    rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+   rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
 
 clean distclean maintainer-clean:
-   rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+   rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o

It appears to be an omission from file 0015.

Yes, both are missing.

+ rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'

Oops. That part is not needed.

Hm. Why not?

I don't think either hunk has anything to do with that build failure
though - can you reproduce the error without?

Thanks,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#8)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-23 10:32:05 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Pushed and attached.

The contrib/test_logical_decoding/sql/ddl.sql script is generating
unexpected results.  For both table_with_pkey and
table_with_unique_not_null, updates of the primary key column are
showing:

old-pkey: id[int4]:0

... instead of the expected value of 2 or -2.

See attached.

Hm. Any chance this was an incomplete rebuild? I seem to remember having
seen that once because some header dependency wasn't recognized
correctly after applying some patch.

Otherwise, could you give me:
* the version you applied the patch on
* os/compiler

Because I can't reproduce it, despite some playing around...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

gcc: error: pg_receivellog.o: No such file or directory
make[3]: *** [pg_receivellog] Error 1

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I couldn't
reproduce it reliably though, even after doing some dozen rebuilds or so.

What versions of gmake are you guys using? It wouldn't be the first
time we've tripped over bugs in parallel make. See for instance
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1fc698cf14d17a3a8ad018cf9ec100198a339447

regards, tom lane

#12Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-23 16:48:41 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

gcc: error: pg_receivellog.o: No such file or directory
make[3]: *** [pg_receivellog] Error 1

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I couldn't
reproduce it reliably though, even after doing some dozen rebuilds or so.

What versions of gmake are you guys using? It wouldn't be the first
time we've tripped over bugs in parallel make. See for instance
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1fc698cf14d17a3a8ad018cf9ec100198a339447

3.81 here. That was supposed to be the "safe" one, right? At least with
respect to the bugs seen/fixed recently.

Kevin, any chance you still have more log than in the upthread mail
available?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#9)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

make maintainer-clean ; ./configure --prefix=$PWD/Debug --enable-debug --enable-cassert --enable-depend --with-libxml --with-libxslt --with-openssl --with-perl --with-python && make -j4 world

[ build failure referencing pg_receivellog.o ]

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I couldn't
reproduce it reliably though, even after doing some dozen rebuilds or so.

It works with this patch-on-patch:

  clean distclean maintainer-clean:
-   rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+   rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o

+  rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'

Oops.  That part is not needed.

Hm. Why not?

Well, I could easily be wrong on just about anything to do with
make files, but on a second look that appeared to be dealing with
eliminating an installed pg_receivellog binary, which is not
created.

I don't think either hunk has anything to do with that build failure
though - can you reproduce the error without?

I tried that scenario three times and it failed three times.  Then
I made the above changes and it worked.  Then I eliminated the one
on the uninstall target and tried a couple more times and it worked
on both attempts.  The scenario is to have a `make world` build in
the source tree, and run the above line starting with `make
maintainer-clean` and going to `make -j4 world`.
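
Since the failure looked intermittent on one machine and consistent on another, it may help to count failures over several runs rather than rely on a single attempt. The following is a rough sketch, not something run in this thread: `count_failures` and `rebuild_world` are hypothetical names, and the configure flags are abbreviated from the line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and report how often it failed.
# Usage: count_failures N CMD [ARGS...]
count_failures() {
    cf_runs=$1
    shift
    cf_failed=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_runs" ]; do
        cf_i=$((cf_i + 1))
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            cf_failed=$((cf_failed + 1))
        fi
    done
    echo "$cf_failed/$cf_runs runs failed"
}

# The scenario described above, wrapped so it can be repeated.
# Assumed to be run from the root of the source tree.
rebuild_world() {
    make maintainer-clean
    ./configure --enable-debug --enable-cassert --enable-depend && make -j4 world
}

# Example: count_failures 3 rebuild_world
```

Keeping the clean step inside the repeated function matters here, because the reported failure only shows up when the build starts from `make maintainer-clean`.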

I did notice that without that change to the maintainer-clean
target I did not get a pg_receivellog.Po file in
src/bin/pg_basebackup/.deps/ -- and with it I do.  I admit to being
at about a 1.5 on a 10 point scale of make file competence -- I
just look for patterns used for things similar to what I want to do
and copy without much understanding of what it all means.  :-(  So
when I got an error on pg_receivellog which didn't happen on
pg_receivexlog, I looked for differences -- my suggestion has no
more basis than that and the fact that empirical testing seemed to
show that it worked.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#12)
1 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 16:48:41 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

gcc: error: pg_receivellog.o: No such file or directory
make[3]: *** [pg_receivellog] Error 1

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I
couldn't reproduce it reliably though, even after doing some
dozen rebuilds or so.

What versions of gmake are you guys using?  It wouldn't be the
first time we've tripped over bugs in parallel make.  See for
instance
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1fc698cf14d17a3a8ad018cf9ec100198a339447

3.81 here. That was supposed to be the "safe" one, right? At
least with respect to the bugs seen/fixed recently.

There is no executable named gmake in my distro, but...

kgrittn@Kevin-Desktop:~/pg/master$ make --version
GNU Make 3.81

Which is what I'm using.

Kevin, any chance you still have more log than in the upthread
mail available?

Well, I just copied from the console, and that was gone; but
reverting my change I get the same thing.  All console output
attached.  Let me know if you need something else.

Note that the dependency file disappeared:

kgrittn@Kevin-Desktop:~/pg/master$ ll src/bin/pg_basebackup/.deps/
total 24
drwxrwxr-x 2 kgrittn kgrittn 4096 Jun 24 08:57 ./
drwxrwxr-x 4 kgrittn kgrittn 4096 Jun 24 08:57 ../
-rw-rw-r-- 1 kgrittn kgrittn 1298 Jun 24 08:57 pg_basebackup.Po
-rw-rw-r-- 1 kgrittn kgrittn 1729 Jun 24 08:57 pg_receivexlog.Po
-rw-rw-r-- 1 kgrittn kgrittn 1646 Jun 24 08:57 receivelog.Po
-rw-rw-r-- 1 kgrittn kgrittn  953 Jun 24 08:57 streamutil.Po

It was there from the build with the change I made to the
maintainer-clean target, and went away when I built without it.
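
A check like the following could make the missing dependency file easier to spot after each build. It is a minimal sketch: the function name `missing_dep_files` is made up for illustration, and the list of object names in the example is taken from the directory listing above plus pg_receivellog.

```shell
#!/bin/sh
# Hypothetical helper: for each object name given, check whether the
# dependency file DEPS_DIR/NAME.Po exists.
# Usage: missing_dep_files DEPS_DIR NAME...
# Prints the names that have no .Po file; returns 0 only if none are missing.
missing_dep_files() {
    mdf_dir=$1
    shift
    mdf_status=0
    for mdf_name in "$@"; do
        if [ ! -f "$mdf_dir/$mdf_name.Po" ]; then
            echo "$mdf_name"
            mdf_status=1
        fi
    done
    return "$mdf_status"
}

# Example, run from the root of the source tree:
#   missing_dep_files src/bin/pg_basebackup/.deps \
#       pg_basebackup pg_receivexlog pg_receivellog receivelog streamutil
```

With the listing shown above, this would be expected to report only pg_receivellog, matching the observation that its object file was never compiled before the link step.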

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

pg_receivellog-build-failure.txt.gz (application/x-gzip)

#15Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#10)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 10:32:05 -0700, Kevin Grittner wrote:

The contrib/test_logical_decoding/sql/ddl.sql script is generating
unexpected results.  For both table_with_pkey and
table_with_unique_not_null, updates of the primary key column are
showing:

old-pkey: id[int4]:0

... instead of the expected value of 2 or -2.

See attached.

Hm. Any chance this was an incomplete rebuild?

With my hack on the pg_basebackup Makefile, `make -j4 world` is
finishing with no errors and:

PostgreSQL, contrib, and documentation successfully made. Ready to install.

I seem to remember having seen that once because some header
dependency wasn't recognized correctly after applying some patch.

I wonder whether this is related to the build problems we've been
discussing on the other fork of this thread.  I was surprised to
see this error when I got past the maintainer-clean full build
problems, because I thought I had seen clean `make check-world`
regression tests after applying each incremental patch file.  Until
I read this I had been assuming that somehow I missed the error on
the 16th and 17th iterations; but now I'm suspecting that I didn't
miss anything after all -- it may just be another symptom of the
build problems.

Otherwise, could you give me:
* the version you applied the patch on

7dfd5cd21c0091e467b16b31a10e20bbedd0a836

* os/compiler

Linux Kevin-Desktop 3.5.0-34-generic #55-Ubuntu SMP Thu Jun 6 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2

Because I can't reproduce it, despite some playing around...

Maybe if you can reproduce the build problems I'm seeing....

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#13)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-24 06:44:53 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

make maintainer-clean ; ./configure --prefix=$PWD/Debug --enable-debug --enable-cassert --enable-depend --with-libxml --with-libxslt --with-openssl --with-perl --with-python && make -j4 world

[ build failure referencing pg_receivellog.o ]

I have seen that once as well. It's really rather strange since
pg_receivellog.o is a clear prerequisite for pg_receivellog. I couldn't
reproduce it reliably though, even after doing some dozen rebuilds or so.

It works with this patch-on-patch:

  clean distclean maintainer-clean:
-   rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+   rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o

+  rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'

Oops.  That part is not needed.

Hm. Why not?

Well, I could easily be wrong on just about anything to do with
make files, but on a second look that appeared to be dealing with
eliminating an installed pg_receivellog binary, which is not
created.

I think it actually is?

install: all installdirs
$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
$(INSTALL_PROGRAM) pg_receivellog$(X) '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
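
This kind of mismatch between what a Makefile installs and what its other targets clean up can be checked mechanically. The sketch below is an illustration only: `target_covers_install` is a hypothetical helper, and it assumes the simple layout seen in this Makefile, where each program is installed by a line of the form `$(INSTALL_PROGRAM) name$(X) ...` and recipe lines are indented with a tab.

```shell
#!/bin/sh
# Hypothetical consistency check: every program installed via
# $(INSTALL_PROGRAM) should also be named in the recipe of TARGET
# (for example "uninstall" or "clean").
# Usage: target_covers_install MAKEFILE TARGET
# Prints each installed program missing from TARGET's recipe;
# returns 0 only if nothing is missing.
target_covers_install() {
    tci_makefile=$1
    tci_target=$2
    # Program names from lines like:  $(INSTALL_PROGRAM) foo$(X) '...'
    tci_progs=$(sed -n 's/^[[:space:]]*\$(INSTALL_PROGRAM)[[:space:]]*\([A-Za-z0-9_]*\)\$(X).*/\1/p' "$tci_makefile")
    # Tab-indented recipe lines that follow a rule line naming TARGET.
    tci_recipe=$(awk -v t="$tci_target" '
        { first = substr($0, 1, 1) }
        first == "\t" { if (in_rule) print; next }
        first != "#" && index($0, ":") > 0 {
            in_rule = 0
            n = split(substr($0, 1, index($0, ":") - 1), names, " ")
            for (i = 1; i <= n; i++) if (names[i] == t) in_rule = 1
        }
    ' "$tci_makefile")
    tci_status=0
    for tci_prog in $tci_progs; do
        # Fixed-string match on e.g. "pg_receivellog$(X)".
        if ! printf '%s\n' "$tci_recipe" | grep -qF -- "${tci_prog}\$(X)"; then
            echo "$tci_prog"
            tci_status=1
        fi
    done
    return "$tci_status"
}

# Example:
#   target_covers_install src/bin/pg_basebackup/Makefile uninstall
#   target_covers_install src/bin/pg_basebackup/Makefile clean
```

It is a text-level check and does not evaluate make variables or pattern rules, so it would need adjusting for Makefiles that install programs in other ways.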

I don't think either hunk has anything to do with that build failure
though - can you reproduce the error without?

I tried that scenario three times and it failed three times.  Then
I made the above changes and it worked.  Then I eliminated the one
on the uninstall target and tried a couple more times and it worked
on both attempts.  The scenario is to have a `make world` build in
the source tree, and run the above line starting with `make
maintainer-clean` and going to `make -j4 world`.

Hm. I think it might be something in make's intermediate target logic
biting us. Anyway, if the patch fixes that: Great ;). Merged it locally
since it's obviously missing.

I did notice that without that change to the maintainer-clean
target I did not get a pg_receivellog.Po file in
src/bin/pg_basebackup/.deps/ -- and with it I do.

Yea, according to your log it's not even built before pg_receivellog is
linked.

Thanks,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#16)
1 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-24 06:44:53 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 08:27:32 -0700, Kevin Grittner wrote:

+  rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'

Oops.  That part is not needed.

Hm. Why not?

Well, I could easily be wrong on just about anything to do with
make files, but on a second look that appeared to be dealing with
eliminating an installed pg_receivellog binary, which is not
created.

I think it actually is?

Oh, yeah....  I see it now.  I warned you I could be wrong.  :-/

I just had a thought, though -- perhaps the dependency information
is being calculated incorrectly.  Attached is the dependency file
from the successful build (with the adjusted Makefile), which still
fails the test_logical_decoding regression test, with the same diff.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

pg_receivellog.Po (text/x-gettext-translation; name=pg_receivellog.Po)

#18Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#15)
2 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-24 07:29:43 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-06-23 10:32:05 -0700, Kevin Grittner wrote:

The contrib/test_logical_decoding/sql/ddl.sql script is generating
unexpected results.  For both table_with_pkey and
table_with_unique_not_null, updates of the primary key column are
showing:

old-pkey: id[int4]:0

... instead of the expected value of 2 or -2.

See attached.

Hm. Any chance this was an incomplete rebuild?

With my hack on the pg_basebackup Makefile, `make -j4 world` is
finishing with no errors and:

Hm. There were some issues with the test_logical_decoding Makefile not
cleaning up the regression installation properly. Which might have
caused the issue.

Could you try after applying the patches and executing a clean and then
rebuild?

Otherwise, could you try applying my git tree so we are sure we test the
same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Because I can't reproduce it, despite some playing around...

Maybe if you can reproduce the build problems I'm seeing....

Tried your recipe but still couldn't...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-wal_decoding-mergme-Fix-pg_basebackup-makefile.patch (text/x-patch; charset=us-ascii)

From cdd0ed46ab75768f8a2e82394b04e6392d8ed32a Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 24 Jun 2013 11:52:23 +0200
Subject: [PATCH 1/2] wal_decoding: mergme: Fix pg_basebackup makefile

---
 src/bin/pg_basebackup/Makefile | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a41b73c..c251249 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -42,6 +42,9 @@ installdirs:
 uninstall:
 	rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
 	rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+	rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
 
 clean distclean maintainer-clean:
-	rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
+	rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) \
+		pg_basebackup.o pg_receivexlog.o pg_receivellog.o \
+		$(OBJS)
-- 
1.8.2.rc2.4.g7799588.dirty

0002-wal_decoding-mergme-Fix-test_logical_decoding-Makefi.patch (text/x-patch; charset=us-ascii)

From 022c2da1873de2fbc93ae524819932719ca41bdb Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 24 Jun 2013 16:47:48 +0200
Subject: [PATCH 2/2] wal_decoding: mergme: Fix test_logical_decoding Makefile

---
 contrib/test_logical_decoding/Makefile | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/contrib/test_logical_decoding/Makefile b/contrib/test_logical_decoding/Makefile
index 0e7d5d3..3850d44 100644
--- a/contrib/test_logical_decoding/Makefile
+++ b/contrib/test_logical_decoding/Makefile
@@ -4,18 +4,14 @@ OBJS = test_logical_decoding.o
 EXTENSION = test_logical_decoding
 DATA = test_logical_decoding--1.0.sql
 
-ifdef USE_PGXS
-PG_CONFIG = pg_config
-PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
-else
+# Note: because we don't tell the Makefile there are any regression tests,
+# we have to clean those result files explicitly
+EXTRA_CLEAN = -r $(pg_regress_clean_files)
+
 subdir = contrib/test_logical_decoding
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
-endif
-
-test_logical_decoding.o: test_logical_decoding.c
 
 # Disabled because these tests require "wal_level=logical", which
 # typical installcheck users do not have (e.g. buildfarm clients).
-- 
1.8.2.rc2.4.g7799588.dirty

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#4)
Re: changeset generation v5-01 - Patches & git tree

I'm looking at the combined patches 0003-0005, which are essentially all
about adding a function to obtain relation OID from (tablespace,
filenode). It takes care to look through the relation mapper, and uses
a new syscache underneath for performance.

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index. It is said that
this doesn't matter because the only non-unique values that can exist
would reference entries that have relfilenode = 0; and in turn this
doesn't matter because those values would be queried through the
relation mapper anyway, not from the syscache. (This is implemented in
the higher-level function.)

This means that there would be one syscache that is damn easy to misuse
... and we've set up things so that syscaches are very easy to use in the
first place. From that perspective, this doesn't look good. However,
it's an easy mistake to notice and fix, so perhaps this is not a serious
problem. (I would much prefer for there to be a way to define partial
indexes in BKI.)

I'm not sure about the placing of the new SQL-callable function in
dbsize.c either. It is certainly not a function that has anything to do
with object sizes. The insides of it would belong more in lsyscache.c,
I think, except then that file does not otherwise concern itself with
the relation mapper so its scope would have to expand a bit. But this
is no place for the SQL-callable portion, so that would have to find a
different home as well.

The other option, of course, is to provide a separate caching layer for
these objects altogether, but given how concise this implementation is,
it doesn't sound too palatable.

Thoughts?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Andres Freund
andres@2ndquadrant.com
In reply to: Alvaro Herrera (#19)
Re: changeset generation v5-01 - Patches & git tree

Hi,

On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote:

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index. It is said that
this doesn't matter because the only non-unique values that can exist
would reference entries that have relfilenode = 0; and in turn this
doesn't matter because those values would be queried through the
relation mapper anyway, not from the syscache. (This is implemented in
the higher-level function.)

Well, you can even query the syscache without hurt for mapped relations,
you just won't get an answer. The only thing you may not do because it
would yield multiple results is to query the syscache with
(tablespace, InvalidOid/0). Which is still not nice although it doesn't
make much sense to query with InvalidOid.

I'm not sure about the placing of the new SQL-callable function in
dbsize.c either. It is certainly not a function that has anything to do
with object sizes.

Not happy with that myself. I only placed the function there because
pg_relation_filenode() already was in it. Happy to change if somebody
has a good idea.

(I would much prefer for there to be a way to define partial
indexes in BKI.)

I don't think that's the hard part, it's that we don't use the full
machinery for updating indexes but rather the relatively simplistic
CatalogUpdateIndexes(). I am not sure we can guarantee that the required
infrastructure is available in all the cases to support doing generic
predicate evaluation.

If BKI really is the problem, we could probably create the index
after BKI-based bootstrapping has finished.

Thanks,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#20)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund wrote:

On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote:

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index. It is said that
this doesn't matter because the only non-unique values that can exist
would reference entries that have relfilenode = 0; and in turn this
doesn't matter because those values would be queried through the
relation mapper anyway, not from the syscache. (This is implemented in
the higher-level function.)

Well, you can even query the syscache without hurt for mapped relations,
you just won't get an answer. The only thing you may not do because it
would yield multiple results is to query the syscache with
(tablespace, InvalidOid/0). Which is still not nice although it doesn't
make much sense to query with InvalidOid.

Yeah, I agree that it doesn't make sense to query for that. The problem
is that something could reasonably be developed that uses the syscache
directly without checking whether the relfilenode is 0.

(I would much prefer for there to be a way to define partial
indexes in BKI.)

I don't think that's the hard part, it's that we don't use the full
machinery for updating indexes but rather the relatively simplistic
CatalogUpdateIndexes(). I am not sure we can guarantee that the required
infrastructure is available in all the cases to support doing generic
predicate evaluation.

You're right, CatalogIndexInsert() doesn't allow for predicates, so
fixing BKI would not help.

I still wonder about having a separate cache. Right now pg_class has
two indexes; adding this new one would mean a rather large decrease in
insert performance (50% more indexes to update than previously), which
is not good considering that it's inserted into for each and every temp
table creation -- that would become slower. This would be a net loss
for every user, even those that don't want logical replication. On the
other hand, table creation also has to add tuples to pg_attribute,
pg_depend, pg_shdepend and maybe other catalogs, so perhaps the
difference is negligible.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#19)
Re: changeset generation v5-01 - Patches & git tree

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm looking at the combined patches 0003-0005, which are essentially all
about adding a function to obtain relation OID from (tablespace,
filenode). It takes care to look through the relation mapper, and uses
a new syscache underneath for performance.

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index.

... which, I assume, is on top of a pg_class index that doesn't exist
today. Exactly what is the argument that says performance of this
function is sufficiently critical to justify adding both the maintenance
overhead of a new pg_class index, *and* a broken-by-design syscache?

Lose the cache and this probably gets a lot easier to justify. As is,
I think I'd vote to reject altogether.

regards, tom lane

#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#22)
Re: changeset generation v5-01 - Patches & git tree

On Thu, Jun 27, 2013 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm looking at the combined patches 0003-0005, which are essentially all
about adding a function to obtain relation OID from (tablespace,
filenode). It takes care to look through the relation mapper, and uses
a new syscache underneath for performance.

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index.

... which, I assume, is on top of a pg_class index that doesn't exist
today. Exactly what is the argument that says performance of this
function is sufficiently critical to justify adding both the maintenance
overhead of a new pg_class index, *and* a broken-by-design syscache?

Lose the cache and this probably gets a lot easier to justify. As is,
I think I'd vote to reject altogether.

I already voted that way, and nothing's happened since to change my mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#18)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

Hm. There were some issues with the test_logical_decoding
Makefile not cleaning up the regression installation properly.
Which might have caused the issue.

Could you try after applying the patches and executing a clean
and then rebuild?

Tried, and problem persists.

Otherwise, could you try applying my git tree so we are sure we
test the same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Because you mention possible problems with the regression test
cleanup for test_logical_decoding I also tried:

rm -fr contrib/test_logical_decoding/
git reset --hard HEAD
make world
make check-world

I get the same failure, with primary key or unique index column
showing as 0 in results.

I am off on vacation tomorrow and next week.  Will dig into this
with gdb if not solved when I get back -- unless you have a better
suggestion for how to figure it out.

Once this is solved, I will be working with testing the final
result of all these layers, including creating a second output
plugin.  I want to confirm that multiple plugins play well
together.  I'm glad to see other eyes also on this patch set.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#22)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-27 18:18:50 -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm looking at the combined patches 0003-0005, which are essentially all
about adding a function to obtain relation OID from (tablespace,
filenode). It takes care to look through the relation mapper, and uses
a new syscache underneath for performance.

One question about this patch, originally, was about the usage of
that relfilenode syscache. It is questionable because it would be the
only syscache to apply on top of a non-unique index.

... which, I assume, is on top of a pg_class index that doesn't exist
today. Exactly what is the argument that says performance of this
function is sufficiently critical to justify adding both the maintenance
overhead of a new pg_class index, *and* a broken-by-design syscache?

Ok, so this requires some context. When we do the changeset extraction
we build a mvcc snapshot that for every heap wal record is consistent
with one made at the time the record has been inserted. Then, when we've
built that snapshot, we can use it to turn heap wal records into the
representation the user wants:

For that we first need to know which table a change comes from, since
otherwise we obviously cannot interpret the HeapTuple that's essentially
contained in the wal record without it. Since we have a correct mvcc
snapshot we can query pg_class for (tablespace, relfilenode) to get back
the relation. When we know the relation, the user (i.e. the output
plugin) can use normal backend code to transform the HeapTuple into the
target representation, e.g. SQL, since we can build a TupleDesc. Since
the syscaches are synchronized with the built snapshot normal output
functions can be used.

What that means is that for every heap record in the target database in
the WAL we need to query pg_class to turn the relfilenode into a
pg_class.oid. So, we can easily replace syscache.c with some custom
caching code, but I don't think it's realistic to get rid of that
index. Otherwise we need to cache the entire pg_class in memory which
doesn't sound enticing.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#26Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#25)
Re: changeset generation v5-01 - Patches & git tree

On Fri, Jun 28, 2013 at 3:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:

What that means is that for every heap record in the target database in
the WAL we need to query pg_class to turn the relfilenode into a
pg_class.oid. So, we can easily replace syscache.c with some custom
caching code, but I don't think it's realistic to get rid of that
index. Otherwise we need to cache the entire pg_class in memory which
doesn't sound enticing.

The alternative I previously proposed was to make the WAL records
carry the relation OID. There are a few problems with that: one is
that it's a waste of space when logical replication is turned off, and
it might not be easy to only do it when logical replication is on.
Also, even when logical replication is turned on, things that make WAL
bigger aren't wonderful. On the other hand, it does avoid the
overhead of another index on pg_class.
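
To put a number on the per-record cost, here is a minimal sketch, assuming the xl_heaptid layout quoted later in this thread and a 4-byte Oid. The struct xl_heaptid_with_oid and its reloid field are hypothetical names made up for illustration; they are not part of the patch. The helpers measure the bytes that would be written using offsetof rather than sizeof, so trailing struct padding is not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

typedef struct RelFileNode
{
    Oid spcNode;            /* tablespace */
    Oid dbNode;             /* database */
    Oid relNode;            /* relation's file node */
} RelFileNode;

typedef struct BlockIdData
{
    uint16_t bi_hi;
    uint16_t bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    uint16_t    ip_posid;
} ItemPointerData;

/* The sketch relies on these sizes; fail at compile time if they differ. */
static_assert(sizeof(RelFileNode) == 12, "RelFileNode expected to be 12 bytes");
static_assert(sizeof(ItemPointerData) == 6, "ItemPointerData expected to be 6 bytes");

/* Layout as quoted in this thread: identifies the target tuple. */
typedef struct xl_heaptid
{
    RelFileNode     node;
    ItemPointerData tid;
} xl_heaptid;

/* HYPOTHETICAL variant that also carries the pg_class OID. */
typedef struct xl_heaptid_with_oid
{
    RelFileNode     node;
    Oid             reloid;     /* hypothetical field */
    ItemPointerData tid;
} xl_heaptid_with_oid;

/* Bytes written for the current layout, without trailing padding. */
size_t heaptid_wal_size(void)
{
    return offsetof(xl_heaptid, tid) + sizeof(ItemPointerData);
}

/* Bytes written for the hypothetical layout, without trailing padding. */
size_t heaptid_with_oid_wal_size(void)
{
    return offsetof(xl_heaptid_with_oid, tid) + sizeof(ItemPointerData);
}
```

Under those assumptions the two helpers differ by exactly sizeof(Oid), which is the 4 bytes per insert/update/delete record being weighed against the extra pg_class index.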

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#26)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-28 08:41:46 -0400, Robert Haas wrote:

On Fri, Jun 28, 2013 at 3:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:

What that means is that for every heap record in the target database in
the WAL we need to query pg_class to turn the relfilenode into a
pg_class.oid. So, we can easily replace syscache.c with some custom
caching code, but I don't think it's realistic to get rid of that
index. Otherwise we need to cache the entire pg_class in memory which
doesn't sound enticing.

The alternative I previously proposed was to make the WAL records
carry the relation OID. There are a few problems with that: one is
that it's a waste of space when logical replication is turned off, and
it might not be easy to only do it when logical replication is on.
Also, even when logical replication is turned on, things that make WAL
bigger aren't wonderful. On the other hand, it does avoid the
overhead of another index on pg_class.

I personally favor making catalog modifications a bit more
expensive instead of increasing the WAL volume during routine
operations. I don't think index maintenance itself comes close to the
biggest cost for DDL we have atm.
Adding the OID to the WAL records also increases the modifications needed
to important heap_* functions, which doesn't make me happy.

How do others see this tradeoff?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#24)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

I get the same failure, with primary key or unique index column
showing as 0 in results.

I have run enough iterations of the test suite locally now that I am
confident it's not just happenstance that I don't see this :/. I am
going to clone your environment as closely as I can to see where the
issue might be as well as going over those codepaths...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#29Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#27)
Re: changeset generation v5-01 - Patches & git tree

On 6/28/13 8:46 AM, Andres Freund wrote:

I personally favor making catalog modifications a bit more
expensive instead of increasing the WAL volume during routine
operations. I don't think index maintenance itself comes close to the
biggest cost for DDL we have atm.

That makes sense to me in principle.

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#27)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-06-28 08:41:46 -0400, Robert Haas wrote:

The alternative I previously proposed was to make the WAL records
carry the relation OID. There are a few problems with that: one is
that it's a waste of space when logical replication is turned off, and
it might not be easy to only do it when logical replication is on.
Also, even when logical replication is turned on, things that make WAL
bigger aren't wonderful. On the other hand, it does avoid the
overhead of another index on pg_class.

I personally favor making catalog modifications a bit more
expensive instead of increasing the WAL volume during routine
operations.

This argument is nonsense, since it conveniently ignores the added WAL
entries created as a result of additional pg_class index manipulations.

Robert's idea sounds fairly reasonable to me; another 4 bytes per
insert/update/delete WAL entry isn't that big a deal, and it would
probably ease many debugging tasks as well as what you want to do.
So I'd vote for including the rel OID all the time, not conditionally.

The real performance argument against the patch as you have it is that
it saddles every PG installation with extra overhead for pg_class
updates whether or not that installation ever has or ever will make use
of changeset generation --- unlike including rel OIDs in WAL entries,
which might be merely difficult to handle conditionally, it's flat-out
impossible to turn such an index on or off. Moreover, even if one is
using changeset generation, the overhead is being imposed at the wrong
place, ie the master not the slave doing changeset extraction.

But that's not the only problem, nor even the worst one IMO. I said
before that a syscache with a non-unique key is broken by design, and
I stand by that estimate. Even assuming that this usage doesn't create
bugs in the code as it stands, it might well foreclose future changes or
optimizations that we'd like to make in the catcache code.

If you don't want to change WAL contents, what I think you should do
is create a new cache mechanism (perhaps by extending the relmapper)
that caches relfilenode to OID lookups and acts entirely inside the
changeset-generating slave. Hacking up the catcache instead of doing
that is an expedient kluge that will come back to bite us.
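
A minimal sketch of what such a separate relfilenode-to-OID cache could look like, assuming a small direct-mapped table in front of a slower catalog lookup. Every name here (RelfilenodeCache, relfilenode_cache_get, the slow_lookup callback, demo_lookup) is hypothetical; nothing below is taken from the patch or from the relmapper, and a real implementation would also need to hook into cache invalidation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

#define RELFILENODE_CACHE_SLOTS 256     /* power of two, arbitrary size */

typedef struct RelfilenodeCacheEntry
{
    Oid spcNode;
    Oid relNode;        /* InvalidOid marks an empty slot */
    Oid reloid;
} RelfilenodeCacheEntry;

/* Slow path, e.g. an index scan on pg_class; returns InvalidOid if absent. */
typedef Oid (*RelfilenodeLookupFn) (Oid spcNode, Oid relNode);

typedef struct RelfilenodeCache
{
    RelfilenodeCacheEntry slots[RELFILENODE_CACHE_SLOTS];
    RelfilenodeLookupFn   slow_lookup;
} RelfilenodeCache;

void relfilenode_cache_init(RelfilenodeCache *cache, RelfilenodeLookupFn slow_lookup)
{
    assert(slow_lookup != NULL);
    memset(cache->slots, 0, sizeof(cache->slots));
    cache->slow_lookup = slow_lookup;
}

/* Forget everything, e.g. after a catalog change invalidates mappings. */
void relfilenode_cache_reset(RelfilenodeCache *cache)
{
    memset(cache->slots, 0, sizeof(cache->slots));
}

Oid relfilenode_cache_get(RelfilenodeCache *cache, Oid spcNode, Oid relNode)
{
    uint32_t idx;
    RelfilenodeCacheEntry *e;
    Oid reloid;

    /* relfilenode 0 belongs to mapped relations; never answered from here */
    if (relNode == InvalidOid)
        return InvalidOid;

    idx = ((spcNode * 2654435761u) ^ relNode) & (RELFILENODE_CACHE_SLOTS - 1);
    e = &cache->slots[idx];
    if (e->relNode == relNode && e->spcNode == spcNode)
        return e->reloid;           /* cache hit */

    reloid = cache->slow_lookup(spcNode, relNode);
    if (reloid != InvalidOid)
    {
        /* direct-mapped: a colliding older entry is simply overwritten */
        e->spcNode = spcNode;
        e->relNode = relNode;
        e->reloid = reloid;
    }
    return reloid;
}

/* Stand-in slow path for trying the cache out; counts how often it runs. */
int demo_lookup_calls = 0;

Oid demo_lookup(Oid spcNode, Oid relNode)
{
    demo_lookup_calls++;
    return (spcNode == 1663 && relNode == 16384) ? (Oid) 16385 : InvalidOid;
}
```

Initialising the cache with demo_lookup and asking twice for the same (tablespace, relfilenode) pair should run the slow path only once. The point of the design is that the key space handled here is unique by construction, because relfilenode 0 is rejected before the table is ever consulted.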

regards, tom lane

#31Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#30)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-28 10:49:26 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-06-28 08:41:46 -0400, Robert Haas wrote:

The alternative I previously proposed was to make the WAL records
carry the relation OID. There are a few problems with that: one is
that it's a waste of space when logical replication is turned off, and
it might not be easy to only do it when logical replication is on.
Also, even when logical replication is turned on, things that make WAL
bigger aren't wonderful. On the other hand, it does avoid the
overhead of another index on pg_class.

I personally favor making catalog modifications a bit more
expensive instead of increasing the WAL volume during routine
operations.

This argument is nonsense, since it conveniently ignores the added WAL
entries created as a result of additional pg_class index manipulations.

Huh? Sure, pg_class manipulations get more expensive. But in most
clusters pg_class modifications are by far the minority compared to the
rest of the updates performed.

Robert's idea sounds fairly reasonable to me; another 4 bytes per
insert/update/delete WAL entry isn't that big a deal, and it would
probably ease many debugging tasks as well as what you want to do.
So I'd vote for including the rel OID all the time, not conditionally.

Ok, I can sure live with that. I don't think it's a problem to make it
conditional if we want to. Making it unconditional would sure make WAL
debugging in general more pleasant though.

The real performance argument against the patch as you have it is that
it saddles every PG installation with extra overhead for pg_class
updates whether or not that installation ever has or ever will make use
of changeset generation --- unlike including rel OIDs in WAL entries,
which might be merely difficult to handle conditionally, it's flat-out
impossible to turn such an index on or off. Moreover, even if one is
using changeset generation, the overhead is being imposed at the wrong
place, ie the master not the slave doing changeset extraction.

There are no required slaves for doing changeset extraction
anymore. Various people opposed that pretty violently, so it's now all
happening on the master. Which IMHO turned out to be the right decision.

We can do it on Hot Standby nodes, but it's absolutely not required.

But that's not the only problem, nor even the worst one IMO. I said
before that a syscache with a non-unique key is broken by design, and
I stand by that estimate. Even assuming that this usage doesn't create
bugs in the code as it stands, it might well foreclose future changes or
optimizations that we'd like to make in the catcache code.

Since the only duplicate key that possibly can occur in that cache is
InvalidOid, I wondered whether we could define a 'filter' that prohibits
those ending up in the cache? Then the cache would be unique.
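
The filter idea can be illustrated with a small sketch, assuming a simplified stand-in for a pg_class row. ClassRow, relfilenode_cache_accepts and keys_are_unique are hypothetical names used only here; the catcache has no such hook today. The sketch just checks the claim that the (tablespace, relfilenode) key becomes unique once rows with relfilenode = InvalidOid are kept out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Simplified stand-in for the pg_class columns that matter here. */
typedef struct ClassRow
{
    Oid reloid;
    Oid reltablespace;
    Oid relfilenode;    /* InvalidOid for mapped relations */
} ClassRow;

/* The filter: only rows with a real relfilenode may enter the cache. */
bool relfilenode_cache_accepts(const ClassRow *row)
{
    return row->relfilenode != InvalidOid;
}

/*
 * Check whether (reltablespace, relfilenode) is unique across rows.
 * With apply_filter set, rows rejected by the filter are ignored.
 */
bool keys_are_unique(const ClassRow *rows, size_t nrows, bool apply_filter)
{
    size_t i, j;

    assert(rows != NULL || nrows == 0);

    for (i = 0; i < nrows; i++)
    {
        if (apply_filter && !relfilenode_cache_accepts(&rows[i]))
            continue;
        for (j = i + 1; j < nrows; j++)
        {
            if (apply_filter && !relfilenode_cache_accepts(&rows[j]))
                continue;
            if (rows[i].reltablespace == rows[j].reltablespace &&
                rows[i].relfilenode == rows[j].relfilenode)
                return false;   /* duplicate key found */
        }
    }
    return true;
}
```

Given two mapped relations in the same tablespace plus a few ordinary tables, keys_are_unique reports a duplicate without the filter and none with it, which is the property the cache would rely on.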

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#30)
Re: changeset generation v5-01 - Patches & git tree

On Fri, Jun 28, 2013 at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert's idea sounds fairly reasonable to me; another 4 bytes per
insert/update/delete WAL entry isn't that big a deal, ...

How big a deal is it? This is a serious question, because I don't
know. Let's suppose that the average size of an XLOG_HEAP_INSERT
record is 100 bytes. Then if we add 4 bytes, isn't that a 4%
overhead? And doesn't that seem significant?
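
The arithmetic behind that estimate, as a tiny sketch. The 100-byte average is the supposition made above, not a measurement, and wal_growth_percent is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Relative growth, in percent, when extra_bytes are added to each record. */
double wal_growth_percent(size_t record_bytes, size_t extra_bytes)
{
    assert(record_bytes > 0);   /* growth is undefined for empty records */
    return 100.0 * (double) extra_bytes / (double) record_bytes;
}
```

With 100-byte records and 4 extra bytes this gives 4 percent; the same 4 bytes on 200-byte records would be 2 percent, so the answer depends heavily on the real average record size.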

I'm just talking out of my rear end here because I don't know what the
real numbers are, but it's far from obvious to me that there's any
free lunch here. That having been said, just because indexing
relfilenode or adding relfilenodes to WAL records is expensive doesn't
mean we shouldn't do it. But I think we need to know the price tag
before we can judge whether to make the purchase.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#32)
Re: changeset generation v5-01 - Patches & git tree

Robert Haas wrote:

On Fri, Jun 28, 2013 at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert's idea sounds fairly reasonable to me; another 4 bytes per
insert/update/delete WAL entry isn't that big a deal, ...

How big a deal is it? This is a serious question, because I don't
know. Let's suppose that the average size of an XLOG_HEAP_INSERT
record is 100 bytes. Then if we add 4 bytes, isn't that a 4%
overhead? And doesn't that seem significant?

An INSERT wal record is:

typedef struct xl_heap_insert
{
    xl_heaptid  target;              /* inserted tuple id */
    bool        all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
    /* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_insert;

typedef struct xl_heap_header
{
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       t_hoff;
} xl_heap_header;

So the fixed part is just 7 bytes + 5 bytes; tuple data follows that.
So adding four more bytes could indeed be significant (but by how much,
depends on the size of the tuple data). Adding a new pg_class index
would be larger in the sense that there are more WAL records, and
there's the extra vacuuming traffic; but on the other hand that would
only happen when tables are created. It seems safe to assume that in
normal use cases the ratio of tuple insertion vs. table creation is
large.

The only idea that springs to mind is to have the new pg_class index be
created conditionally, i.e. only when logical replication is going to be
used.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#33)
Re: changeset generation v5-01 - Patches & git tree

Alvaro Herrera wrote:

An INSERT wal record is:

typedef struct xl_heap_insert
{
    xl_heaptid  target;                 /* inserted tuple id */
    bool        all_visible_cleared;    /* PD_ALL_VISIBLE was cleared */
    /* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_insert;

Oops. xl_heaptid is not 6 bytes, but instead:

typedef struct xl_heaptid
{
    RelFileNode     node;
    ItemPointerData tid;
} xl_heaptid;

typedef struct RelFileNode
{
    Oid         spcNode;
    Oid         dbNode;
    Oid         relNode;
} RelFileNode;              /* 12 bytes */

typedef struct ItemPointerData
{
    BlockIdData  ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;          /* 6 bytes */

typedef struct BlockIdData
{
    uint16      bi_hi;
    uint16      bi_lo;
} BlockIdData;              /* 4 bytes */

typedef uint16 OffsetNumber;

There's purposely no alignment padding anywhere, so xl_heaptid totals 22 bytes.

Therefore,

So the fixed part is just 22 bytes + 5 bytes; tuple data follows that.
So adding four more bytes could indeed be significant (but by how much,
depends on the size of the tuple data).

4 extra bytes on top of 27 is 14% of added overhead (considering only
the xlog header.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#32)
Re: changeset generation v5-01 - Patches & git tree

Robert Haas <robertmhaas@gmail.com> writes:

I'm just talking out of my rear end here because I don't know what the
real numbers are, but it's far from obvious to me that there's any
free lunch here. That having been said, just because indexing
relfilenode or adding relfilenodes to WAL records is expensive doesn't
mean we shouldn't do it. But I think we need to know the price tag
before we can judge whether to make the purchase.

Certainly, any of these solutions are going to cost us somewhere ---
either up-front cost or more expensive (and less reliable?) changeset
extraction, take your choice. I will note that somehow tablespaces got
put in despite having to add 4 bytes to every WAL record for that
feature, which was probably of less use than logical changeset
extraction will be.

But to tell the truth, I'm mostly exercised about the non-unique
syscache. I think that's simply a *bad* idea.

regards, tom lane

#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
Re: changeset generation v5-01 - Patches & git tree

On Fri, Jun 28, 2013 at 11:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm just talking out of my rear end here because I don't know what the
real numbers are, but it's far from obvious to me that there's any
free lunch here. That having been said, just because indexing
relfilenode or adding relfilenodes to WAL records is expensive doesn't
mean we shouldn't do it. But I think we need to know the price tag
before we can judge whether to make the purchase.

Certainly, any of these solutions are going to cost us somewhere ---
either up-front cost or more expensive (and less reliable?) changeset
extraction, take your choice. I will note that somehow tablespaces got
put in despite having to add 4 bytes to every WAL record for that
feature, which was probably of less use than logical changeset
extraction will be.

Right. I actually think we booted that one. The database ID is a
constant for most people. The tablespace ID is not technically
redundant, but in 99.99% of cases you could figure it out from the
database ID + relation ID. The relation ID is where 99% of the
entropy is, but it probably only has 8-16 bits of entropy in most
real-world use cases. If we were doing this over we might want to
think about storing a proxy for the relfilenode rather than the
relfilenode itself, but there's not much good crying over it now.

But to tell the truth, I'm mostly exercised about the non-unique
syscache. I think that's simply a *bad* idea.

+1.

I don't think the extra index on pg_class is going to hurt that much,
even if we create it always, as long as we use a purpose-built caching
mechanism for it rather than forcing it through catcache. The people
who are going to suffer are the ones who create and drop a lot of
temporary tables, but even there I'm not sure how visible the overhead
will be on real-world workloads, and maybe the solution is to work
towards not having permanent catalog entries for temporary tables in
the first place. In any case, hurting people who use temporary tables
heavily seems better than adding overhead to every
insert/update/delete operation, which will hit all users who are not
read-only.

On the other hand, I can't entirely shake the feeling that adding the
information into WAL would be more reliable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
Re: changeset generation v5-01 - Patches & git tree

Robert Haas <robertmhaas@gmail.com> writes:

On the other hand, I can't entirely shake the feeling that adding the
information into WAL would be more reliable.

That feeling has been nagging at me too. I can't demonstrate that
there's a problem when an ALTER TABLE is in process of rewriting a table
into a new relfilenode number, but I don't have a warm fuzzy feeling
about the reliability of reverse lookups for this. At the very least
it's going to require some hard-to-verify restriction about how we
can't start doing changeset reconstruction in the middle of a
transaction that's doing DDL.

regards, tom lane

#38Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#36)
Re: changeset generation v5-01 - Patches & git tree

On 28 June 2013 17:10, Robert Haas <robertmhaas@gmail.com> wrote:

But to tell the truth, I'm mostly exercised about the non-unique
syscache. I think that's simply a *bad* idea.

+1.

I don't think the extra index on pg_class is going to hurt that much,
even if we create it always, as long as we use a purpose-built caching
mechanism for it rather than forcing it through catcache.

Hmm, does seem like that would be better.

The people
who are going to suffer are the ones who create and drop a lot of
temporary tables, but even there I'm not sure how visible the overhead
will be on real-world workloads, and maybe the solution is to work
towards not having permanent catalog entries for temporary tables in
the first place. In any case, hurting people who use temporary tables
heavily seems better than adding overhead to every
insert/update/delete operation, which will hit all users who are not
read-only.

Thinks...

If we added a trigger that fired a NOTIFY for any new rows in pg_class that
relate to non-temporary relations, that would optimise away any overhead for
temporary tables or when no changeset extraction was in progress.

The changeset extraction could build a private hash table to perform the
lookup and then LISTEN on a specific channel for changes.

That might work better than an index-plus-syscache.

On the other hand, I can't entirely shake the feeling that adding the
information into WAL would be more reliable.

I don't really like the idea of requiring the relid on the WAL record. WAL
is big enough already and we want people to turn this on, not avoid it.

This is just an index lookup. We do them all the time without any fear of
reliability issues.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#39Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-28 12:26:52 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On the other hand, I can't entirely shake the feeling that adding the
information into WAL would be more reliable.

That feeling has been nagging at me too. I can't demonstrate that
there's a problem when an ALTER TABLE is in process of rewriting a table
into a new relfilenode number, but I don't have a warm fuzzy feeling
about the reliability of reverse lookups for this.

I am pretty sure the mapping thing works, but it indeed requires some
complexity. And it's harder to debug because when you want to understand
what's going on the relfilenodes involved aren't in the catalog anymore.

At the very least
it's going to require some hard-to-verify restriction about how we
can't start doing changeset reconstruction in the middle of a
transaction that's doing DDL.

Currently changeset extraction needs to wait (and does so) until it has found
a point where it has seen the start of all in-progress transactions. All
transactions that *commit* after the last partially observed in-progress
transaction has finished can be decoded.
To make that point visible for external tools to synchronize -
e.g. pg_dump - it exports the snapshot of exactly the moment when that
last in-progress transaction committed.

So, from what I gather there's a slight leaning towards *not* storing
the relation's oid in the WAL. Which means the concerns about the
uniqueness issues with the syscaches need to be addressed. So far I know
of three solutions:
1) develop a custom caching/mapping module
2) Make sure InvalidOid's (the only possible duplicate) can't end up in the
syscache by adding a hook that prevents that on the catcache level
3) Make sure that there can't be any duplicates by storing the oid of
the relation in a mapped relation's relfilenode

Opinions?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#40Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#39)
Re: changeset generation v5-01 - Patches & git tree

Since this discussion seems to have stalled, let me do a quick summary.
The goal of this subset of patches is to allow retroactive look up of
relations starting from a WAL record. Currently, the WAL record only
tracks the relfilenode that it affects, so there are two possibilities:

1. we add some way to find out the relation OID from the relfilenode,
2. we augment the WAL record with the relation OID.

Each solution has its drawbacks. For the former,
* we need a new cache
* we need a new pg_class index
* looking up the relation OID still requires some CPU runtime and memory
to keep the caches in; run invalidations, etc.

For the latter,
* each WAL record would become somewhat bigger. For WAL records with a
payload of 25 bytes (say insert a tuple which is 25 bytes long) this
means about 7% overhead.

There are some other issues, but these can be solved. For instance Tom
doesn't want a syscache on top of a non-unique index, and I agree on
that. But if we agree on this way forward, we can just go a different
route by keeping a separate cache layer.

So the question is, do we take the overhead of the new index (which
means overhead on DDL operations -- supposedly rare) or do we take the
overhead of larger WAL records (which means overhead on all DML
operations)?

Note we can make either thing apply to only people running logical
replication.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#40)
Re: changeset generation v5-01 - Patches & git tree

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So the question is, do we take the overhead of the new index (which
means overhead on DDL operations -- supposedly rare) or do we take the
overhead of larger WAL records (which means overhead on all DML
operations)?

Note we can make either thing apply to only people running logical
replication.

I don't believe you can have or not have an index on pg_class as easily
as all that. The choice would have to be frozen at initdb time, so
people would have to pay the overhead if they thought there was even a
small possibility that they'd want logical replication later.

Flipping the content of WAL records might not be a terribly simple thing
to do either, but at least in principle it could be done during a
postmaster restart, without initdb.

regards, tom lane

#42Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#41)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-01 14:16:55 -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So the question is, do we take the overhead of the new index (which
means overhead on DDL operations -- supposedly rare) or do we take the
overhead of larger WAL records (which means overhead on all DML
operations)?

Note we can make either thing apply to only people running logical
replication.

I don't believe you can have or not have an index on pg_class as easily
as all that. The choice would have to be frozen at initdb time, so
people would have to pay the overhead if they thought there was even a
small possibility that they'd want logical replication later.

It should be possible to create the index in a single database when we
start logical replication in that database? Running the index creation
with a fixed oid shouldn't require too much code. The oid won't be
reused by other pg_class entries since it would be a system one.
Alternatively we could always create the index's pg_class/index entry
but mark it as !indislive when logical replication isn't active for that
database. Then activating it would just require rebuilding that
index.

But then, I am not fully convinced that's worth the trouble since I
don't think pg_class index maintenance is the painspot in DDL atm.

Flipping the content of WAL records might not be a terribly simple thing
to do either, but at least in principle it could be done during a
postmaster restart, without initdb.

The main patch combines various booleans in the heap wal records into a
flags variable, so there should be enough space to keep track of it
without increasing size. Makes size calculations a bit more annoying
though as we use the xlog record length to calculate the heap tuple's
length, but that's not a large problem.
So we could just set the XLOG_HEAP_CONTAINS_CLASSOID flag if wal_level
>= WAL_LEVEL_LOGICAL. WAL decoding can then throw a tantrum if it finds
a record without it and we're done.

We could even make that per database, but that seems to be something for
the future.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#22)
Re: changeset generation v5-01 - Patches & git tree

On 27 June 2013 23:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Exactly what is the argument that says performance of this
function is sufficiently critical to justify adding both the maintenance
overhead of a new pg_class index, *and* a broken-by-design syscache?

I think we all agree on changing the syscache.

I'm not clear why adding a new permanent index to pg_class is such a
problem. It's going to be a very thin index. I'm trying to imagine a use
case that has pg_class index maintenance as a major part of its workload
and I can't. An extra index on pg_attribute and I might agree with you. The
pg_class index would only be a noticeable % of catalog rows for very thin
temp tables, but would still even then be small; that isn't even necessary
work since we all agree that temp table overheads could and should be
optimised away somewhere. So blocking a new index because of that sounds
strange.

What issues do you foresee? How can we test them?

Or perhaps we should just add the index and see if we later discover a
measurable problem workload?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#44Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#24)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Hm. There were some issues with the test_logical_decoding
Makefile not cleaning up the regression installation properly.
Which might have caused the issue.

Could you try after applying the patches and executing a clean
and then rebuild?

Tried, and problem persists.

Otherwise, could you try applying my git tree so we are sure we
test the same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; I'm not sure whether it's just a
bug that's hidden by either not doing optimizations or adding more
elog()s, or whether it's a compiler bug.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#45Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#44)
1 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-05 14:03:56 +0200, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Hm. There were some issues with the test_logical_decoding
Makefile not cleaning up the regression installation properly.
Which might have caused the issue.

Could you try after applying the patches and executing a clean
and then rebuild?

Tried, and problem persists.

Otherwise, could you try applying my git tree so we are sure we
test the same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; I'm not sure whether it's just a
bug that's hidden by either not doing optimizations or adding more
elog()s, or whether it's a compiler bug.

Ok. It was supreme stupidity on my end. Sorry for the time you spent on
it.

Some versions of gcc (and probably other compilers) were removing
sections of code when optimizing because the code was doing undefined
things. Parts of the rdata chain were allocated locally inside an
if (needs_key) block and still referenced after that block had ended,
which obviously is utterly bogus... A warning would have been nice though.

Fix pushed and attached.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-wal_decoding-mergme-Don-t-use-out-of-scope-local-var.patch (text/x-patch; charset=us-ascii)
From ddbaa1dbf8e0283b41098f5a08a8d21d809b9a63 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 5 Jul 2013 15:07:19 +0200
Subject: [PATCH] wal_decoding: mergme: Don't use out-of-scope local variables
 as part of the rdata chain

Depending on the optimization level and other configuration flags, compilers
removed the sections of code doing that, since doing so invokes undefined
behaviour, which makes it legal for the compiler to do so.
---
 src/backend/access/heap/heapam.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f51b73f..f9f1705 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5987,9 +5987,10 @@ log_heap_update(Relation reln, Buffer oldbuf,
 {
 	xl_heap_update xlrec;
 	xl_heap_header_len xlhdr;
+	xl_heap_header_len xlhdr_idx;
 	uint8		info;
 	XLogRecPtr	recptr;
-	XLogRecData rdata[4];
+	XLogRecData rdata[7];
 	Page		page = BufferGetPage(newbuf);
 
 	/*
@@ -6054,40 +6055,38 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	*/
 	if(need_tuple_data)
 	{
-		XLogRecData rdata_logical[4];
-
-		rdata[3].next = &(rdata_logical[0]);
+		rdata[3].next = &(rdata[4]);
 
-		rdata_logical[0].data = NULL,
-		rdata_logical[0].len = 0;
-		rdata_logical[0].buffer = newbuf;
-		rdata_logical[0].buffer_std = true;
-		rdata_logical[0].next = NULL;
+		rdata[4].data = NULL,
+		rdata[4].len = 0;
+		rdata[4].buffer = newbuf;
+		rdata[4].buffer_std = true;
+		rdata[4].next = NULL;
 		xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
 
 		/* candidate key changed and we have a candidate key */
 		if (idx_tuple)
 		{
 			/* don't really need this, but its more comfy */
-			xl_heap_header_len xlhdr_idx;
 			xlhdr_idx.header.t_infomask2 = idx_tuple->t_data->t_infomask2;
 			xlhdr_idx.header.t_infomask = idx_tuple->t_data->t_infomask;
 			xlhdr_idx.header.t_hoff = idx_tuple->t_data->t_hoff;
 			xlhdr_idx.t_len = idx_tuple->t_len;
 
-			rdata_logical[0].next = &(rdata_logical[1]);
-			rdata_logical[1].data = (char *) &xlhdr_idx;
-			rdata_logical[1].len = SizeOfHeapHeaderLen;
-			rdata_logical[1].buffer = InvalidBuffer;
-			rdata_logical[1].next = &(rdata_logical[2]);
+			rdata[4].next = &(rdata[5]);
+			rdata[5].data = (char *) &xlhdr_idx;
+			rdata[5].len = SizeOfHeapHeaderLen;
+			rdata[5].buffer = InvalidBuffer;
+			rdata[5].next = &(rdata[6]);
 
 			/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
-			rdata_logical[2].data = (char *) idx_tuple->t_data
+			rdata[6].data = (char *) idx_tuple->t_data
 				+ offsetof(HeapTupleHeaderData, t_bits);
-			rdata_logical[2].len = idx_tuple->t_len
+			rdata[6].len = idx_tuple->t_len
 				- offsetof(HeapTupleHeaderData, t_bits);
-			rdata_logical[2].buffer = InvalidBuffer;
-			rdata_logical[2].next = NULL;
+			rdata[6].buffer = InvalidBuffer;
+			rdata[6].next = NULL;
+
 			xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
 		}
 	}
-- 
1.8.2.rc2.4.g7799588.dirty

#46Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#44)
Re: changeset generation v5-01 - Patches & git tree

On 07/05/2013 08:03 AM, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Tried that, too, and problem persists. The log shows the last commit
on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; I'm not sure whether it's just a
bug that's hidden by either not doing optimizations or adding more
elog()s, or whether it's a compiler bug.

I am getting the same test failure Kevin is seeing.
This is on an x64 Debian wheezy machine with
gcc (Debian 4.7.2-5) 4.7.2

Building with -O0 results in passing tests.

Greetings,

Andres Freund

#47Andres Freund
andres@2ndquadrant.com
In reply to: Steve Singer (#46)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-05 09:28:45 -0400, Steve Singer wrote:

On 07/05/2013 08:03 AM, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Tried that, too, and problem persists. The log shows the last commit on
your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; I'm not sure whether it's just a
bug that's hidden by either not doing optimizations or adding more
elog()s, or whether it's a compiler bug.

I am getting the same test failure Kevin is seeing.
This is on an x64 Debian wheezy machine with
gcc (Debian 4.7.2-5) 4.7.2

Building with -O0 results in passing tests.

Does the patch from
http://archives.postgresql.org/message-id/20130705132513.GB11640%40awork2.anarazel.de
or the git tree (which is rebased on top of the mvcc catalog commit from
Robert, which needs some changes) fix it, even with optimizations?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#48Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#47)
Re: changeset generation v5-01 - Patches & git tree

On 07/05/2013 09:34 AM, Andres Freund wrote:

On 2013-07-05 09:28:45 -0400, Steve Singer wrote:

On 07/05/2013 08:03 AM, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Tried that, too, and problem persists. The log shows the last commit on
your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; I'm not sure whether it's just a
bug that's hidden by either not doing optimizations or adding more
elog()s, or whether it's a compiler bug.

I am getting the same test failure Kevin is seeing.
This is on an x64 Debian wheezy machine with
gcc (Debian 4.7.2-5) 4.7.2

Building with -O0 results in passing tests.

Does the patch from
http://archives.postgresql.org/message-id/20130705132513.GB11640%40awork2.anarazel.de
or the git tree (which is rebased on top of the mvcc catalog commit from
Robert, which needs some changes) fix it, even with optimizations?

Yes, with your latest git tree the tests pass with -O2.

Greetings,

Andres Freund

#49Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#2)
Re: changeset generation v5-01 - Patches & git tree

On 06/14/2013 06:51 PM, Andres Freund wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

We discussed issues related to passing options to the plugins a number
of months ago (
/messages/by-id/20130129015732.GA24238@awork2.anarazel.de)

I'm still having issues with the syntax you describe there.

START_LOGICAL_REPLICATION "1" 0/0 ("foo","bar")
unexpected termination of replication stream: ERROR: foo requires a
parameter

START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")
"START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")": ERROR: syntax error

START_LOGICAL_REPLICATION "1" 0/0 ("foo")
works okay

Steve

On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required,
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid: required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId: required, simple
0007: Adjust Satisfies* interface: required, mechanical,
0008: Allow walsender to attach to a database: required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter: required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for an Relation's primary key: required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program: not required but useful
0016: Add test_logical_decoding extension; not required, but contains
the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Version v5-01 attached

Greetings,

Andres Freund

#50Andres Freund
andres@2ndquadrant.com
In reply to: Steve Singer (#49)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-05 11:33:20 -0400, Steve Singer wrote:

On 06/14/2013 06:51 PM, Andres Freund wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

We discussed issues related to passing options to the plugins a number of
months ago ( /messages/by-id/20130129015732.GA24238@awork2.anarazel.de)

I'm still having issues with the syntax you describe there.

START_LOGICAL_REPLICATION "1" 0/0 ("foo","bar")
unexpected termination of replication stream: ERROR: foo requires a
parameter

I'd guess that's coming from your output plugin? You're using
defGetString() on DefElem without a value?

START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")

Yes, the option *names* are identifiers, together with plugin & slot
names. The passed values need to be SCONSTs atm
(src/backend/replication/repl_gram.y):

plugin_opt_elem:
            IDENT plugin_opt_arg
                {
                    $$ = makeDefElem($1, $2);
                }
        ;

plugin_opt_arg:
            SCONST              { $$ = (Node *) makeString($1); }
            | /* EMPTY */       { $$ = NULL; }
        ;

So, it would have to be:
START_LOGICAL_REPLICATION "1" 0/0 ("foo" 'bar blub frob', "sup" 'star', "noarg")

Now that's not completely obvious, I admit :/. Better suggestions?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#51Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#39)
5 attachment(s)
Re: changeset generation v5-01 - Patches & git tree

On 2013-06-28 21:47:47 +0200, Andres Freund wrote:

So, from what I gather there's a slight leaning towards *not* storing
the relation's oid in the WAL. Which means the concerns about the
uniqueness issues with the syscaches need to be addressed. So far I know
of three solutions:
1) develop a custom caching/mapping module
2) Make sure InvalidOids (the only possible duplicate) can't end up in the
syscache by adding a hook that prevents that on the catcache level
3) Make sure that there can't be any duplicates by storing the oid of
the relation in a mapped relation's relfilenode

So, here are 4 patches:
1) add RelationMapFilenodeToOid()
2) Add pg_class index on (reltablespace, relfilenode)
3a) Add custom cache that maps from filenode to oid
3b) Add catcache 'filter' that ensures the cache stays unique and use
that for the mapping
4) Add pg_relation_by_filenode() and use it in a regression test

3b) adds an optional 'filter' attribute to struct cachedesc in
syscache.c which is then passed to catcache.c. If a filter is set,
catcache.c uses it - after checking for a match in the cache - to
check whether the queried-for value should be allowed to end up in the
cache. If not, it stores a whiteout (negative) entry, as is already
done for nonexistent entries.
It also reorders some catcache.h struct attributes to make sure
we're not growing them. Might make sense to apply that
independently, those are rather heavily used.
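
To make the control flow of 3b) concrete, here is a rough standalone sketch
of "check cache, then filter, then backing store, then negative entry". It is
a toy direct-mapped cache, not catcache.c: FilteredCache, cache_search(),
filter_zero_key() and demo_lookup() are all names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

typedef struct CacheEntry
{
    bool        used;
    bool        negative;       /* true: key is known not to exist */
    Oid         key;
    Oid         value;
} CacheEntry;

typedef bool (*KeyFilter) (Oid key);    /* true: treat key as nonexistent */
typedef bool (*BackingLookup) (Oid key, Oid *value);

typedef struct FilteredCache
{
    CacheEntry  slots[CACHE_SLOTS];     /* direct-mapped, collisions evict */
    KeyFilter   filter;                 /* optional, may be NULL */
    BackingLookup lookup;               /* the slow path */
    int         backing_calls;          /* trips to the backing store */
} FilteredCache;

/* Returns true and sets *value on a positive result, false otherwise. */
static bool
cache_search(FilteredCache *cache, Oid key, Oid *value)
{
    CacheEntry *e = &cache->slots[key % CACHE_SLOTS];
    bool        found = false;
    Oid         tmp = 0;

    assert(cache->lookup != NULL);

    /* 1. look for an existing (positive or negative) entry first */
    if (e->used && e->key == key)
    {
        if (!e->negative)
            *value = e->value;
        return !e->negative;
    }

    /* 2. filtered keys never reach the backing store */
    if (cache->filter == NULL || !cache->filter(key))
    {
        cache->backing_calls++;
        found = cache->lookup(key, &tmp);
    }

    /* 3. misses and filtered keys are remembered as negative entries */
    e->used = true;
    e->negative = !found;
    e->key = key;
    e->value = tmp;
    if (found)
        *value = tmp;
    return found;
}

/* In the spirit of filter_filenode_syscache: key 0 never matches. */
static bool
filter_zero_key(Oid key)
{
    return key == 0;
}

/* Toy backing store that has a row for every key, including 0. */
static bool
demo_lookup(Oid key, Oid *value)
{
    *value = key + 1000;
    return true;
}
```

The point of the toy backing store is that it would happily return a row for
key 0; only the filter keeps that ambiguous key out of the cache.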

I slightly prefer 3b) because it's smaller. What are your opinions?

Greetings,

Andres Freund


Attachments:

3a-Introduce-a-new-relfilenodemap-cache-that-maps-filen.patch (text/x-patch; charset=us-ascii)
From 4019f9556f3708c2d7515ce1d6e1f42c1b724e89 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 7 Jul 2013 18:24:41 +0200
Subject: [PATCH] Introduce a new relfilenodemap cache that maps filenodes to
 oids

To make invalidations work, hook into relcache invalidations.
---
 src/backend/utils/adt/dbsize.c           |   1 +
 src/backend/utils/cache/Makefile         |   3 +-
 src/backend/utils/cache/inval.c          |   2 +-
 src/backend/utils/cache/relfilenodemap.c | 261 +++++++++++++++++++++++++++++++
 src/include/utils/relfilenodemap.h       |  18 +++
 5 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/utils/cache/relfilenodemap.c
 create mode 100644 src/include/utils/relfilenodemap.h

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 34482ab..c101fee 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -28,6 +28,7 @@
 #include "utils/builtins.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
+#include "utils/relfilenodemap.h"
 #include "utils/relmapper.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/cache/Makefile b/src/backend/utils/cache/Makefile
index 32d722e..a943f8e 100644
--- a/src/backend/utils/cache/Makefile
+++ b/src/backend/utils/cache/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = attoptcache.o catcache.o evtcache.o inval.o plancache.o relcache.o \
-	relmapper.o spccache.o syscache.o lsyscache.o typcache.o ts_cache.o
+	relmapper.o relfilenodemap.o spccache.o syscache.o lsyscache.o \
+	typcache.o ts_cache.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 3356d0f..080f223 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -178,7 +178,7 @@ static int	maxSharedInvalidMessagesArray;
  */
 
 #define MAX_SYSCACHE_CALLBACKS 32
-#define MAX_RELCACHE_CALLBACKS 5
+#define MAX_RELCACHE_CALLBACKS 6
 
 static struct SYSCACHECALLBACK
 {
diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
new file mode 100644
index 0000000..9f543cd
--- /dev/null
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -0,0 +1,261 @@
+/*-------------------------------------------------------------------------
+ *
+ * relfilenodemap.c
+ *	  relfilenode to oid mapping cache.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/cache/relfilenodemap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_tablespace.h"
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/hsearch.h"
+#include "utils/inval.h"
+#include "utils/fmgroids.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/relmapper.h"
+
+/* Hash table for information about each relfilenode <-> oid pair */
+static HTAB *RelfilenodeMapHash = NULL;
+
+/* built first time through in InitializeRelfilenodeMap */
+static ScanKeyData relfilenode_skey[2];
+
+typedef struct
+{
+	Oid			reltablespace;
+	Oid			relfilenode;
+} RelfilenodeMapKey;
+
+typedef struct
+{
+	RelfilenodeMapKey key;	/* lookup key - must be first */
+	Oid			relid;			/* pg_class.oid */
+} RelfilenodeMapEntry;
+
+/*
+ * RelfilenodeMapInvalidateCallback
+ *		Flush mapping entries when pg_class is updated in a relevant fashion.
+ */
+static void
+RelfilenodeMapInvalidateCallback(Datum arg, Oid relid)
+{
+	HASH_SEQ_STATUS status;
+	RelfilenodeMapEntry *entry;
+
+	/* not active or deleted */
+	if (RelfilenodeMapHash == NULL)
+		return;
+
+	/* delete entire cache */
+	if (relid == InvalidOid)
+	{
+		hash_destroy(RelfilenodeMapHash);
+		RelfilenodeMapHash = NULL;
+		return;
+	}
+
+	hash_seq_init(&status, RelfilenodeMapHash);
+	while ((entry = (RelfilenodeMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		/*
+		 * Note that there might be multiple entries for one oid at the same
+		 * time while we're processing invalidations.
+		 */
+		if (entry->relid == relid)
+		{
+			if (hash_search(RelfilenodeMapHash,
+							(void *) &entry->key,
+							HASH_REMOVE,
+							NULL) == NULL)
+				elog(ERROR, "hash table corrupted");
+		}
+	}
+}
+
+static void
+InitializeRelfilenodeMap(void)
+{
+	HASHCTL		ctl;
+	static bool	initial_init_done = false;
+	int i;
+
+	/* Make sure we've initialized CacheMemoryContext. */
+	if (CacheMemoryContext == NULL)
+		CreateCacheMemoryContext();
+
+	/* Initialize the hash table. */
+	MemSet(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(RelfilenodeMapKey);
+	ctl.entrysize = sizeof(RelfilenodeMapEntry);
+	ctl.hash = tag_hash;
+	ctl.hcxt = CacheMemoryContext;
+
+	RelfilenodeMapHash =
+		hash_create("RelfilenodeMap cache", 1024, &ctl,
+					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+	/*
+	 * For complete resets we simply delete the entire hash, but there's no
+	 * need to do the other stuff multiple times. Especially the registration
+	 * of the relcache invalidation callback should only be done once.
+	 */
+	if (initial_init_done)
+		return;
+
+	/* build skey */
+	MemSet(&relfilenode_skey, 0, sizeof(relfilenode_skey));
+
+	for (i = 0; i < 2; i++)
+	{
+		fmgr_info_cxt(F_OIDEQ,
+					  &relfilenode_skey[i].sk_func,
+					  CacheMemoryContext);
+		relfilenode_skey[i].sk_strategy = BTEqualStrategyNumber;
+		relfilenode_skey[i].sk_subtype = InvalidOid;
+		relfilenode_skey[i].sk_collation = InvalidOid;
+	}
+
+	relfilenode_skey[0].sk_attno = Anum_pg_class_reltablespace;
+	relfilenode_skey[1].sk_attno = Anum_pg_class_relfilenode;
+
+	/* Watch for invalidation events. */
+	CacheRegisterRelcacheCallback(RelfilenodeMapInvalidateCallback,
+								  (Datum) 0);
+	initial_init_done = true;
+}
+
+/*
+ * Map a relation's (tablespace, filenode) to a relation's oid and cache the
+ * result.
+ *
+ * Instead of DEFAULTTABLESPACE_OID InvalidOid/0 can be passed as
+ * tablespace. The table identified by the parameter pair can be a shared,
+ * nailed or normal relation.
+ *
+ * Returns InvalidOid if no relation matching the criteria could be found.
+ */
+Oid
+RelidByRelfilenode(Oid reltablespace, Oid relfilenode)
+{
+	RelfilenodeMapKey key;
+	RelfilenodeMapEntry *entry;
+	bool found;
+	SysScanDesc scandesc;
+	Relation relation;
+	HeapTuple ntp;
+	ScanKeyData skey[2];
+
+	if (RelfilenodeMapHash == NULL)
+		InitializeRelfilenodeMap();
+
+	/*
+	 * relations in the default tablespace are stored with InvalidOid as
+	 * pg_class.reltablespace.
+	 */
+	if (reltablespace == DEFAULTTABLESPACE_OID)
+		reltablespace = 0;
+
+	MemSet(&key, 0, sizeof(key));
+	key.reltablespace = reltablespace;
+	key.relfilenode = relfilenode;
+
+	/*
+	 * Check cache and enter entry if nothing could be found. Even if no target
+	 * relation can be found later on we store the negative match and return an
+	 * InvalidOid from cache. That's not really necessary for performance since
+	 * querying invalid values isn't supposed to be a frequent thing, but it's
+	 * simpler implementation-wise this way.
+	 */
+	entry = hash_search(RelfilenodeMapHash,
+						(void *) &key,
+						HASH_ENTER,
+						&found);
+
+	if (found)
+		return entry->relid;
+
+	/* ok, no previous cache entry, do it the hard way */
+
+	/* check shared tables */
+	if (reltablespace == GLOBALTABLESPACE_OID)
+	{
+		entry->relid = RelationMapFilenodeToOid(relfilenode, true);
+		return entry->relid;
+	}
+
+	/* check plain relations by looking in pg_class */
+	relation = heap_open(RelationRelationId, AccessShareLock);
+
+	/* copy scankey to local copy, it will be modified during the scan */
+	memcpy(skey, relfilenode_skey, sizeof(skey));
+
+	/* set scan arguments */
+	skey[0].sk_argument = ObjectIdGetDatum(reltablespace);
+	skey[1].sk_argument = ObjectIdGetDatum(relfilenode);
+
+	scandesc = systable_beginscan(relation,
+								  ClassTblspcRelfilenodeIndexId,
+								  true,
+								  NULL,
+								  2,
+								  skey);
+
+	found = false;
+
+	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
+	{
+		bool isnull;
+
+		if (found)
+			elog(ERROR, "duplicate in RelidByRelfilenode");
+		found = true;
+
+#ifdef USE_ASSERT_CHECKING
+		if (assert_enabled)
+		{
+			Oid check;
+			check = fastgetattr(ntp, Anum_pg_class_reltablespace,
+								RelationGetDescr(relation),
+								&isnull);
+
+			/*
+			 * reltablespace is already set to InvalidOid above if we're
+			 * looking for DEFAULTTABLESPACE_OID.
+			 */
+			if (isnull || check != reltablespace)
+				elog(ERROR, "borked reltablespace lookup");
+
+			check = fastgetattr(ntp, Anum_pg_class_relfilenode,
+								RelationGetDescr(relation),
+								&isnull);
+
+			if (isnull || check != relfilenode)
+				elog(ERROR, "borked relfilenode lookup");
+		}
+#endif
+		entry->relid = HeapTupleGetOid(ntp);
+	}
+
+	systable_endscan(scandesc);
+	heap_close(relation, AccessShareLock);
+
+	/* check for nailed tables, those will not have been in pg_class */
+	if (!found)
+		entry->relid = RelationMapFilenodeToOid(relfilenode, false);
+
+	return entry->relid;
+}
diff --git a/src/include/utils/relfilenodemap.h b/src/include/utils/relfilenodemap.h
new file mode 100644
index 0000000..cdb9bbf
--- /dev/null
+++ b/src/include/utils/relfilenodemap.h
@@ -0,0 +1,18 @@
+/*-------------------------------------------------------------------------
+ *
+ * relfilenodemap.h
+ *	  relfilenode to oid mapping cache.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/relfilenodemap.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RELFILENODEMAP_H
+#define RELFILENODEMAP_H
+
+Oid RelidByRelfilenode(Oid reltablespace, Oid relfilenode);
+
+#endif   /* RELFILENODEMAP_H */
-- 
1.8.3.251.g1462b67
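
As a standalone illustration of the idea behind 3a (cache positive and
negative lookups, flush on invalidation), here is a small sketch. It is not
the patch's code and deliberately differs in two ways, both assumptions of
this sketch: the slow lookup runs before the entry is inserted, so the cache
never holds a half-initialized entry, and negative entries are dropped on
every invalidation because a new relation might now own that filenode.
FilenodeMap, the map_* functions and demo_catalog_lookup() are hypothetical
names, with a fixed-size array standing in for the dynahash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)
#define MAP_SIZE 16

typedef struct MapEntry
{
    bool        used;
    Oid         reltablespace;
    Oid         relfilenode;
    Oid         relid;          /* InvalidOid marks a cached negative result */
} MapEntry;

typedef Oid (*CatalogLookup) (Oid reltablespace, Oid relfilenode);

typedef struct FilenodeMap
{
    MapEntry    entries[MAP_SIZE];
    CatalogLookup lookup;       /* slow path, e.g. an index scan */
    int         slow_lookups;   /* how often the slow path ran */
} FilenodeMap;

static MapEntry *
map_find(FilenodeMap *map, Oid reltablespace, Oid relfilenode)
{
    size_t      i;

    for (i = 0; i < MAP_SIZE; i++)
    {
        MapEntry   *e = &map->entries[i];

        if (e->used && e->reltablespace == reltablespace &&
            e->relfilenode == relfilenode)
            return e;
    }
    return NULL;
}

static Oid
map_relid_by_filenode(FilenodeMap *map, Oid reltablespace, Oid relfilenode)
{
    MapEntry   *e = map_find(map, reltablespace, relfilenode);
    Oid         relid;
    size_t      i;

    assert(map->lookup != NULL);

    if (e != NULL)
        return e->relid;        /* positive or negative cache hit */

    /* slow path first, so only complete entries are ever stored */
    map->slow_lookups++;
    relid = map->lookup(reltablespace, relfilenode);

    for (i = 0; i < MAP_SIZE; i++)
    {
        e = &map->entries[i];
        if (!e->used)
        {
            e->used = true;
            e->reltablespace = reltablespace;
            e->relfilenode = relfilenode;
            e->relid = relid;
            break;
        }
    }
    /* if the table is full the result is simply returned uncached */
    return relid;
}

/* InvalidOid flushes everything; otherwise drop that relid and negatives. */
static void
map_invalidate(FilenodeMap *map, Oid relid)
{
    size_t      i;

    for (i = 0; i < MAP_SIZE; i++)
    {
        MapEntry   *e = &map->entries[i];

        if (relid == InvalidOid || e->relid == InvalidOid || e->relid == relid)
            e->used = false;
    }
}

/* Toy slow path: filenode 100 belongs to relation 16384, nothing else. */
static Oid
demo_catalog_lookup(Oid reltablespace, Oid relfilenode)
{
    (void) reltablespace;
    return relfilenode == 100 ? 16384 : InvalidOid;
}
```

Repeated lookups of the same pair, whether it exists or not, hit the slow
path once until an invalidation arrives.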

3b-Add-syscache-for-filenode-to-oid-mapping.patch (text/x-patch; charset=us-ascii)
From ed21c52a5c3ed5837fda5a16871be7505e6bb02d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 7 Jul 2013 18:39:43 +0200
Subject: [PATCH] Add syscache for filenode to oid mapping

This cache is problematic because, formally, indexes used by syscaches need to
be unique and this one is not. That is "just" because 0/InvalidOid is stored in
pg_class.relfilenode for nailed/shared catalog relations. Even though we should
never query those values, we need to make sure nothing bad can happen in that
case. So add a 'filter' to the syscache infrastructure that allows specifying a
function which prevents tuples from ending up in a catcache.
---
 src/backend/utils/cache/catcache.c | 25 ++++++++++++++++-----
 src/backend/utils/cache/relcache.c | 45 +++++++++++++++++++++++++++++++++++++
 src/backend/utils/cache/syscache.c | 46 +++++++++++++++++++++++++++++++++++++-
 src/include/utils/catcache.h       | 21 ++++++++++-------
 src/include/utils/relcache.h       |  6 +++++
 src/include/utils/syscache.h       |  1 +
 6 files changed, 129 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d12da76..536de2d 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -726,7 +726,8 @@ InitCatCache(int id,
 			 Oid indexoid,
 			 int nkeys,
 			 const int *key,
-			 int nbuckets)
+			 int nbuckets,
+			 PGFunction filter)
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
@@ -787,6 +788,7 @@ InitCatCache(int id,
 	cp->cc_indexoid = indexoid;
 	cp->cc_relisshared = false; /* temporary */
 	cp->cc_tupdesc = (TupleDesc) NULL;
+	cp->cc_filter = filter;
 	cp->cc_ntup = 0;
 	cp->cc_nbuckets = nbuckets;
 	cp->cc_nkeys = nkeys;
@@ -1162,6 +1164,18 @@ SearchCatCache(CatCache *cache,
 		}
 	}
 
+	ct = NULL;
+
+	if (cache->cc_filter != NULL &&
+		DatumGetBool(DirectFunctionCall2(cache->cc_filter,
+										 PointerGetDatum(cache),
+										 PointerGetDatum(&cur_skey))))
+	{
+		CACHE2_elog(DEBUG2, "SearchCatCache(%s): filtering lookup",
+					cache->cc_relname);
+		goto create_negative;
+	}
+
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1186,8 +1200,6 @@ SearchCatCache(CatCache *cache,
 								  cache->cc_nkeys,
 								  cur_skey);
 
-	ct = NULL;
-
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp,
@@ -1204,10 +1216,11 @@ SearchCatCache(CatCache *cache,
 
 	heap_close(relation, AccessShareLock);
 
+create_negative:
 	/*
-	 * If tuple was not found, we need to build a negative cache entry
-	 * containing a fake tuple.  The fake tuple has the correct key columns,
-	 * but nulls everywhere else.
+	 * If tuple was not found or filtered, we need to build a negative cache
+	 * entry containing a fake tuple.  The fake tuple has the correct key
+	 * columns, but nulls everywhere else.
 	 *
 	 * In bootstrap mode, we don't build negative entries, because the cache
 	 * invalidation mechanism isn't alive and can't clear them if the tuple
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c1eb3c4 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4859,3 +4859,48 @@ unlink_initfile(const char *initfilename)
 			elog(LOG, "could not remove cache file \"%s\": %m", initfilename);
 	}
 }
+
+Oid
+RelidByRelfilenode(Oid reltablespace, Oid relfilenode)
+{
+	Oid			lookup_tablespace;
+	Oid			heaprel;
+	HeapTuple	tuple;
+
+	if (reltablespace == 0)
+		reltablespace = DEFAULTTABLESPACE_OID;
+
+	/* in global tablespace, has to be a shared table */
+	if (reltablespace == GLOBALTABLESPACE_OID)
+	{
+		heaprel = RelationMapFilenodeToOid(relfilenode, true);
+	}
+	else
+	{
+		/*
+		 * relations in the default tablespace are stored with InvalidOid as
+		 * pg_class."reltablespace".
+		 */
+		if (reltablespace == DEFAULTTABLESPACE_OID)
+			lookup_tablespace = InvalidOid;
+		else
+			lookup_tablespace = reltablespace;
+
+
+		tuple = SearchSysCache2(RELFILENODE,
+								ObjectIdGetDatum(lookup_tablespace),
+								ObjectIdGetDatum(relfilenode));
+		/* ok, found it */
+		if (HeapTupleIsValid(tuple))
+		{
+			heaprel = HeapTupleHeaderGetOid(tuple->t_data);
+			ReleaseSysCache(tuple);
+		}
+		/* has to be nonexistent or a nailed table, but not shared */
+		else
+		{
+			heaprel = RelationMapFilenodeToOid(relfilenode, false);
+		}
+	}
+	return heaprel;
+}
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 1ff2f2b..0b3e901 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -110,8 +110,11 @@ struct cachedesc
 	int			nkeys;			/* # of keys needed for cache lookup */
 	int			key[4];			/* attribute numbers of key attrs */
 	int			nbuckets;		/* number of hash buckets for this cache */
+	PGFunction	filter;			/* optional filter to guarantee uniqueness */
 };
 
+static Datum filter_filenode_syscache(PG_FUNCTION_ARGS);
+
 static const struct cachedesc cacheinfo[] = {
 	{AggregateRelationId,		/* AGGFNOID */
 		AggregateFnoidIndexId,
@@ -598,6 +601,18 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		64
 	},
+	{RelationRelationId,		/* RELFILENODE */
+		ClassTblspcRelfilenodeIndexId,
+		2,
+		{
+			Anum_pg_class_reltablespace,
+			Anum_pg_class_relfilenode,
+			0,
+			0
+		},
+		1024,
+		filter_filenode_syscache
+	},
 	{RelationRelationId,		/* RELNAMENSP */
 		ClassNameNspIndexId,
 		2,
@@ -834,7 +849,8 @@ InitCatalogCache(void)
 										 cacheinfo[cacheId].indoid,
 										 cacheinfo[cacheId].nkeys,
 										 cacheinfo[cacheId].key,
-										 cacheinfo[cacheId].nbuckets);
+										 cacheinfo[cacheId].nbuckets,
+										 cacheinfo[cacheId].filter);
 		if (!PointerIsValid(SysCache[cacheId]))
 			elog(ERROR, "could not initialize cache %u (%d)",
 				 cacheinfo[cacheId].reloid, cacheId);
@@ -1208,3 +1224,31 @@ oid_compare(const void *a, const void *b)
 		return 0;
 	return (oa > ob) ? 1 : -1;
 }
+
+/*
+ * Filter away lookups with an InvalidOid relfilenode - those are nailed &
+ * shared relations and are managed via relmapper not via pg_class. Because of
+ * that InvalidOid is stored in pg_class.relfilenode for those making the index
+ * not unique. Make it unique by filtering away those rows.
+ *
+ * Arguments passed:
+ *  0: catcache
+ *  1: *ScanKey[4] identifying the to-be-filtered key
+ *
+ * Return true if the argument should be considered *nonexistent*.
+ */
+static Datum
+filter_filenode_syscache(PG_FUNCTION_ARGS)
+{
+	CatCache   *cache = (CatCache *) PG_GETARG_POINTER(0);
+	ScanKeyData	*skeys  = (ScanKeyData *) PG_GETARG_POINTER(1);
+
+	if (cache->cc_nkeys != 2)
+		elog(ERROR, "invalid parameter");
+
+	/* filter away if relfilenode == 0 */
+	if (DatumGetObjectId(skeys[1].sk_argument) == InvalidOid)
+		PG_RETURN_BOOL(true);
+
+	PG_RETURN_BOOL(false);
+}
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index b6e1c97..a671d31 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -37,21 +37,22 @@
 typedef struct catcache
 {
 	int			id;				/* cache identifier --- see syscache.h */
+	int			cc_ntup;		/* # of tuples currently in this cache */
+	int			cc_nbuckets;	/* # of hash buckets in this cache */
+	int			cc_nkeys;		/* # of keys (1..CATCACHE_MAXKEYS) */
 	slist_node	cc_next;		/* list link */
 	const char *cc_relname;		/* name of relation the tuples come from */
 	Oid			cc_reloid;		/* OID of relation the tuples come from */
 	Oid			cc_indexoid;	/* OID of index matching cache keys */
 	bool		cc_relisshared; /* is relation shared across databases? */
-	TupleDesc	cc_tupdesc;		/* tuple descriptor (copied from reldesc) */
-	int			cc_ntup;		/* # of tuples currently in this cache */
-	int			cc_nbuckets;	/* # of hash buckets in this cache */
-	int			cc_nkeys;		/* # of keys (1..CATCACHE_MAXKEYS) */
+	bool		cc_isname[CATCACHE_MAXKEYS];	/* flag "name" key columns */
 	int			cc_key[CATCACHE_MAXKEYS];		/* AttrNumber of each key */
 	PGFunction	cc_hashfunc[CATCACHE_MAXKEYS];	/* hash function for each key */
 	ScanKeyData cc_skey[CATCACHE_MAXKEYS];		/* precomputed key info for
 												 * heap scans */
-	bool		cc_isname[CATCACHE_MAXKEYS];	/* flag "name" key columns */
+	TupleDesc	cc_tupdesc;		/* tuple descriptor (copied from reldesc) */
 	dlist_head	cc_lists;		/* list of CatCList structs */
+	PGFunction	cc_filter;		/* optional filter to achieve uniqueness */
 #ifdef CATCACHE_STATS
 	long		cc_searches;	/* total # searches against this cache */
 	long		cc_hits;		/* # of matches against existing entry */
@@ -74,6 +75,9 @@ typedef struct catctup
 {
 	int			ct_magic;		/* for identifying CatCTup entries */
 #define CT_MAGIC   0x57261502
+
+	uint32		hash_value;		/* hash value for this tuple's keys */
+
 	CatCache   *my_cache;		/* link to owning catcache */
 
 	/*
@@ -107,7 +111,6 @@ typedef struct catctup
 	int			refcount;		/* number of active references */
 	bool		dead;			/* dead but not yet removed? */
 	bool		negative;		/* negative cache entry? */
-	uint32		hash_value;		/* hash value for this tuple's keys */
 	HeapTupleData tuple;		/* tuple management header */
 } CatCTup;
 
@@ -116,6 +119,9 @@ typedef struct catclist
 {
 	int			cl_magic;		/* for identifying CatCList entries */
 #define CL_MAGIC   0x52765103
+
+	uint32		hash_value;		/* hash value for lookup keys */
+
 	CatCache   *my_cache;		/* link to owning catcache */
 
 	/*
@@ -144,7 +150,6 @@ typedef struct catclist
 	bool		dead;			/* dead but not yet removed? */
 	bool		ordered;		/* members listed in index order? */
 	short		nkeys;			/* number of lookup keys specified */
-	uint32		hash_value;		/* hash value for lookup keys */
 	HeapTupleData tuple;		/* header for tuple holding keys */
 	int			n_members;		/* number of member tuples */
 	CatCTup    *members[1];		/* members --- VARIABLE LENGTH ARRAY */
@@ -166,7 +171,7 @@ extern void AtEOXact_CatCache(bool isCommit);
 
 extern CatCache *InitCatCache(int id, Oid reloid, Oid indexoid,
 			 int nkeys, const int *key,
-			 int nbuckets);
+			 int nbuckets, PGFunction filter);
 extern void InitCatCachePhase2(CatCache *cache, bool touch_index);
 
 extern HeapTuple SearchCatCache(CatCache *cache,
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..46172f3 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -110,6 +110,12 @@ extern void RelationCacheInitFilePreInvalidate(void);
 extern void RelationCacheInitFilePostInvalidate(void);
 extern void RelationCacheInitFileRemove(void);
 
+/*
+ * Mapping from relfilenode to oid
+ */
+extern Oid RelidByRelfilenode(Oid reltablespace, Oid relfilenode);
+
+
 /* should be used only by relcache.c and catcache.c */
 extern bool criticalRelcachesBuilt;
 
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index e41b3d2..66d5684 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -75,6 +75,7 @@ enum SysCacheIdentifier
 	PROCNAMEARGSNSP,
 	PROCOID,
 	RANGETYPE,
+	RELFILENODE,
 	RELNAMENSP,
 	RELOID,
 	RULERELNAME,
-- 
1.8.3.251.g1462b67

4-wal_decoding-Add-pg_relation_by_filenode-to-lookup-u.patch (text/x-patch; charset=us-ascii)
From 564907fdb433947ed6f596cf37e983788dfbdeab Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] wal_decoding: Add pg_relation_by_filenode to look up a
 relation by (tablespace, filenode)

This requires the RELFILENODE syscache and the RelationMapFilenodeToOid
function added in the previous two commits.
---
 doc/src/sgml/func.sgml                    | 23 ++++++++++++++++++++++-
 src/backend/utils/adt/dbsize.c            | 27 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h             |  2 ++
 src/include/utils/builtins.h              |  2 ++
 src/test/regress/expected/alter_table.out | 18 ++++++++++++++++++
 src/test/regress/sql/alter_table.sql      | 14 ++++++++++++++
 6 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5765ddf..d25e796 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15739,7 +15739,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 
    <para>
     The functions shown in <xref linkend="functions-admin-dblocation"> assist
-    in identifying the specific disk files associated with database objects.
+    in identifying the specific disk files associated with database objects or doing the reverse.
    </para>
 
    <indexterm>
@@ -15748,6 +15748,9 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
    <indexterm>
     <primary>pg_relation_filepath</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_relation_by_filenode</primary>
+   </indexterm>
 
    <table id="functions-admin-dblocation">
     <title>Database Object Location Functions</title>
@@ -15776,6 +15779,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
         File path name of the specified relation
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_relation_by_filenode(<parameter>tablespace</parameter> <type>oid</type>, <parameter>filenode</parameter> <type>oid</type>)</function></literal>
+        </entry>
+       <entry><type>regclass</type></entry>
+       <entry>
+        Find the associated relation of a filenode
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -15799,6 +15811,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     the relation.
    </para>
 
+   <para>
+    <function>pg_relation_by_filenode</> is the reverse of
+    <function>pg_relation_filenode</>. Given a <quote>tablespace</> OID and
+    a <quote>filenode</> it returns the associated relation. The default
+    tablespace for user tables can be replaced with 0. Check the
+    documentation of <function>pg_relation_filenode</> for an explanation of why
+    this cannot always be answered easily by querying <structname>pg_class</>.
+   </para>
+
   </sect2>
 
   <sect2 id="functions-admin-genfile">
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 34482ab..988a8ff 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -756,6 +756,33 @@ pg_relation_filenode(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Get the relation via (reltablespace, relfilenode)
+ *
+ * This is expected to be used when somebody wants to match an individual file
+ * on the filesystem back to its table. That's not trivially possible via
+ * pg_class because that doesn't contain the relfilenodes of shared and nailed
+ * tables.
+ *
+ * We don't fail but return NULL if we cannot find a mapping.
+ *
+ * Instead of knowing DEFAULTTABLESPACE_OID you can pass 0.
+ */
+Datum
+pg_relation_by_filenode(PG_FUNCTION_ARGS)
+{
+	Oid			reltablespace = PG_GETARG_OID(0);
+	Oid			relfilenode = PG_GETARG_OID(1);
+	Oid			heaprel = InvalidOid;
+
+	heaprel = RelidByRelfilenode(reltablespace, relfilenode);
+
+	if (!OidIsValid(heaprel))
+		PG_RETURN_NULL();
+	else
+		PG_RETURN_OID(heaprel);
+}
+
+/*
  * Get the pathname (relative to $PGDATA) of a relation
  *
  * See comments for pg_relation_filenode.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 90aff3d..3856f57 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3448,6 +3448,8 @@ DATA(insert OID = 2998 ( pg_indexes_size		PGNSP PGUID 12 1 0 0 0 f f f f t f v 1
 DESCR("disk space usage for all indexes attached to the specified table");
 DATA(insert OID = 2999 ( pg_relation_filenode	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 26 "2205" _null_ _null_ _null_ _null_ pg_relation_filenode _null_ _null_ _null_ ));
 DESCR("filenode identifier of relation");
+DATA(insert OID = 3454 ( pg_relation_by_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 2205 "26 26" _null_ _null_ _null_ _null_ pg_relation_by_filenode _null_ _null_ _null_ ));
+DESCR("relation for a tablespace and filenode");
 DATA(insert OID = 3034 ( pg_relation_filepath	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "2205" _null_ _null_ _null_ _null_ pg_relation_filepath _null_ _null_ _null_ ));
 DESCR("file path of relation");
 
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 667c58b..ddbedea 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -459,8 +459,10 @@ extern Datum pg_size_pretty(PG_FUNCTION_ARGS);
 extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS);
 extern Datum pg_table_size(PG_FUNCTION_ARGS);
 extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
+extern Datum pg_relation_by_filenode(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filepath(PG_FUNCTION_ARGS);
+extern Datum pg_relation_is_scannable(PG_FUNCTION_ARGS);
 
 /* genfile.c */
 extern bytea *read_binary_file(const char *filename,
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 18daf95..2dfe113 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2305,3 +2305,21 @@ Check constraints:
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
+-- Check that we map relation oids to filenodes and back correctly.
+-- Don't display all the mappings so the test output doesn't change
+-- all the time, but make sure we actually do test some values.
+SELECT
+    SUM((mapped_oid != oid OR mapped_oid IS NULL)::int) incorrectly_mapped,
+    count(*) > 200 have_mappings
+FROM (
+    SELECT
+        oid, reltablespace, relfilenode, relname,
+        pg_relation_by_filenode(reltablespace, pg_relation_filenode(oid)) mapped_oid
+    FROM pg_class
+    WHERE relkind IN ('r', 'i', 'S', 't', 'm')
+    ) mapped;
+ incorrectly_mapped | have_mappings 
+--------------------+---------------
+                  0 | t
+(1 row)
+
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index dcf8121..12b1338 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1544,3 +1544,17 @@ ALTER TABLE IF EXISTS tt8 SET SCHEMA alter2;
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
+
+-- Check that we map relation oids to filenodes and back correctly.
+-- Don't display all the mappings so the test output doesn't change
+-- all the time, but make sure we actually do test some values.
+SELECT
+    SUM((mapped_oid != oid OR mapped_oid IS NULL)::int) incorrectly_mapped,
+    count(*) > 200 have_mappings
+FROM (
+    SELECT
+        oid, reltablespace, relfilenode, relname,
+        pg_relation_by_filenode(reltablespace, pg_relation_filenode(oid)) mapped_oid
+    FROM pg_class
+    WHERE relkind IN ('r', 'i', 'S', 't', 'm')
+    ) mapped;
-- 
1.8.3.251.g1462b67

1-wal_decoding-Add-RelationMapFilenodeToOid-function-t.patch (text/x-patch; charset=us-ascii)
From cbedebac6a8d449a5127befe1525230c2132e06f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] wal_decoding: Add RelationMapFilenodeToOid function to
 relmapper.c

This function maps the filenode of a mapped (nailed or shared) relation to the
table oid and thus acts as the reverse of RelationMapOidToFilenode.
---
 src/backend/utils/cache/relmapper.c | 53 +++++++++++++++++++++++++++++++++++++
 src/include/utils/relmapper.h       |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 2c7d9f3..039aa29 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -180,6 +180,59 @@ RelationMapOidToFilenode(Oid relationId, bool shared)
 	return InvalidOid;
 }
 
+/* RelationMapFilenodeToOid
+ *
+ * Do the reverse of the normal direction of mapping done in
+ * RelationMapOidToFilenode.
+ *
+ * This is not supposed to be used during normal running but rather for
+ * information purposes when looking at the filesystem or the xlog.
+ *
+ * Returns InvalidOid if the OID is not known, which can easily happen if the
+ * filenode is not of a relation that is nailed or shared or if it simply
+ * doesn't exist anywhere.
+ */
+Oid
+RelationMapFilenodeToOid(Oid filenode, bool shared)
+{
+	const RelMapFile *map;
+	int32		i;
+
+	/* If there are active updates, believe those over the main maps */
+	if (shared)
+	{
+		map = &active_shared_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &shared_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+	else
+	{
+		map = &active_local_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &local_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * RelationMapUpdateMap
  *
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 8f0b438..071bc98 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -36,6 +36,8 @@ typedef struct xl_relmap_update
 
 extern Oid	RelationMapOidToFilenode(Oid relationId, bool shared);
 
+extern Oid	RelationMapFilenodeToOid(Oid filenode, bool shared);
+
 extern void RelationMapUpdateMap(Oid relationId, Oid fileNode, bool shared,
 					 bool immediate);
 
-- 
1.8.3.251.g1462b67
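
The lookup order used by the patch above (pending updates first, then the main map) can be sketched as a standalone model. This is only an illustration, not the relmapper.c code: the `Oid` typedef, the `MiniRelMap` struct, the fixed `MAX_MAPPINGS` capacity, and both helper names are assumptions made for the sketch. Only the search order is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid   ((Oid) 0)
#define MAX_MAPPINGS 8			/* hypothetical capacity for this sketch */

typedef struct MiniMapping
{
	Oid			mapoid;			/* OID of the relation */
	Oid			mapfilenode;	/* its filenode identifier */
} MiniMapping;

typedef struct MiniRelMap
{
	int			num_mappings;
	MiniMapping mappings[MAX_MAPPINGS];
} MiniRelMap;

/* Scan one map for a filenode; return InvalidOid if it is not present. */
static Oid
scan_map_for_filenode(const MiniRelMap *map, Oid filenode)
{
	int			i;

	assert(map != NULL);
	for (i = 0; i < map->num_mappings; i++)
	{
		if (map->mappings[i].mapfilenode == filenode)
			return map->mappings[i].mapoid;
	}
	return InvalidOid;
}

/* Reverse lookup: entries in the pending updates win over the main map. */
static Oid
filenode_to_oid(const MiniRelMap *active_updates,
				const MiniRelMap *main_map, Oid filenode)
{
	Oid			result = scan_map_for_filenode(active_updates, filenode);

	if (result != InvalidOid)
		return result;
	return scan_map_for_filenode(main_map, filenode);
}
```

Factoring the scan into one helper removes the four near-identical loops; whether that is preferable to mirroring the layout of RelationMapOidToFilenode is a style choice.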

Attachment: 2-Add-index-on-pg_class-reltablespace-relfilenode.patch (text/x-patch; charset=us-ascii)
From fc6022fcc9ba8394069870b0b2b0e32a4a648c70 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 7 Jul 2013 18:38:56 +0200
Subject: [PATCH] Add index on pg_class(reltablespace, relfilenode)

Used by RelidByRelfilenode either via relfilenodemap.c or via a special
syscache.

Needs a CATVERSION bump.
---
 src/include/catalog/indexing.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 19268fb..4860e98 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_o
 #define ClassOidIndexId  2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId  2663
+DECLARE_INDEX(pg_class_tblspc_relfilenode_index, 3455, on pg_class using btree(reltablespace oid_ops, relfilenode oid_ops));
+#define ClassTblspcRelfilenodeIndexId  3455
 
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId 3164
-- 
1.8.3.251.g1462b67

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#51)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> writes:

3b) Add catcache 'filter' that ensures the cache stays unique and use
that for the mapping

I slightly prefer 3b) because it's smaller, what's your opinions?

This is just another variation on the theme of kluging the catcache to
do something it shouldn't. You're still building a catcache on a
non-unique index, and that is going to lead to trouble.

(I'm a bit surprised that there is no Assert in catcache.c checking
that the index nominated to support a catcache is unique ...)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#53Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#52)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-07 15:43:17 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

3b) Add catcache 'filter' that ensures the cache stays unique and use
that for the mapping

I slightly prefer 3b) because it's smaller, what's your opinions?

This is just another variation on the theme of kluging the catcache to
do something it shouldn't. You're still building a catcache on a
non-unique index, and that is going to lead to trouble.

I don't think the lurking dangers really are present. The index
essentially *is* unique since we filter away anything non-unique. The
catcache code hardly can be confused by tuples it never sees. That would
even work if we started preloading catcaches by doing scans of the
entire underlying relation or by caching all of a page when reading one
of its tuples.

I can definitely see that there are "aesthetical" reasons against doing
3b), that's why I've also done 3a). So I'll chalk you up to voting for
that...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#54Kevin Grittner
kgrittn@ymail.com
In reply to: Kevin Grittner (#24)
Re: changeset generation v5-01 - Patches & git tree

Sorry for the delay in reviewing this.  I must make sure never to
take another vacation during a commitfest -- the backlog upon
return is a killer....

Kevin Grittner <kgrittn@ymail.com> wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Otherwise, could you try applying my git tree so we are sure we
test the same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

The good news: the regression tests now work for me, and I'm back
on testing this at a high level.

The bad news:

(1)  The code checked out from that branch does not merge with
master.  Not surprisingly, given the recent commits, xlog.c is a
problem.  Is there another branch I should now be using?  If not,
please let me know when I can test with something that applies on
top of the master branch.

(2)  An initial performance test didn't look very good.  I will be
running a more controlled test to confirm but the logical
replication of a benchmark with a lot of UPDATEs of compressed text
values seemed to suffer with the logical replication turned on.
Any suggestions or comments on that front, before I run the more
controlled benchmarks?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#54)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-10 12:21:23 -0700, Kevin Grittner wrote:

Sorry for the delay in reviewing this.  I must make sure never to
take another vacation during a commitfest -- the backlog upon
return is a killer....

Heh. Yes. Been through it before...

Kevin Grittner <kgrittn@ymail.com> wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Otherwise, could you try applying my git tree so we are sure we
test the same thing?

$ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
$ git fetch af
$ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
$ ./configure ...
$ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

The good news: the regression tests now work for me, and I'm back
on testing this at a high level.

The bad news:

(1)  The code checked out from that branch does not merge with
master.  Not surprisingly, given the recent commits, xlog.c is a
problem.  Is there another branch I should now be using?  If not,
please let me know when I can test with something that applies on
top of the master branch.

That one is actually relatively easy to resolve. The mvcc catalog scan
patch is slightly harder. I've pushed an updated patch that fixes the
latter in a slightly not-so-nice way. I am not sure yet what the final
fix for that is going to look like; it depends on whether we will get rid
of SnapshotNow altogether...

I'll push my local tree with that fixed in a sec.

(2)  An initial performance test didn't look very good.  I will be
running a more controlled test to confirm but the logical
replication of a benchmark with a lot of UPDATEs of compressed text
values seemed to suffer with the logical replication turned on.
Any suggestions or comments on that front, before I run the more
controlled benchmarks?

Hm. There theoretically shouldn't actually be anything added in that
path. Could you roughly sketch what that test is doing? Do you actually
stream those changes out or did you just turn on wal_level=logical?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#56Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#55)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

Kevin Grittner <kgrittn@ymail.com> wrote:

(2)  An initial performance test didn't look very good.  I will be
running a more controlled test to confirm but the logical
replication of a benchmark with a lot of UPDATEs of compressed text
values seemed to suffer with the logical replication turned on.
Any suggestions or comments on that front, before I run the more
controlled benchmarks?

Hm. There theoretically shouldn't actually be anything added in that
path. Could you roughly sketch what that test is doing? Do you actually
stream those changes out or did you just turn on wal_level=logical?

It was an update of every row in a table of 720000 rows, with
each row updated by primary key using a separate UPDATE statement,
modifying a large text column with a lot of repeating characters
(so compressed well).  I got a timing on a master build and I got a
timing with the patch in the environment used by
test_logical_decoding.  It took several times as long in the latter
run, but it was very much a preliminary test in preparation for
getting real numbers.  (I'm sure you know how much work it is to
set up for a good run of tests.)  I'm not sure that (for example)
the synchronous_commit setting was the same, which could matter a
lot.  I wouldn't put a lot of stock in it until I can re-create it
under a much more controlled test.

The one thing about the whole episode that gave me pause was that
the compression and decompression routines were very high on the
`perf top` output in the patched run and way down the list on the
run based on master.  I don't have a ready explanation for that,
unless your branch was missing a recent commit for speeding
compression which was present on master.  It might be worth
checking that you're not detoasting more often than you need.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#57Andres Freund
andres@2ndquadrant.com
In reply to: Kevin Grittner (#56)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-10 15:14:58 -0700, Kevin Grittner wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

Kevin Grittner <kgrittn@ymail.com> wrote:

(2)  An initial performance test didn't look very good.  I will be
running a more controlled test to confirm but the logical
replication of a benchmark with a lot of UPDATEs of compressed text
values seemed to suffer with the logical replication turned on.
Any suggestions or comments on that front, before I run the more
controlled benchmarks?

Hm. There theoretically shouldn't actually be anything added in that
path. Could you roughly sketch what that test is doing? Do you actually
stream those changes out or did you just turn on wal_level=logical?

It was an update of every row in a table of 720000 rows, with
each row updated by primary key using a separate UPDATE statement,
modifying a large text column with a lot of repeating characters
(so compressed well).  I got a timing on a master build and I got a
timing with the patch in the environment used by
test_logical_decoding.  It took several times as long in the latter
run, but it was very much a preliminary test in preparation for
getting real numbers.  (I'm sure you know how much work it is to
set up for a good run of tests.)  I'm not sure that (for example)
the synchronous_commit setting was the same, which could matter a
lot.  I wouldn't put a lot of stock in it until I can re-create it
under a much more controlled test.

So you didn't explicitly start anything to consume those changes?
I.e. using pg_receivellog or SELECT * FROM
start/init_logical_replication(...)?

Any chance there still was an old replication slot around?
SELECT * FROM pg_stat_logical_decoding;
should show them. But theoretically the make check in
test_logical_decoding should finish without one active...

The one thing about the whole episode that gave me pause was that
the compression and decompression routines were very high on the
`perf top` output in the patched run and way down the list on the
run based on master.

That's interesting. Unless there's something consuming the changestream
and the output plugin does something that actually requests
decompression of the Datums there shouldn't be *any* added/removed calls
to toast (de-)compression...
While consuming the changes there could be ReorderBufferToast* calls in
the profile. I haven't yet seen them in profiles, but that's not saying
all that much.

So:

I don't have a ready explanation for that, unless your branch was
missing a recent commit for speeding compression which was present on
master.

It didn't have 031cc55bbea6b3a6b67c700498a78fb1d4399476 - but I can't
really imagine that making *such* a big difference. But maybe you hit
some sweet spot with the data?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#58Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#57)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund <andres@2ndquadrant.com> wrote:

Any chance there still was an old replication slot around?

It is quite likely that there was.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#59Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#53)
Re: changeset generation v5-01 - Patches & git tree

On Sun, Jul 7, 2013 at 4:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-07-07 15:43:17 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

3b) Add catcache 'filter' that ensures the cache stays unique and use
that for the mapping

I slightly prefer 3b) because it's smaller, what's your opinions?

This is just another variation on the theme of kluging the catcache to
do something it shouldn't. You're still building a catcache on a
non-unique index, and that is going to lead to trouble.

I don't think the lurking dangers really are present. The index
essentially *is* unique since we filter away anything non-unique. The
catcache code hardly can be confused by tuples it never sees. That would
even work if we started preloading catcaches by doing scans of the
entire underlying relation or by caching all of a page when reading one
of its tuples.

I can definitely see that there are "aesthetical" reasons against doing
3b), that's why I've also done 3a). So I'll chalk you up to voting for
that...

I also vote for (3a). I did a quick once over of 1, 2, and 3a and
they look reasonable. Barring strenuous objections, I'd like to go
ahead and commit these, or perhaps an updated version of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#59)
Re: changeset generation v5-01 - Patches & git tree

On Tue, Jul 16, 2013 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jul 7, 2013 at 4:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-07-07 15:43:17 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

3b) Add catcache 'filter' that ensures the cache stays unique and use
that for the mapping

I slightly prefer 3b) because it's smaller, what's your opinions?

This is just another variation on the theme of kluging the catcache to
do something it shouldn't. You're still building a catcache on a
non-unique index, and that is going to lead to trouble.

I don't think the lurking dangers really are present. The index
essentially *is* unique since we filter away anything non-unique. The
catcache code hardly can be confused by tuples it never sees. That would
even work if we started preloading catcaches by doing scans of the
entire underlying relation or by caching all of a page when reading one
of its tuples.

I can definitely see that there are "aesthetical" reasons against doing
3b), that's why I've also done 3a). So I'll chalk you up to voting for
that...

I also vote for (3a). I did a quick once over of 1, 2, and 3a and
they look reasonable. Barring strenuous objections, I'd like to go
ahead and commit these, or perhaps an updated version of them.

Hearing no objections, I have done this. Per off-list discussion with
Andres, I also included patch 4, which gives us regression test
coverage for this code, and have fixed a few bugs and a bunch of
stylistic things that bugged me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#61Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
Re: changeset generation v5-01 - Patches & git tree

On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required,
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid: required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId: required, simple
0007: Adjust Satisfies* interface: required, mechanical,
0008: Allow walsender to attach to a database: required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter: required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for an Relation's primary key: required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program: not required but useful
0016: Add test_logical_decoding extension; not required, but contains
the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

I've now also committed patch #7 from this series. My earlier commit
fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago
I committed #1. I am not entirely convinced of the necessity or
desirability of patch #6, but as of now I haven't studied the issues
closely. Patch #2 does not seem useful in isolation; it adds new
regression-testing stuff but doesn't use it anywhere.

I doubt that any of the remaining patches (#8-#17) can be applied
separately without understanding the shape of the whole patch set, so
I think I, or someone else, will need to set aside more time for
detailed review before proceeding further with this patch set. I
suggest that we close out the CommitFest entry for this patch set one
way or another, as there is no way we're going to get the whole thing
done under the auspices of CF1.

I'll try to find some more time to spend on this relatively soon, but
I think this is about as far as I can take this today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#62Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#61)
Re: changeset generation v5-01 - Patches & git tree

On 2013-07-22 13:50:08 -0400, Robert Haas wrote:

On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required,
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid: required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId: required, simple
0007: Adjust Satisfies* interface: required, mechanical,
0008: Allow walsender to attach to a database: required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter: required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for an Relation's primary key: required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program: not required but useful
0016: Add test_logical_decoding extension; not required, but contains
the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

I've now also committed patch #7 from this series. My earlier commit
fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago
I committed #1.

Thanks!

I am not entirely convinced of the necessity or
desirability of patch #6, but as of now I haven't studied the issues
closely.

Fair enough. It's certainly possible to work around not having it, but
it seems cleaner to introduce the notion of an invalid CommandId like we
have for transaction ids et al.
Allowing 2^32-2 instead of 2^32-1 subtransactions doesn't seem like a
problem to me ;)
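
A minimal sketch of the sentinel idea, assuming `CommandId` is a 32-bit unsigned counter that starts at 0. Using the all-ones value for `InvalidCommandId`, and the helpers `command_id_is_valid`, `max_valid_command_id` and `next_command_id`, are assumptions of this sketch rather than something spelled out in this mail. It shows that reserving one value lowers the largest usable id from 2^32-1 to 2^32-2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

#define FirstCommandId	 ((CommandId) 0)
/* Assumed sentinel for this sketch: the all-ones value, i.e. 2^32 - 1. */
#define InvalidCommandId ((CommandId) UINT32_MAX)

/* Every value except the sentinel is a usable command id. */
static bool
command_id_is_valid(CommandId cid)
{
	return cid != InvalidCommandId;
}

/* Largest id that remains usable once the sentinel is reserved: 2^32 - 2. */
static CommandId
max_valid_command_id(void)
{
	return InvalidCommandId - 1;
}

/* Advance the counter; it must never be allowed to reach the sentinel. */
static CommandId
next_command_id(CommandId current)
{
	assert(current < max_valid_command_id());
	return current + 1;
}
```

Code that tracks "no command id seen yet" can then store `InvalidCommandId` instead of carrying a separate boolean flag.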

Patch #2 does not seem useful in isolation; it adds new
regression-testing stuff but doesn't use it anywhere.

Yes. I found it useful to test stuff around making replication
synchronous or such, but while I think we should have a facility like it
in core for both logical and physical replication, I don't think this
patch is ready for prime time due to its busy looping. I've even marked
it as such above ;)
My first idea to properly implement that seems to be to reuse the
syncrep infrastructure but that doesn't look trivial.

I doubt that any of the remaining patches (#8-#17) can be applied
separately without understanding the shape of the whole patch set, so
I think I, or someone else, will need to set aside more time for
detailed review before proceeding further with this patch set. I
suggest that we close out the CommitFest entry for this patch set one
way or another, as there is no way we're going to get the whole thing
done under the auspices of CF1.

Generally agreed. The biggest chunk of the code is in #13 anyway...

Some may be applicable independently:

0010: Log xl_running_xact regularly in the bgwriter: required

Should be useful independently since it can significantly speed up
startup of physical replicas. On many systems checkpoint_timeout
will be set to an hour, which can make the time until a standby becomes
consistent quite long, since that will be the first time it sees an
xl_running_xacts record again.

0011: make fsync_fname() public; required, needs to be in a different file

Isn't in the shape for it atm, but could be applied as an
independent infrastructure patch. And it should be easy enough to
clean it up.

0012: Relcache support for an Relation's primary key: required

Might actually be a good idea independently as well. E.g. the
materialized key patch could use the information that there's a
candidate key around to avoid a good bit of useless work.

I'll try to find some more time to spend on this relatively soon, but
I think this is about as far as I can take this today.

Was pretty helpful already, so ... ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#63Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: changeset generation v5-01 - Patches & git tree

Andres Freund wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

I gave this recently rebased branch a skim. In general, the separation
between decode.c/reorderbuffer.c/snapbuild.c seems a lot nicer now than
on previous iterations -- good job there.

Here are some quick notes I took while reading the patch itself. I
haven't gone through it really carefully, yet.

- I wonder whether DecodeCommit and DecodeAbort should really be a single
routine. Right now, the former might call the latter; and the latter is
aware of this. Seems awkward.

- We skip insert/update/delete if not my database Id; however, we don't skip
commit in the same case. If there are two walrecvrs on a cluster, on
different databases, does this lead to us trying to remove files
twice, if a xact commits which deleted some files? Is this a problem?
Should we try to skip such database-specific actions in global
WAL records?

- There's rmgr-specific knowledge in decode.c. I wonder if, similar to
redo and desc routines, that shouldn't instead be pluggable functions
for each rmgr.

- What's with ReorderBufferRestoreCleanup()? Shouldn't it be in logical.c?

- reorderbuffer.c does several different things. Can it be split?
Perhaps in pieces such as
* stuff to manage memory (slab cache thingies)
* TXN iterator
* other logically separate parts?
* the rest

- Having to expose LocalExecuteInvalidationMessage() looks awkward. Is there
another way?

- I think we need a better name for "treat_as_catalog_table" (and
RelationIsTreatedAsCatalogTable). Maybe replication_catalog or
something similar?

- Don't do this:
  + * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
  because later greps for RecentGlobalDataXmin and RecentGlobalXmin will
  fail to find it.  It seems better to spell both names, so
  "RecentGlobalDataXmin and RecentGlobalXmin are initialized to ..."

- the pg_receivellog command line is strange. Apparently I need one or
more of --start,--init,--stop, but if stop, then the other two must
not be present; and if startpos, then init and stop cannot be
specified. (There's a typo there that says "cannot combine with
--start" when it really means "cannot combine with --stop", BTW). I
think this would make more sense to have init,start,stop be commands,
in pg_ctl's spirit; so there would be no double-dash. IOW
SOMEPATH/pg_receivellog --startpos=123 start
and so on. Also, we need SGML docs for this new utility.
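
One way to read the subcommand suggestion in the last item: the action becomes a single positional word, so two conflicting actions cannot be expressed at all, and only option-versus-action conflicts are left to check. The sketch below is hypothetical throughout. The `Action` enum, `parse_action` and `options_are_consistent` are illustrative names and not part of pg_receivellog. It also assumes exactly one action per invocation, which differs from the current flags. The only rule it encodes is the one described above, that a start position is rejected for init and stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action set; the names mirror the --init/--start/--stop flags. */
typedef enum Action
{
	ACTION_INVALID,
	ACTION_INIT,
	ACTION_START,
	ACTION_STOP
} Action;

/* Map the single positional command word to an action. */
static Action
parse_action(const char *word)
{
	if (word == NULL)
		return ACTION_INVALID;
	if (strcmp(word, "init") == 0)
		return ACTION_INIT;
	if (strcmp(word, "start") == 0)
		return ACTION_START;
	if (strcmp(word, "stop") == 0)
		return ACTION_STOP;
	return ACTION_INVALID;
}

/* A start position is only accepted together with the start action. */
static bool
options_are_consistent(Action action, bool have_startpos)
{
	if (action == ACTION_INVALID)
		return false;
	if (have_startpos && action != ACTION_START)
		return false;
	return true;
}
```

With this shape, an invocation like the `--startpos=123 start` example passes the check, while the same option combined with `stop` is rejected in one place instead of through several pairwise flag checks.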

Any particular reason for removing this line:
-/* Get a new XLogReader */
+
extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
void *private_data);

Typo here (2n*d*Quadrant):
+= Snapshot Building =
+:author: Andres Freund, 2nQuadrant Ltd

I don't see the point of XLogRecordBuffer.record_data; we already have a
pointer to the XLogRecord, and the data can readily be obtained using
XLogRecGetData. So why provide the same thing twice? It seems to me
that if instead of passing the XLogRecordBuffer we just provide the
XLogRecord, and separately the "origptr" where needed, we could avoid
having to expose the XLogRecordBuffer stuff unnecessarily.

In this comment:
+ * FIXME: We need something resembling the real SnapshotNow to handle things
+ * like enum lookups from indices correctly.
what do we need to consider in light of the new comment proposed by Robert
in CA+TgmobvTjRj_doXxQ0wgA1a1JLYPVYqtR3m+Cou_ousabnmXg@mail.gmail.com?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#64Andres Freund
andres@2ndquadrant.com
In reply to: Alvaro Herrera (#63)
Re: changeset generation v5-01 - Patches & git tree

On 2013-08-27 11:32:30 -0400, Alvaro Herrera wrote:

Andres Freund wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

I gave this recently rebased branch a skim. In general, the separation
between decode.c/reorderbuffer.c/snapbuild.c seems a lot nicer now than
on previous iterations -- good job there.

Thanks for having a look!

Here are some quick notes I took while reading the patch itself. I
haven't gone through it really carefully, yet.

- I wonder whether DecodeCommit and DecodeAbort should really be a single
routine. Right now, the former might call the latter; and the latter is
aware of this. Seems awkward.

Yes, I am not happy with that either. I'll play with combining them and
check whether that looks better.

- We skip insert/update/delete if not my database Id; however, we don't skip
commit in the same case. If there are two walrecvrs on a cluster, on
different databases, does this lead to us trying to remove files
twice, if a xact commits which deleted some files? Is this a problem?
Should we try to skip such database-specific actions in global
WAL records?

Hm. We should be able to skip it for long commit records at least. I
think I lost that code during the unification.

There's no danger of removing anything global afaics since we're not
replaying using the original replay routines and all the slot/sender
specific stuff has unique names.

- There's rmgr-specific knowledge in decode.c. I wonder if, similar to
redo and desc routines, that shouldn't instead be pluggable functions
for each rmgr.

I don't think that's a good idea. I've quickly played with it before and
it doesn't seem to end well. It would require opening up more
semi-public interfaces and in the end, we're only interested in in-core
stuff. Even if it were possible to add new indexes by plugging in new
rmgrs, we wouldn't care.

- What's with ReorderBufferRestoreCleanup()? Shouldn't it be in logical.c?

No, that's just for removing ondisk data at the end of a
transaction. I'll improve the comment.

- reorderbuffer.c does several different things. Can it be split?
Perhaps in pieces such as
* stuff to manage memory (slab cache thingies)
* TXN iterator
* other logically separate parts?
* the rest

Hm. I don't really see much point in splitting it along those
lines. None of those really makes sense without the other parts and the
file isn't *that* huge.

- Having to expose LocalExecuteInvalidationMessage() looks awkward. Is there
another way?

Hm. I don't immediately see any way. We need to execute invalidation
messages just within one backend. There just is no exposed functionality
for that yet since it wasn't needed so far. We could expose something
like LocalExecuteInvalidationMessage*s*() instead of doing the loop in
reorderbuffer.c, but that's about it.
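
To make the plural variant mentioned above concrete, here is a minimal standalone sketch. The types and names (DemoInvalMsg, demo_execute_invalidations, demo_record) are hypothetical stand-ins, not the real sinval API; the only point is that the loop moves out of reorderbuffer.c and behind a single entry point.

```c
#include <stddef.h>

/* Hypothetical stand-in for SharedInvalidationMessage. */
typedef struct DemoInvalMsg
{
    int         id;
} DemoInvalMsg;

/* Callback standing in for LocalExecuteInvalidationMessage(). */
typedef void (*DemoInvalExec) (const DemoInvalMsg *msg);

/* Tiny recorder used to observe what the wrapper did. */
static size_t demo_calls = 0;
static int  demo_last_id = -1;

static void
demo_record(const DemoInvalMsg *msg)
{
    demo_calls++;
    demo_last_id = msg->id;
}

/*
 * Apply every message in msgs[0..nmsgs) in order within this backend
 * and return the number of messages processed.
 */
static size_t
demo_execute_invalidations(const DemoInvalMsg *msgs, size_t nmsgs,
                           DemoInvalExec exec_one)
{
    size_t      i;

    for (i = 0; i < nmsgs; i++)
        exec_one(&msgs[i]);

    return nmsgs;
}
```

A caller would pass its message array once instead of looping itself, e.g. demo_execute_invalidations(msgs, nmsgs, demo_record).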

- I think we need a better name for "treat_as_catalog_table" (and
RelationIsTreatedAsCatalogTable). Maybe replication_catalog or
something similar?

I think we're going to end up needing that for more than just
replication, so I'd like to keep replication out of the name. I don't
like the current name either though, so any other ideas?

- Don't do this:
+ * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
because later greps for RecentGlobalDataXmin and RecentGlobalXmin will
fail to find it.  It seems better to spell both names, so
"RecentGlobalDataXmin and RecentGlobalXmin are initialized to ..."

Ok.

- the pg_receivellog command line is strange. Apparently I need one or
more of --start,--init,--stop, but if stop, then the other two must
not be present; and if startpos, then init and stop cannot be
specified. (There's a typo there that says "cannot combine with
--start" when it really means "cannot combine with --stop", BTW). I
think this would make more sense to have init,start,stop be commands,
in pg_ctl's spirit; so there would be no double-dash. IOW
SOMEPATH/pg_receivellog --startpos=123 start
and so on.

The reasoning here is somewhat complex and I am not happy with the
status quo, so I'd welcome input here.

The individual verbs mean:
* init: create a replication slot
* start: continue streaming in an existing replication slot
* stop: remove replication slot

The reason you cannot specify anything with --stop is that a) --start
streams until you abort the utility, so there's no chance of running
--stop after it, and b) --init and --stop seem like a pointless combination
since you cannot actually do anything with the slot.
--init and --start combined, on the other hand, are useful for testing,
which is why I allow them so far, but I wouldn't have a problem removing
that capability.

The reason you cannot combine --init or --init --start with --startpos
is that --startpos has to refer to a location that could actually have
been streamed to the client. Before a replication slot is established the
client doesn't know anything about such an address, so --init --start
cannot know any useful --startpos; that's why it's forbidden to pass
one.
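
The combination rules discussed above can be written down as one small check. This is only a sketch of the rules as stated in this mail, not pg_receivellog's actual option parsing; the function name and the message strings are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return NULL if the requested actions may be combined, otherwise a
 * short (illustrative) reason why not.
 */
static const char *
check_action_combination(bool do_init, bool do_start, bool do_stop,
                         bool have_startpos)
{
    /* one or more of --init, --start, --stop is required */
    if (!do_init && !do_start && !do_stop)
        return "at least one action needs to be specified";

    /* --stop removes the slot, so nothing else makes sense with it */
    if (do_stop && (do_init || do_start))
        return "--stop cannot be combined with --init or --start";

    /* a fresh slot has no position the client could know about */
    if (have_startpos && (do_init || do_stop))
        return "--startpos cannot be combined with --init or --stop";

    return NULL;
}
```

With this, --init --start is accepted, --start --startpos is accepted, and --init --start --startpos is rejected.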

The idea behind startpos is that you can tell the server "I have
replayed transactions up to this LSN" and the server will give you
only transactions that have committed after this.
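
As a toy model of that contract: given the commit LSNs of decoded transactions, only those after startpos are handed to the client. DemoLsn and demo_commits_after are hypothetical names, and treating the boundary as strictly greater-than is an assumption of this sketch, not something the patch is confirmed to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XLogRecPtr; LSNs are treated as plain 64-bit integers. */
typedef uint64_t DemoLsn;

/*
 * Copy into out[] the commit LSNs that lie strictly after startpos and
 * return how many were kept.  out[] must have room for n entries.
 */
static size_t
demo_commits_after(const DemoLsn *commits, size_t n, DemoLsn startpos,
                   DemoLsn *out)
{
    size_t      i;
    size_t      kept = 0;

    for (i = 0; i < n; i++)
    {
        if (commits[i] > startpos)
            out[kept++] = commits[i];
    }

    return kept;
}
```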

Also, we need SGML docs for this new utility.

And a lot more than only for this utility :(

Any particular reason for removing this line:
-/* Get a new XLogReader */
+
extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
void *private_data);

Hrmpf. Merge error. I've integrated too many different versions of too
many different xlogreaders ;)

I don't see the point of XLogRecordBuffer.record_data; we already have a
pointer to the XLogRecord, and the data can readily be obtained using
XLogRecGetData. So why provide the same thing twice? It seems to me
that if instead of passing the XLogRecordBuffer we just provide the
XLogRecord, and separately the "origptr" where needed, we could avoid
having to expose the XLogRecordBuffer stuff unnecessarily.

By now we also need the end location of a wal record. So we have to pass
three addresses around for everything which isn't very convenient. If
you vastly prefer passing around three parameters I can do that, but I'd
rather not.
The original reason for doing so was, to be honest, that my own
xlogreader's API was different...

In this comment:
+ * FIXME: We need something resembling the real SnapshotNow to handle things
+ * like enum lookups from indices correctly.
what do we need to consider in light of the new comment proposed by Robert
CA+TgmobvTjRj_doXxQ0wgA1a1JLYPVYqtR3m+Cou_ousabnmXg@mail.gmail.com

I did most of the code changes for this, but this made me realize that
there are quite a few more comments and even a function name that need
to be adapted. Will work on that.

Thanks!

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#65Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#1)
5 attachment(s)
Re: logical changeset generation v5

Hi,

I've attached a couple of the preliminary patches to $subject which I've
recently cleaned up in the hope that we can continue improving on those
in a piecemeal fashion.
I am preparing submission of a newer version of the major patch but
unfortunately progress on that is slower than I'd like...

In order of how likely they are to be applied individually, they are:

0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done
* benefits hot standby startup
0003 wal_decoding: Allow walsender's to connect to a specific database
* biggest problem is how to specify the database we connect
to. Currently with the patch walsender connects to a database if it's
not named "replication" (via dbname). Perhaps it's better to invent a
replication_dbname parameter?
0006 wal_decoding: copydir: move fsync_fname to fd.[c.h] and make it public
* Pretty trivial and boring.
0007 wal_decoding: Add information about a tables primary key to struct RelationData
* Could be used in the matview refresh code
0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement
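
For 0005, the decision the bgwriter makes on each loop iteration is small enough to sketch standalone. This assumes millisecond timestamps and integer LSNs; DemoSnapLogState and demo_should_log_snapshot are hypothetical names, and the 15000 ms interval in the usage note mirrors log_snap_interval_ms in the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the writer remembers about the last snapshot it logged. */
typedef struct DemoSnapLogState
{
    int64_t     last_logged_ms;     /* time of the last logged snapshot */
    uint64_t    last_logged_lsn;    /* WAL position returned by that log call */
} DemoSnapLogState;

/*
 * Log again only if the interval has elapsed and some WAL has been
 * inserted since the last snapshot was written.
 */
static bool
demo_should_log_snapshot(const DemoSnapLogState *st, int64_t now_ms,
                         uint64_t insert_lsn, int64_t interval_ms)
{
    return now_ms >= st->last_logged_ms + interval_ms &&
        insert_lsn != st->last_logged_lsn;
}
```

With interval_ms = 15000 an idle system (insert position unchanged) logs nothing, while a busy one logs roughly four times a minute.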

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0006-wal_decoding-copydir-move-fsync_fname-to-fd.-c.h-and.patch (text/x-patch; charset=us-ascii)
From 80cc2aafde8f513fea61e6ab6898b7e6b3627d8d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 06/13] wal_decoding: copydir: move fsync_fname to fd.[c.h] and
 make it public

---
 src/backend/storage/file/copydir.c | 59 --------------------------------------
 src/backend/storage/file/fd.c      | 56 ++++++++++++++++++++++++++++++++++++
 src/include/storage/fd.h           |  1 +
 3 files changed, 57 insertions(+), 59 deletions(-)

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 391359c..427a0df 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,9 +27,6 @@
 #include "miscadmin.h"
 
 
-static void fsync_fname(char *fname, bool isdir);
-
-
 /*
  * copydir: copy a directory
  *
@@ -207,59 +204,3 @@ copy_file(char *fromfile, char *tofile)
 
 	pfree(buffer);
 }
-
-
-/*
- * fsync a file
- *
- * Try to fsync directories but ignore errors that indicate the OS
- * just doesn't allow/require fsyncing directories.
- */
-static void
-fsync_fname(char *fname, bool isdir)
-{
-	int			fd;
-	int			returncode;
-
-	/*
-	 * Some OSs require directories to be opened read-only whereas other
-	 * systems don't allow us to fsync files opened read-only; so we need both
-	 * cases here
-	 */
-	if (!isdir)
-		fd = OpenTransientFile(fname,
-							   O_RDWR | PG_BINARY,
-							   S_IRUSR | S_IWUSR);
-	else
-		fd = OpenTransientFile(fname,
-							   O_RDONLY | PG_BINARY,
-							   S_IRUSR | S_IWUSR);
-
-	/*
-	 * Some OSs don't allow us to open directories at all (Windows returns
-	 * EACCES)
-	 */
-	if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
-		return;
-
-	else if (fd < 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not open file \"%s\": %m", fname)));
-
-	returncode = pg_fsync(fd);
-
-	/* Some OSs don't allow us to fsync directories at all */
-	if (returncode != 0 && isdir && errno == EBADF)
-	{
-		CloseTransientFile(fd);
-		return;
-	}
-
-	if (returncode != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not fsync file \"%s\": %m", fname)));
-
-	CloseTransientFile(fd);
-}
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 436b901..de4d902 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -385,6 +385,62 @@ pg_flush_data(int fd, off_t offset, off_t amount)
 
 
 /*
+ * fsync_fname -- fsync a file or directory, handling errors properly
+ *
+ * Try to fsync a file or directory. When doing the latter, ignore errors that
+ * indicate the OS just doesn't allow/require fsyncing directories.
+ */
+void
+fsync_fname(char *fname, bool isdir)
+{
+	int			fd;
+	int			returncode;
+
+	/*
+	 * Some OSs require directories to be opened read-only whereas other
+	 * systems don't allow us to fsync files opened read-only; so we need both
+	 * cases here
+	 */
+	if (!isdir)
+		fd = OpenTransientFile(fname,
+							   O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+	else
+		fd = OpenTransientFile(fname,
+							   O_RDONLY | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+
+	/*
+	 * Some OSs don't allow us to open directories at all (Windows returns
+	 * EACCES)
+	 */
+	if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
+		return;
+
+	else if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", fname)));
+
+	returncode = pg_fsync(fd);
+
+	/* Some OSs don't allow us to fsync directories at all */
+	if (returncode != 0 && isdir && errno == EBADF)
+	{
+		CloseTransientFile(fd);
+		return;
+	}
+
+	if (returncode != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync file \"%s\": %m", fname)));
+
+	CloseTransientFile(fd);
+}
+
+
+/*
  * InitFileAccess --- initialize this module during backend startup
  *
  * This is called during either normal or standalone backend start.
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 90b4933..2a60229 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -113,6 +113,7 @@ extern int	pg_fsync_no_writethrough(int fd);
 extern int	pg_fsync_writethrough(int fd);
 extern int	pg_fdatasync(int fd);
 extern int	pg_flush_data(int fd, off_t offset, off_t amount);
+extern void fsync_fname(char *fname, bool isdir);
 
 /* Filename components for OpenTemporaryFile */
 #define PG_TEMP_FILES_DIR "pgsql_tmp"
-- 
1.8.3.251.g1462b67
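
For readers without a backend tree at hand, the error handling of fsync_fname() above can be approximated with plain POSIX calls. This is a sketch only: demo_fsync_path is a hypothetical name, it returns -1 where the backend function raises an ERROR, and it uses open()/close() in place of OpenTransientFile()/CloseTransientFile().

```c
#define _POSIX_C_SOURCE 200809L /* for fsync() under strict -std modes */

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * fsync a file or directory.  Returns 0 on success, or when the OS does
 * not allow opening/fsyncing directories; returns -1 with errno set on
 * any other failure.
 */
static int
demo_fsync_path(const char *path, bool isdir)
{
    int         fd;
    int         save_errno;

    /* directories are opened read-only, plain files read-write */
    fd = open(path, isdir ? O_RDONLY : O_RDWR);
    if (fd < 0)
    {
        /* some systems cannot open directories at all */
        if (isdir && (errno == EISDIR || errno == EACCES))
            return 0;
        return -1;
    }

    if (fsync(fd) != 0)
    {
        save_errno = errno;
        close(fd);
        /* some systems cannot fsync directories at all */
        if (isdir && save_errno == EBADF)
            return 0;
        errno = save_errno;
        return -1;
    }

    return close(fd);
}
```

The typical use is to sync a newly written file and then its parent directory, so that both the contents and the directory entry are durable.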

0007-wal_decoding-Add-information-about-a-tables-primary-.patch (text/x-patch; charset=us-ascii)
From e6b541f7b11ac77fbb9afaf4a0f961057d2f25b1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 07/13] wal_decoding: Add information about a tables primary
 key to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather the rd_primary value, RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmds.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
 src/include/utils/rel.h            | 12 +++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c588c29 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex)
+					&& !OidIsValid(oidIndex)
+					&& !OidIsValid(candidateIndex))
+			{
+				int key;
+				bool found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16 attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		relation->rd_primary = candidateIndex;
+
 	heap_close(indrel, AccessShareLock);
 
 	/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 589c9a8..0281b4b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
 	TriggerDesc *trigdesc;		/* Trigger info, or NULL if rel has none */
 
 	/*
+	 * The 'best' primary or candidate key that has been found, only set
+	 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+	 *
+	 * Indexes are chosen in the following order:
+	 * * Primary Key
+	 * * oid index
+	 * * the first (OID order) unique, immediate, non-partial and
+	 *   non-expression index over one or more NOT NULL'ed columns
+	 */
+	Oid rd_primary;
+
+	/*
 	 * rd_options is set whenever rd_rel is loaded into the relcache entry.
 	 * Note that you can NOT look into rd_rel for this data.  NULL means "use
 	 * defaults".
-- 
1.8.3.251.g1462b67
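
The final preference order applied after the index scan in the patch above (primary key, then oid index, then the first suitable unique index) reduces to a three-way choice. A minimal sketch with stand-in types; DemoOid and demo_choose_identity_index are hypothetical names. Unlike the patch, which leaves rd_primary untouched when nothing qualifies, this version returns the invalid value in that case.

```c
#include <stdint.h>

/* Stand-ins for Oid and InvalidOid. */
typedef uint32_t DemoOid;
#define DEMO_INVALID_OID ((DemoOid) 0)

/*
 * Pick the index to identify rows with: the primary key if there is
 * one, else the oid index, else the first candidate unique index.
 */
static DemoOid
demo_choose_identity_index(DemoOid pkey_index, DemoOid oid_index,
                           DemoOid candidate_index)
{
    if (pkey_index != DEMO_INVALID_OID)
        return pkey_index;
    if (oid_index != DEMO_INVALID_OID)
        return oid_index;
    return candidate_index;
}
```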

0002-wal_decoding-Introduce-InvalidCommandId-and-declare-.patch (text/x-patch; charset=us-ascii)
From 3442c3a4e44c5a64efbe651b745a6f86f69cfdab Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 02/13] wal_decoding: Introduce InvalidCommandId and declare
 that to be the new maximum for CommandCounterIncrement

This is useful to be able to represent a CommandId that's invalid. There was no
such value before.

This decreases the possible number of commands per transaction by one which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction (except for timetravel
access, but that's new anyway).
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h                   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("cannot have more than 2^32-1 commands in a transaction")));
+					 errmsg("cannot have more than 2^32-2 commands in a transaction")));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.3.251.g1462b67
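
The effect of the changed overflow check above can be tried in isolation. A minimal sketch, assuming a 32-bit unsigned counter as in c.h; the Demo-prefixed names are hypothetical, and a boolean return replaces the ereport(ERROR).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoCommandId;

#define DEMO_FIRST_COMMAND_ID   ((DemoCommandId) 0)
#define DEMO_INVALID_COMMAND_ID (~(DemoCommandId) 0)

/*
 * Advance the counter.  The all-ones value is reserved as "invalid", so
 * reaching it counts as overflow: the counter is rolled back and false
 * is returned.
 */
static bool
demo_command_counter_increment(DemoCommandId *current)
{
    *current += 1;
    if (*current == DEMO_INVALID_COMMAND_ID)
    {
        *current -= 1;
        return false;
    }
    return true;
}
```

Previously the check fired only when the counter wrapped around to FirstCommandId; with the reserved value it fires one step earlier, which is why the limit in the error message drops by one.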

0003-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch (text/x-patch; charset=us-ascii)
From ac48fc2f5c5f0031494cfabb0bca46f0bbef47d2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 03/13] wal_decoding: Allow walsender's to connect to a
 specific database

Currently the decision whether to connect to a database or not is made by
checking whether the passed "dbname" parameter is "replication". Unfortunately
this makes it impossible to connect to a database named
replication... Possibly it would be better to use a separate connection
parameter like replication_dbname=xxx?

This is useful for future walsender commands which need database interaction.
---
 doc/src/sgml/protocol.sgml                         |  5 +++-
 src/backend/postmaster/postmaster.c                | 13 +++++++++--
 .../libpqwalreceiver/libpqwalreceiver.c            |  4 ++--
 src/backend/replication/walsender.c                | 27 ++++++++++++++++++----
 src/backend/utils/init/postinit.c                  |  5 ++++
 src/bin/pg_basebackup/pg_basebackup.c              |  4 ++--
 src/bin/pg_basebackup/pg_receivexlog.c             |  4 ++--
 src/bin/pg_basebackup/receivelog.c                 |  4 ++--
 8 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..51b4435 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1304,7 +1304,10 @@ To initiate streaming replication, the frontend sends the
 <literal>replication</> parameter in the startup message. This tells the
 backend to go into walsender mode, wherein a small set of replication commands
 can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+used in walsender mode. A <literal>dbname</> of <literal>replication</> will
+start a walsender not connected to any database, specifying any other database
+will connect to that. Connecting to a specific database is only required for
+logical replication so far.
 
 The commands accepted in walsender mode are:
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..7520e42 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1983,9 +1983,18 @@ retry1:
 	if (strlen(port->user_name) >= NAMEDATALEN)
 		port->user_name[NAMEDATALEN - 1] = '\0';
 
-	/* Walsender is not related to a particular database */
-	if (am_walsender)
+	/*
+	 * Generic walsender, e.g. for streaming replication, is not connected to a
+	 * particular database. But walsenders used for logical replication need to
+	 * connect to a specific database. Unfortunately the initial choices for
+	 * distinguishing normal connections from replication connections included
+	 * dbname=replication being specified for the latter.
+	 * We now assume that a database name
+	 */
+	if (am_walsender && strcmp(port->database_name, "replication") == 0)
 		port->database_name[0] = '\0';
+	else if (am_walsender)
+		elog(DEBUG1, "WAL sender attaching to database %s", port->database_name);
 
 	/*
 	 * Done putting stuff in TopMemoryContext.
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 						"the primary server: %s",
 						PQerrorMessage(streamConn))));
 	}
-	if (PQnfields(res) != 3 || PQntuples(res) != 1)
+	if (PQnfields(res) != 4 || PQntuples(res) != 1)
 	{
 		int			ntuples = PQntuples(res);
 		int			nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 		PQclear(res);
 		ereport(ERROR,
 				(errmsg("invalid response from primary server"),
-				 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+				 errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
 						   ntuples, nfields)));
 	}
 	primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index eae6e59..f6463fc 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -47,6 +47,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -243,10 +244,12 @@ IdentifySystem(void)
 	char		tli[11];
 	char		xpos[MAXFNAMELEN];
 	XLogRecPtr	logptr;
+	char*        dbname = NULL;
 
 	/*
-	 * Reply with a result set with one row, three columns. First col is
-	 * system ID, second is timeline ID, and third is current xlog location.
+	 * Reply with a result set with one row, four columns. First col is system
+	 * ID, second is timeline ID, third is current xlog location and the fourth
+	 * contains the database name if we are connected to one.
 	 */
 
 	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -265,9 +268,14 @@ IdentifySystem(void)
 
 	snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
 
+	if (MyDatabaseId != InvalidOid)
+		dbname = get_database_name(MyDatabaseId);
+	else
+		dbname = "(none)";
+
 	/* Send a RowDescription message */
 	pq_beginmessage(&buf, 'T');
-	pq_sendint(&buf, 3, 2);		/* 3 fields */
+	pq_sendint(&buf, 4, 2);		/* 4 fields */
 
 	/* first field */
 	pq_sendstring(&buf, "systemid");	/* col name */
@@ -295,17 +303,28 @@ IdentifySystem(void)
 	pq_sendint(&buf, -1, 2);
 	pq_sendint(&buf, 0, 4);
 	pq_sendint(&buf, 0, 2);
+
+	/* fourth field */
+	pq_sendstring(&buf, "dbname");
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
+	pq_sendint(&buf, TEXTOID, 4);
+	pq_sendint(&buf, -1, 2);
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
 	pq_endmessage(&buf);
 
 	/* Send a DataRow message */
 	pq_beginmessage(&buf, 'D');
-	pq_sendint(&buf, 3, 2);		/* # of columns */
+	pq_sendint(&buf, 4, 2);		/* # of columns */
 	pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
 	pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
 	pq_sendint(&buf, strlen(tli), 4);	/* col2 len */
 	pq_sendbytes(&buf, (char *) tli, strlen(tli));
 	pq_sendint(&buf, strlen(xpos), 4);	/* col3 len */
 	pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+	pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
+	pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
 
 	pq_endmessage(&buf);
 }
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2c7f0f1..56c352c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -725,7 +725,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 			ereport(FATAL,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("must be superuser or replication role to start walsender")));
+	}
 
+	if (am_walsender &&
+	    (in_dbname == NULL || in_dbname[0] == '\0') &&
+	    dboid == InvalidOid)
+	{
 		/* process any options passed in the startup packet */
 		if (MyProcPort != NULL)
 			process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..89e2376 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 787a395..fe8aef6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index d56a4d7..22a5340 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -534,11 +534,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			PQclear(res);
 			return false;
 		}
-		if (PQnfields(res) != 3 || PQntuples(res) != 1)
+		if (PQnfields(res) != 4 || PQntuples(res) != 1)
 		{
 			fprintf(stderr,
 					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-					progname, PQntuples(res), PQnfields(res), 1, 3);
+					progname, PQntuples(res), PQnfields(res), 1, 4);
 			PQclear(res);
 			return false;
 		}
-- 
1.8.3.251.g1462b67
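
The dbname convention introduced by the patch above, and the ambiguity its commit message points out, fits in a few lines. A sketch only: demo_walsender_database is a hypothetical helper, and treating a NULL or empty name like "replication" is an assumption made here for robustness, not something the patch does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return NULL for a walsender that is not attached to any database,
 * otherwise the name of the database to connect to.
 */
static const char *
demo_walsender_database(const char *dbname)
{
    if (dbname == NULL || dbname[0] == '\0')
        return NULL;

    /* the magic name selects a database-less walsender */
    if (strcmp(dbname, "replication") == 0)
        return NULL;

    return dbname;
}
```

Note that a database that really is named replication can never be selected this way, which is the argument for a separate replication_dbname parameter.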

0005-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch (text/x-patch; charset=us-ascii)
From 23fbc42744fbe74d0ea230dcab8275011177c7a6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 05/13] wal_decoding: Log xl_running_xact's at a higher
 frequency than checkpoints are done

Logging information about running xacts more frequently is beneficial for both
hot standby, which can reach consistency faster and release some resources
earlier using this information, and future logical replication, which can
initialize quicker using this.

Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.

Also mark xl_running_xact records as being relevant for async commit so the wal
writer writes them out soonish instead of possibly waiting a long time.
---
 src/backend/postmaster/bgwriter.c | 48 +++++++++++++++++++++++++++++++++++++++
 src/backend/storage/ipc/standby.c | 25 ++++++++++++++++----
 src/include/storage/standby.h     |  2 +-
 3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..dd62917 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -76,6 +78,10 @@ int			BgWriterDelay = 200;
 static volatile sig_atomic_t got_SIGHUP = false;
 static volatile sig_atomic_t shutdown_requested = false;
 
+static TimestampTz last_logged_snap_ts;
+static XLogRecPtr last_logged_snap_recptr = InvalidXLogRecPtr;
+static uint32 log_snap_interval_ms = 15000;
+
 /* Signal handlers */
 
 static void bg_quickdie(SIGNAL_ARGS);
@@ -142,6 +148,12 @@ BackgroundWriterMain(void)
 	CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
 
 	/*
+	 * We just started, assume there has been either a shutdown or
+	 * end-of-recovery snapshot.
+	 */
+	last_logged_snap_ts = GetCurrentTimestamp();
+
+	/*
 	 * Create a memory context that we will do all our work in.  We do this so
 	 * that we can reset the context during error recovery and thereby avoid
 	 * possible memory leaks.  Formerly this code just ran in
@@ -276,6 +288,42 @@ BackgroundWriterMain(void)
 		}
 
 		/*
+		 * Log a new xl_running_xacts every now and then so replication can get
+		 * into a consistent state faster and clean up resources more
+		 * frequently. The costs of this are relatively low, so doing it 4
+		 * times a minute seems fine.
+		 *
+		 * We assume the interval for writing xl_running_xacts is significantly
+		 * bigger than BgWriterDelay, so we don't complicate the overall
+		 * timeout handling but just assume we're going to get called often
+		 * enough even if hibernation mode is active. It's not that important
+		 * that log_snap_interval_ms is met strictly.
+		 *
+		 * We do this logging in the bgwriter as its the only process thats run
+		 * regularly and returns to its mainloop all the
+		 * time. E.g. Checkpointer, when active, is barely ever in its
+		 * mainloop.
+		 */
+		if (XLogStandbyInfoActive() && !RecoveryInProgress())
+		{
+			TimestampTz timeout = 0;
+			TimestampTz now = GetCurrentTimestamp();
+			timeout = TimestampTzPlusMilliseconds(last_logged_snap_ts,
+												  log_snap_interval_ms);
+
+			/*
+			 * only log if enough time has passed and some xlog record has been
+			 * inserted.
+			 */
+			if (now >= timeout &&
+				last_logged_snap_recptr != GetXLogInsertRecPtr())
+			{
+				last_logged_snap_recptr = LogStandbySnapshot();
+				last_logged_snap_ts = now;
+			}
+		}
+
+		/*
 		 * Sleep until we are signaled or BgWriterDelay has elapsed.
 		 *
 		 * Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..6f0de13 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
 									   ProcSignalReason reason);
 static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
 
 
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
  * currently running xids, performed by StandbyReleaseOldLocks().
  * Zero xids should no longer be possible, but we may be replaying WAL
  * from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
  */
-void
+XLogRecPtr
 LogStandbySnapshot(void)
 {
+	XLogRecPtr recptr;
 	RunningTransactions running;
 	xl_standby_lock *locks;
 	int			nlocks;
@@ -876,9 +879,12 @@ LogStandbySnapshot(void)
 	 * record we write, because standby will open up when it sees this.
 	 */
 	running = GetRunningTransactionData();
-	LogCurrentRunningXacts(running);
+	recptr = LogCurrentRunningXacts(running);
+
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
+
+	return recptr;
 }
 
 /*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
  * is a contiguous chunk of memory and never exists fully until it is
  * assembled in WAL.
  */
-static void
+static XLogRecPtr
 LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 {
 	xl_running_xacts xlrec;
@@ -939,6 +945,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->oldestRunningXid,
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);
+
+	/*
+	 * Ensure running_xacts information is synced to disk not too far in the
+	 * future. We don't want to stall anything though, so we let the wal writer
+	 * do it during normal operation. XLogSetAsyncXactLSN() conveniently will
+	 * mark the LSN as to-be-synced and nudge the WALWriter into action if
+	 * sleeping.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
+	return recptr;
 }
 
 /*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
 extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
 extern void LogAccessExclusiveLockPrepare(void);
 
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
 
 #endif   /* STANDBY_H */
-- 
1.8.3.251.g1462b67

#66Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#65)
Re: logical changeset generation v5

On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done
* benefits hot standby startup

Review:

1. I think more comments are needed here to explain why we need this.
I don't know if the comments should go into the functions modified by
this patch or in some other location, but I don't find what's here now
adequate for understanding.

2. I think the variable naming could be better. If nothing else, I'd
spell out "snapshot" rather than abbreviating it to "snap". I'd also
add comments explaining what each of those variables does. And why
isn't log_snap_interval_ms a #define rather than a variable? (Don't
even talk to me about using gdb on a running instance. If you're even
thinking about that, this needs to be a GUC.)

3. Why does LogCurrentRunningXacts() need to call
XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get
sync'd in a reasonably timely fashion; I can't see off-hand why this
one should need special handling.

0003 wal_decoding: Allow walsender's to connect to a specific database
* biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named "replication" (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database. Absent a clear use case for such a thing, I don't
think we should allow it. Ignorant suggestion: perhaps the database
name could be stored in the logical replication slot.

0006 wal_decoding: copydir: move fsync_fname to fd.[c.h] and make it public
* Pretty trivial and boring.

Seems fine.

0007 wal_decoding: Add information about a tables primary key to struct RelationData
* Could be used in the matview refresh code

I think you and Kevin should discuss whether this is actually the
right way to do this. ISTM that if logical replication and
materialized views end up selecting different approaches to this
problem, everybody loses.

0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

I'm still unconvinced we want this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#67Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#66)
Re: logical changeset generation v5

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done
* benefits hot standby startup

Review:

1. I think more comments are needed here to explain why we need this.
I don't know if the comments should go into the functions modified by
this patch or in some other location, but I don't find what's here now
adequate for understanding.

Hm. What information are you actually missing? I guess the
XLogSetAsyncXactLSN() needs a bit more context based on your question,
what else?
Not sure if it makes sense to explain in detail why it helps us to get
into a consistent state faster?

2. I think the variable naming could be better. If nothing else, I'd
spell out "snapshot" rather than abbreviating it to "snap". I'd also
add comments explaining what each of those variables does.

Ok.

And why
isn't log_snap_interval_ms a #define rather than a variable? (Don't
even talk to me about using gdb on a running instance. If you're even
thinking about that, this needs to be a GUC.)

Ugh. It certainly doesn't have anything to do with wanting to change it
on a running system using gdb. Brrr.

I think I wanted it to be a constant variable but forgot the const. I
personally prefer 'static const' to #define's if it's legal C, but I
guess the project's style differs, so I'll change that.

3. Why does LogCurrentRunningXacts() need to call
XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get
sync'd in a reasonably timely fashion; I can't see off-hand why this
one should need special handling.

No, we don't force writing out wal records in a timely fashion if
there's no pressure in wal_buffers, basically only on commits and
various XLogFlush()es. It doesn't make much of a difference if the
entire system is busy, but if it's not the wal writer will sleep. The
alternative would be to XLogFlush() the record, but that would actually
block, which isn't really what we want/need.

0003 wal_decoding: Allow walsender's to connect to a specific database
* biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named "replication" (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that need
database access, but we couldn't do so because you cannot connect to
one.

Absent a clear use case for such a thing, I don't
think we should allow it. Ignorant suggestion: perhaps the database
name could be stored in the logical replication slot.

The problem is that you need to InitPostgres() with a database. You
cannot do that again, after connecting with an empty database which we
do in a plain walsender.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#68Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#67)
Re: logical changeset generation v5

On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:

1. I think more comments are needed here to explain why we need this.
I don't know if the comments should go into the functions modified by
this patch or in some other location, but I don't find what's here now
adequate for understanding.

Hm. What information are you actually missing? I guess the
XLogSetAsyncXactLSN() needs a bit more context based on your question,
what else?
Not sure if it makes sense to explain in detail why it helps us to get
into a consistent state faster?

Well, we must have had some idea in mind when the original Hot Standby
patch went in that doing this once per checkpoint was good enough.
Now we think we need it every 15 seconds, but not more or less often.
So, why the change of heart? To my way of thinking, it seems as
though we ought to always begin replay at a checkpoint, so the standby
ought always to see one of these records immediately. Obviously
that's not good enough, but why not? And why is every 15 seconds good
enough?

3. Why does LogCurrentRunningXacts() need to call
XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get
sync'd in a reasonably timely fashion; I can't see off-hand why this
one should need special handling.

No, we don't force writing out wal records in a timely fashion if
there's no pressure in wal_buffers, basically only on commits and
various XLogFlush()es. It doesn't make much of a difference if the
entire system is busy, but if it's not the wal writer will sleep. The
alternative would be to XLogFlush() the record, but that would actually
block, which isn't really what we want/need.

The WAL writer is supposed to call XLogBackgroundFlush() every time
WalWriterDelay expires. Yeah, it can hibernate, but if it's
hibernating, then we should respect that decision for this WAL record
type also.

0003 wal_decoding: Allow walsender's to connect to a specific database
* biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named "replication" (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that need
database access, but we couldn't do so because you cannot connect to
one.

Could you be more specific?

Absent a clear use case for such a thing, I don't
think we should allow it. Ignorant suggestion: perhaps the database
name could be stored in the logical replication slot.

The problem is that you need to InitPostgres() with a database. You
cannot do that again, after connecting with an empty database which we
do in a plain walsender.

Are you saying that the logical replication slot can't be read before
calling InitPostgres()?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#69Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#68)
Re: logical changeset generation v5

On 2013-09-03 12:22:22 -0400, Robert Haas wrote:

On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:

1. I think more comments are needed here to explain why we need this.
I don't know if the comments should go into the functions modified by
this patch or in some other location, but I don't find what's here now
adequate for understanding.

Hm. What information are you actually missing? I guess the
XLogSetAsyncXactLSN() needs a bit more context based on your question,
what else?
Not sure if it makes sense to explain in detail why it helps us to get
into a consistent state faster?

Well, we must have had some idea in mind when the original Hot Standby
patch went in that doing this once per checkpoint was good enough.
Now we think we need it every 15 seconds, but not more or less often.
So, why the change of heart?

I think the primary reason for that was that it was a pretty
complicated patchset and we needed to start somewhere. By now we do have
reports of standbys taking their time to get consistent.

To my way of thinking, it seems as though we ought to always begin
replay at a checkpoint, so the standby ought always to see one of
these records immediately. Obviously that's not good enough, but why
not?

We always see one after the checkpoint (well, actually before the
checkpoint record, but ...), correct. The problem is just that reading a
single xl_running_xacts record doesn't automatically make you consistent. If
there's a single suboverflowed transaction running on the primary when
the xl_running_xacts is logged we won't be able to switch to
consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
and some optimizations.
Since the only place where we currently have the information to
potentially become consistent is ProcArrayApplyRecoveryInfo() we will
have to wait checkpoint_timeout time till we get consistent. Which
sucks as there are good arguments to set that to 1h.
That especially sucks as you lose consistency every time you restart the
standby...

And why is every 15 seconds good enough?

Waiting 15s to become consistent instead of checkpoint_timeout seems to
be ok to me and to be a good tradeoff between overhead and waiting. We
can certainly discuss other values or making it configurable. The latter
seemed to be unnecessary to me, but I don't have a problem
implementing it. I just don't want to document it :P

3. Why does LogCurrentRunningXacts() need to call
XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get
sync'd in a reasonably timely fashion; I can't see off-hand why this
one should need special handling.

No, we don't force writing out wal records in a timely fashion if
there's no pressure in wal_buffers, basically only on commits and
various XLogFlush()es. It doesn't make much of a difference if the
entire system is busy, but if it's not the wal writer will sleep. The
alternative would be to XLogFlush() the record, but that would actually
block, which isn't really what we want/need.

The WAL writer is supposed to call XLogBackgroundFlush() every time
WalWriterDelay expires. Yeah, it can hibernate, but if it's
hibernating, then we should respect that decision for this WAL record
type also.

Why should we respect it? There is work to be done and the wal writer
has no way of knowing that without us telling it? Normally we rely on
commit records and XLogFlush()es to trigger the wal writer.
Alternatively we can start a transaction and set synchronous_commit =
off, but that seems like a complication to me.

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that need
database access, but we couldn't do so because you cannot connect to
one.

Could you be more specific?

I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up
before.

Absent a clear use case for such a thing, I don't
think we should allow it. Ignorant suggestion: perhaps the database
name could be stored in the logical replication slot.

The problem is that you need to InitPostgres() with a database. You
cannot do that again, after connecting with an empty database which we
do in a plain walsender.

Are you saying that the logical replication slot can't be read before
calling InitPostgres()?

The slot can be read just fine, but we won't know that we should do
that. Walsender accepts commands via PostgresMain()'s mainloop which has
done an InitPostgres(dbname) before. Which we need to do because we need
the environment it sets up.

The database is stored in the slots btw (as oid, not as a name though) ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#70Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#69)
Re: logical changeset generation v5

On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres@2ndquadrant.com> wrote:

To my way of thinking, it seems as though we ought to always begin
replay at a checkpoint, so the standby ought always to see one of
these records immediately. Obviously that's not good enough, but why
not?

We always see one after the checkpoint (well, actually before the
checkpoint record, but ...), correct. The problem is just that reading a
single xl_running_xacts record doesn't automatically make you consistent. If
there's a single suboverflowed transaction running on the primary when
the xl_running_xacts is logged we won't be able to switch to
consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
and some optimizations.
Since the only place where we currently have the information to
potentially become consistent is ProcArrayApplyRecoveryInfo() we will
have to wait checkpoint_timeout time till we get consistent. Which
sucks as there are good arguments to set that to 1h.
That especially sucks as you lose consistency every time you restart the
standby...

Right, OK.

And why is every 15 seconds good enough?

Waiting 15s to become consistent instead of checkpoint_timeout seems to
be ok to me and to be a good tradeoff between overhead and waiting. We
can certainly discuss other values or making it configurable. The latter
seemed to be unnecessary to me, but I don't have a problem
implementing it. I just don't want to document it :P

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot. I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent. It would be nice not to wake up the HD unnecessarily.

The WAL writer is supposed to call XLogBackgroundFlush() every time
WalWriterDelay expires. Yeah, it can hibernate, but if it's
hibernating, then we should respect that decision for this WAL record
type also.

Why should we respect it?

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that need
database access, but we couldn't do so because you cannot connect to
one.

Could you be more specific?

I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up
before.

It seems we need some more design there. Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes. But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way. And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#71Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#70)
Re: logical changeset generation v5

On 2013-09-03 15:56:15 -0400, Robert Haas wrote:

On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres@2ndquadrant.com> wrote:

And why is every 15 seconds good enough?

Waiting 15s to become consistent instead of checkpoint_timeout seems to
be ok to me and to be a good tradeoff between overhead and waiting. We
can certainly discuss other values or making it configurable. The latter
seemed to be unnecessary to me, but I don't have a problem
implementing it. I just don't want to document it :P

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot.

There are actually more benefits than just overflowed snapshots (pruning
of the known xids machinery, exclusive lock cleanup).

I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent. It would be nice not to wake up the HD unnecessarily.

The patch as-is only writes if there has been WAL written since the last
time it logged a running_xacts. I think it's not worth building more
smarts than that?
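
The check described above can be sketched in isolation. This is a minimal model, not the patch itself: the types and function names (sketch_time_ms, sketch_lsn, SnapshotLogState, should_log_snapshot, note_snapshot_logged) are hypothetical stand-ins for TimestampTz, XLogRecPtr and the bgwriter's static variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TimestampTz and XLogRecPtr. */
typedef int64_t sketch_time_ms;
typedef uint64_t sketch_lsn;

typedef struct SnapshotLogState
{
    sketch_time_ms last_logged_ts;  /* when a snapshot was last logged */
    sketch_lsn     last_logged_lsn; /* WAL position right after that record */
} SnapshotLogState;

/*
 * Decide whether a new running-xacts record should be logged now.
 * Both conditions must hold: the interval has elapsed, and the WAL
 * insert position has moved since our own last record, i.e. somebody
 * else has written WAL in the meantime.
 */
static bool
should_log_snapshot(const SnapshotLogState *state, sketch_time_ms now,
                    sketch_lsn insert_lsn, sketch_time_ms interval_ms)
{
    assert(interval_ms > 0);

    sketch_time_ms timeout = state->last_logged_ts + interval_ms;

    return now >= timeout && state->last_logged_lsn != insert_lsn;
}

/* Remember that a snapshot was logged at 'now' and ended at 'end_lsn'. */
static void
note_snapshot_logged(SnapshotLogState *state, sketch_time_ms now,
                     sketch_lsn end_lsn)
{
    state->last_logged_ts = now;
    state->last_logged_lsn = end_lsn;
}
```

On a completely idle system the insert position stays equal to the end of the last running-xacts record, so the second condition keeps the loop from writing a record every interval.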

The WAL writer is supposed to call XLogBackgroundFlush() every time
WalWriterDelay expires. Yeah, it can hibernate, but if it's
hibernating, then we should respect that decision for this WAL record
type also.

Why should we respect it?

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

I can't follow the logic behind that statement. Just about all WAL
records are either pretty immediately flushed afterwards or are done in
the context of a transaction where we flush (or do a
XLogSetAsyncXactLSN) at transaction commit.

XLogBackgroundFlush() won't necessarily flush the running_xacts
record. Unless you've set the async xact lsn it will only flush complete
blocks. So what can happen (I've seen that more than once in testing,
took me a while to debug) is that a checkpoint is started in a busy period
but nothing happens after it finished. Since the checkpoint-triggered
running_xacts record is logged *before* we do the smgr flush, it is still
overflowed. Then, after activity has died down, the bgwriter issues the
running_xacts record, but it only partially fills a block and thus it never
gets flushed.

To me the alternatives are to do a XLogSetAsyncXactLSN() or an
XLogFlush(). The latter is more aggressive and can block for a
measurable amount of time, which is why I don't want to do it in the
bgwriter.
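
A simplified model of the flushing behaviour described above, as a sketch only. It assumes that the background flush rounds its target down to a block boundary and falls back to the async xact LSN when that gains nothing; the names (sketch_lsn, SKETCH_BLCKSZ, background_flush_target) are hypothetical and this is not the code in xlog.c.

```c
#include <stdint.h>

typedef uint64_t sketch_lsn;    /* hypothetical stand-in for XLogRecPtr */

#define SKETCH_BLCKSZ 8192      /* assumed WAL block size */

/*
 * Return the position up to which a background flush would flush WAL.
 *
 * write_request: end of the WAL inserted so far
 * flushed:       position already flushed to disk
 * async_lsn:     position marked "flush soon" (0 if nothing was marked)
 */
static sketch_lsn
background_flush_target(sketch_lsn write_request, sketch_lsn flushed,
                        sketch_lsn async_lsn)
{
    /* Only complete blocks are considered at first. */
    sketch_lsn target = write_request - (write_request % SKETCH_BLCKSZ);

    /* Nothing complete to flush: fall back to the marked position. */
    if (target <= flushed)
        target = async_lsn;

    /* Never report a position behind what is already on disk. */
    return target > flushed ? target : flushed;
}
```

With a record that ends 100 bytes into a fresh block and no marked position, the target stays at the already flushed boundary, which is the stuck case described above. Marking the record's end position moves the target to it.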

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that need
database access, but we couldn't do so because you cannot connect to
one.

Could you be more specific?

I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up
before.

It seems we need some more design there. Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes. But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way. And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.

Yes. I think you're right that we need more input/design here. I've
previously started threads about it, but nobody replied :(.

The problem with using dbname=replication as a trigger for anything is
that we actually allow databases to be created with that name. Perhaps
that was a design mistake.

I wondered about turning replication from a boolean into something like
off|0, on|1, database. dbname= gets only used in the latter
variant. That would be compatible with previous versions and would even
support using old tools (since all of them seem to do replication=1).
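
The proposed three-way option could be parsed roughly as follows. This is a hypothetical sketch: the enum, the function names and the decision to reject every other spelling are assumptions, and a real implementation would presumably also accept the other boolean spellings the server understands.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection modes for the "replication" parameter. */
typedef enum SketchReplicationMode
{
    SKETCH_REPL_OFF,        /* ordinary backend */
    SKETCH_REPL_PHYSICAL,   /* walsender without a database, as today */
    SKETCH_REPL_DATABASE,   /* walsender connected to dbname */
    SKETCH_REPL_INVALID     /* unrecognised value */
} SketchReplicationMode;

/* Map the option string to a mode; a missing option means "off". */
static SketchReplicationMode
parse_replication_option(const char *value)
{
    if (value == NULL)
        return SKETCH_REPL_OFF;
    if (strcmp(value, "off") == 0 || strcmp(value, "0") == 0)
        return SKETCH_REPL_OFF;
    if (strcmp(value, "on") == 0 || strcmp(value, "1") == 0)
        return SKETCH_REPL_PHYSICAL;
    if (strcmp(value, "database") == 0)
        return SKETCH_REPL_DATABASE;
    return SKETCH_REPL_INVALID;
}

/* dbname= is only honoured in the "database" variant. */
static bool
replication_uses_dbname(SketchReplicationMode mode)
{
    return mode == SKETCH_REPL_DATABASE;
}
```

Old tools that send replication=1 end up in the physical mode and never look at dbname, which is the compatibility property mentioned above.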

But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way.

I don't think it's a good idea at this point to turn them into plain
SQL. There is more infrastructure (signal handlers, permissions,
different timeouts) & memory required for walsenders, so using plain SQL
there seems beyond the scope of this.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#72Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#71)
Re: logical changeset generation v5

On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot.

There are actually more benefits than just overflowed snapshots (pruning
of the known xids machinery, exclusive lock cleanup).

I know that, but I thought the master and slave could only lose sync
on those things after a master crash and that once per checkpoint
cycle was enough for those other benefits. Am I wrong?

The patch as-is only writes if there has been WAL written since the last
time it logged a running_xacts. I think it's not worth building more
smarts than that?

Hmm, maybe.

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

I can't follow the logic behind that statement. Just about all WAL
records are either pretty immediately flushed afterwards or are done in
the context of a transaction where we flush (or do a
XLogSetAsyncXactLSN) at transaction commit.

XLogBackgroundFlush() won't necessarily flush the running_xacts
record.

OK, this was the key point I was missing.

It seems we need some more design there. Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes. But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way. And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.

Yes. I think you're right that we need more input/design here. I've
previously started threads about it, but nobody replied :(.

The problem with using dbname=replication as a trigger for anything is
that we actually allow databases to be created with that name. Perhaps
that was a design mistake.

It seemed like a good idea at the time, but maybe it wasn't. I'm not
sure where to go with it at this point; a forcible backward
compatibility break would probably screw things up for a lot of
people.

I wondered about turning replication from a boolean into something like
off|0, on|1, database. dbname= gets only used in the latter
variant. That would be compatible with previous versions and would even
support using old tools (since all of them seem to do replication=1).

I don't love that, but I don't hate it, either. But it still doesn't
answer the following question, which I think is important: if I (or
someone else) commits this patch, how will that make things better for
users? At the moment it's just adding a knob that doesn't do anything
for you when you twist it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#73Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#72)
Re: logical changeset generation v5

On 2013-09-04 10:02:05 -0400, Robert Haas wrote:

On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot.

There are actually more benefits than just overflowed snapshots (pruning
of the known xids machinery, exclusive lock cleanup).

I know that, but I thought the master and slave could only lose sync
on those things after a master crash and that once per checkpoint
cycle was enough for those other benefits. Am I wrong?

The xid tracking can keep track without the additional records but it
sometimes needs a good bit more memory to do so if the primary burns through
xids quite fast. Every time we see a running xacts record we can do
cleanup (that's the ExpireOldKnownAssignedTransactionIds() in
ProcArrayApplyRecoveryInfo()).

The problem with using dbname=replication as a trigger for anything is
that we actually allow databases to be created with that name. Perhaps
that was a design mistake.

It seemed like a good idea at the time, but maybe it wasn't. I'm not
sure where to go with it at this point; a forcible backward
compatibility break would probably screw things up for a lot of
people.

Yes, breaking things now doesn't seem like a good idea.

I wondered about turning replication from a boolean into something like
off|0, on|1, database. dbname= gets only used in the latter
variant. That would be compatible with previous versions and would even
support using old tools (since all of them seem to do replication=1).

I don't love that, but I don't hate it, either.

Ok. Will update the patch that way. Seems better than its current state.

But it still doesn't
answer the following question, which I think is important: if I (or
someone else) commits this patch, how will that make things better for
users? At the moment it's just adding a knob that doesn't do anything
for you when you twist it.

I am not sure it's a good idea to commit it before we're sure we're going
to commit the changeset extraction. It's an independently reviewable and
testable piece of code that's simple enough to understand quickly in
contrast to the large changeset extraction patch. That's why I kept it
separate.
On the other hand, as you know, it's not without precedent to commit
pieces of infrastructure that aren't really useful for the end user
(think FDW).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#74Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#66)
Re: lcr v5 - introduction of InvalidCommandId

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

I'm still unconvinced we want this.

Ok, so the reason for the existence of this patch is that currently
there is no way to represent an "unset" CommandId. This is a problem for
the following patches because we need to log the cmin, cmax of catalog
rows and obviously there can be rows where cmax is unset.
The reason I chose to change the definition of CommandIds is that the
other on-disk types we use, like TransactionIds, XLogRecPtrs and such, have
an "invalid" value; CommandIds don't. Changing their definition to have 0
- analogous to the previous examples - as their invalid value is not a
problem because CommandIds from pg_upgraded clusters may never be used
for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem
like a problem to me. IMO the CommandIds should have been defined that
way from the start.

Makes some sense?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#75Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#74)
Re: lcr v5 - introduction of InvalidCommandId

On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

I'm still unconvinced we want this.

Ok, so the reason for the existence of this patch is that currently
there is no way to represent an "unset" CommandId. This is a problem for
the following patches because we need to log the cmin, cmax of catalog
rows and obviously there can be rows where cmax is unset.

For heap tuples, we solve this problem by using flag bits. Why not
adopt the same approach?

The reason I chose to change the definition of CommandIds is that the
other on-disk types we use, like TransactionIds, XLogRecPtrs and such, have
an "invalid" value; CommandIds don't. Changing their definition to have 0
- analogous to the previous examples - as their invalid value is not a
problem because CommandIds from pg_upgraded clusters may never be used
for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem
like a problem to me. IMO the CommandIds should have been defined that
way from the start.

Makes some sense?

I don't have a problem with this if other people think it's a good
idea. But I think it needs a few +1s and not too many -1s first, and
so far (AFAIK) no one else has weighed in with an opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#76Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#66)
2 attachment(s)
Re: logical changeset generation v5

Hi,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done
* benefits hot standby startup

I tried to update the patch to address the comments you made.

0003 wal_decoding: Allow walsender's to connect to a specific database
* biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named "replication" (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I've updated the patch so it extends the "replication" startup parameter
to not only specify a boolean but also "database". In the latter case it
will connect to the database specified in "dbname".
As discussed downthread, this patch doesn't have an immediate advantage
for users until the changeset extraction patch itself is
applied. Whether or not it should be applied separately is unclear.
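
To make the resulting value space concrete, here is a minimal standalone
sketch of how the extended "replication" startup parameter could be
classified. It is not the patch's code: the ReplMode enum and both function
names are hypothetical (the patch sets the am_walsender and am_db_walsender
booleans instead), and parse_bool_simple() is a deliberately stricter
stand-in for the backend's parse_bool().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for this sketch; the patch itself sets two booleans. */
typedef enum
{
    REPL_OFF,       /* regular backend, no walsender */
    REPL_PHYSICAL,  /* walsender not tied to a database (replication=1) */
    REPL_DATABASE   /* walsender connected to the database named in dbname */
} ReplMode;

/*
 * Simplified stand-in for the backend's parse_bool(): it accepts only a few
 * exact spellings, whereas the real function is more lenient.
 */
static bool
parse_bool_simple(const char *value, bool *result)
{
    if (strcmp(value, "true") == 0 || strcmp(value, "on") == 0 ||
        strcmp(value, "1") == 0)
    {
        *result = true;
        return true;
    }
    if (strcmp(value, "false") == 0 || strcmp(value, "off") == 0 ||
        strcmp(value, "0") == 0)
    {
        *result = false;
        return true;
    }
    return false;
}

/* Returns false for an invalid value; the backend would raise FATAL there. */
bool
parse_replication_option(const char *value, ReplMode *mode)
{
    bool        flag;

    assert(value != NULL && mode != NULL);

    /* Check the special string first, then fall back to boolean parsing. */
    if (strcmp(value, "database") == 0)
    {
        *mode = REPL_DATABASE;
        return true;
    }
    if (!parse_bool_simple(value, &flag))
        return false;

    *mode = flag ? REPL_PHYSICAL : REPL_OFF;
    return true;
}
```

Old tools that send replication=1 land in REPL_PHYSICAL exactly as before,
which is the backward-compatibility property discussed upthread.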

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0002-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch (text/x-patch; charset=us-ascii)
From 2aa39548f5990e9663e95f011f25a89a0dc8d8a1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 2/9] wal_decoding: Allow walsender's to connect to a specific
 database

Extend the existing 'replication' parameter to not only allow a boolean value
but also "database". If the latter is specified we connect to the database
specified in 'dbname'.

This is useful for future walsender commands which need database interaction,
e.g. changeset extraction.
---
 doc/src/sgml/protocol.sgml                         | 24 +++++++++---
 src/backend/postmaster/postmaster.c                | 23 ++++++++++--
 .../libpqwalreceiver/libpqwalreceiver.c            |  4 +-
 src/backend/replication/walsender.c                | 43 +++++++++++++++++++---
 src/backend/utils/init/postinit.c                  |  5 +++
 src/bin/pg_basebackup/pg_basebackup.c              |  4 +-
 src/bin/pg_basebackup/pg_receivexlog.c             |  4 +-
 src/bin/pg_basebackup/receivelog.c                 |  4 +-
 src/include/replication/walsender.h                |  1 +
 9 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2ea14e5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1301,10 +1301,13 @@
 
 <para>
 To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing a <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter which will in future
+allow some additional commands to the ones specified below to be run.
 
 The commands accepted in walsender mode are:
 
@@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are:
     <listitem>
      <para>
       Requests the server to identify itself. Server replies with a result
-      set of a single row, containing three fields:
+      set of a single row, containing four fields:
      </para>
 
      <para>
@@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are:
       </listitem>
       </varlistentry>
 
+      <varlistentry>
+      <term>
+       dbname
+      </term>
+      <listitem>
+      <para>
+       Database connected to or NULL.
+      </para>
+      </listitem>
+      </varlistentry>
+
       </variablelist>
      </para>
     </listitem>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..a31b01d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1894,10 +1894,21 @@ retry1:
 				port->cmdline_options = pstrdup(valptr);
 			else if (strcmp(nameptr, "replication") == 0)
 			{
-				if (!parse_bool(valptr, &am_walsender))
+				/*
+				 * Due to backward compatibility concerns replication is a
+				 * hybrid beast which allows the value to be either a boolean
+				 * or the string 'database'. The latter connects to a specific
+				 * database which is e.g. required for changeset extraction.
+				 */
+				if (strcmp(valptr, "database") == 0)
+				{
+					am_walsender = true;
+					am_db_walsender = true;
+				}
+				else if (!parse_bool(valptr, &am_walsender))
 					ereport(FATAL,
 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("invalid value for boolean option \"replication\"")));
+							 errmsg("invalid value for option \"replication\", legal values are false, 0, true, 1 or database")));
 			}
 			else
 			{
@@ -1983,8 +1994,12 @@ retry1:
 	if (strlen(port->user_name) >= NAMEDATALEN)
 		port->user_name[NAMEDATALEN - 1] = '\0';
 
-	/* Walsender is not related to a particular database */
-	if (am_walsender)
+	/*
+	 * Generic walsender, e.g. for streaming replication, is not connected to a
+	 * particular database. But walsenders used for logical replication need to
+	 * connect to a specific database.
+	 */
+	if (am_walsender && !am_db_walsender)
 		port->database_name[0] = '\0';
 
 	/*
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 						"the primary server: %s",
 						PQerrorMessage(streamConn))));
 	}
-	if (PQnfields(res) != 3 || PQntuples(res) != 1)
+	if (PQnfields(res) != 4 || PQntuples(res) != 1)
 	{
 		int			ntuples = PQntuples(res);
 		int			nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 		PQclear(res);
 		ereport(ERROR,
 				(errmsg("invalid response from primary server"),
-				 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+				 errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
 						   ntuples, nfields)));
 	}
 	primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index afd559d..b00a91a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,7 +46,10 @@
 #include "access/timeline.h"
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "access/xact.h"
+
 #include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -89,9 +92,10 @@ WalSndCtlData *WalSndCtl = NULL;
 WalSnd	   *MyWalSnd = NULL;
 
 /* Global state */
-bool		am_walsender = false;		/* Am I a walsender process ? */
+bool		am_walsender = false;		/* Am I a walsender process? */
 bool		am_cascading_walsender = false;		/* Am I cascading WAL to
-												 * another standby ? */
+												 * another standby? */
+bool		am_db_walsender = false;		/* connect to database? */
 
 /* User-settable parameters for walsender */
 int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
@@ -243,10 +247,12 @@ IdentifySystem(void)
 	char		tli[11];
 	char		xpos[MAXFNAMELEN];
 	XLogRecPtr	logptr;
+	char*        dbname = NULL;
 
 	/*
-	 * Reply with a result set with one row, three columns. First col is
-	 * system ID, second is timeline ID, and third is current xlog location.
+	 * Reply with a result set with one row, four columns. First col is system
+	 * ID, second is timeline ID, third is current xlog location and the fourth
+	 * contains the database name if we are connected to one.
 	 */
 
 	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -265,9 +271,23 @@ IdentifySystem(void)
 
 	snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
 
+	if (MyDatabaseId != InvalidOid)
+	{
+		MemoryContext cur = CurrentMemoryContext;
+
+		/* syscache access needs a transaction env. */
+		StartTransactionCommand();
+		/* make dbname live outside TX context */
+		MemoryContextSwitchTo(cur);
+		dbname = get_database_name(MyDatabaseId);
+		CommitTransactionCommand();
+		/* CommitTransactionCommand switches to TopMemoryContext */
+		MemoryContextSwitchTo(cur);
+	}
+
 	/* Send a RowDescription message */
 	pq_beginmessage(&buf, 'T');
-	pq_sendint(&buf, 3, 2);		/* 3 fields */
+	pq_sendint(&buf, 4, 2);		/* 4 fields */
 
 	/* first field */
 	pq_sendstring(&buf, "systemid");	/* col name */
@@ -295,17 +315,28 @@ IdentifySystem(void)
 	pq_sendint(&buf, -1, 2);
 	pq_sendint(&buf, 0, 4);
 	pq_sendint(&buf, 0, 2);
+
+	/* fourth field */
+	pq_sendstring(&buf, "dbname");
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
+	pq_sendint(&buf, TEXTOID, 4);
+	pq_sendint(&buf, -1, 2);
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
 	pq_endmessage(&buf);
 
 	/* Send a DataRow message */
 	pq_beginmessage(&buf, 'D');
-	pq_sendint(&buf, 3, 2);		/* # of columns */
+	pq_sendint(&buf, 4, 2);		/* # of columns */
 	pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
 	pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
 	pq_sendint(&buf, strlen(tli), 4);	/* col2 len */
 	pq_sendbytes(&buf, (char *) tli, strlen(tli));
 	pq_sendint(&buf, strlen(xpos), 4);	/* col3 len */
 	pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+	pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
+	pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
 
 	pq_endmessage(&buf);
 }
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2c7f0f1..56c352c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -725,7 +725,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 			ereport(FATAL,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("must be superuser or replication role to start walsender")));
+	}
 
+	if (am_walsender &&
+	    (in_dbname == NULL || in_dbname[0] == '\0') &&
+	    dboid == InvalidOid)
+	{
 		/* process any options passed in the startup packet */
 		if (MyProcPort != NULL)
 			process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..89e2376 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 787a395..fe8aef6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index d56a4d7..22a5340 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -534,11 +534,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			PQclear(res);
 			return false;
 		}
-		if (PQnfields(res) != 3 || PQntuples(res) != 1)
+		if (PQnfields(res) != 4 || PQntuples(res) != 1)
 		{
 			fprintf(stderr,
 					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-					progname, PQntuples(res), PQnfields(res), 1, 3);
+					progname, PQntuples(res), PQnfields(res), 1, 4);
 			PQclear(res);
 			return false;
 		}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddf..5097235 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -19,6 +19,7 @@
 /* global state */
 extern bool am_walsender;
 extern bool am_cascading_walsender;
+extern bool am_db_walsender;
 extern bool wake_wal_senders;
 
 /* user-settable parameters */
-- 
1.8.3.251.g1462b67
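
A note on the DataRow hunk above: the documentation change describes the
fourth IDENTIFY_SYSTEM column as "Database connected to or NULL", while the
walsender.c change appears to call strlen(dbname) even when dbname is still
NULL. In the frontend/backend protocol a NULL column is sent as a length of
-1 with no value bytes. The following is a minimal standalone sketch of that
encoding; RowBuf, put_int32() and put_text_column() are hypothetical helpers
for illustration, not the backend's StringInfo or pq_* routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer; the backend uses StringInfo instead. */
typedef struct
{
    unsigned char data[256];
    size_t      len;
} RowBuf;

/* Append a 32-bit integer in network (big-endian) byte order. */
static void
put_int32(RowBuf *buf, int32_t value)
{
    uint32_t    v = (uint32_t) value;

    assert(buf->len + 4 <= sizeof(buf->data));
    buf->data[buf->len++] = (unsigned char) (v >> 24);
    buf->data[buf->len++] = (unsigned char) (v >> 16);
    buf->data[buf->len++] = (unsigned char) (v >> 8);
    buf->data[buf->len++] = (unsigned char) v;
}

/*
 * Append one text column of a DataRow: a length word followed by the bytes,
 * or a length of -1 and no bytes when the value is NULL.
 */
void
put_text_column(RowBuf *buf, const char *value)
{
    size_t      n;

    if (value == NULL)
    {
        put_int32(buf, -1);
        return;
    }

    n = strlen(value);
    assert(buf->len + 4 + n <= sizeof(buf->data));
    put_int32(buf, (int32_t) n);
    memcpy(buf->data + buf->len, value, n);
    buf->len += n;
}
```

With a helper of this shape, a walsender that is not connected to a database
would emit a proper NULL for the dbname column instead of dereferencing a
NULL pointer.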

0003-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch (text/x-patch; charset=us-ascii)
From 770c858ebebe229bb5239c8370fe25f51df0f2a6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 3/9] wal_decoding: Log xl_running_xact's at a higher frequency
 than checkpoints are done

Logging information about running xacts more frequently is beneficial both
for hot standby, which can reach consistency faster and release some resources
earlier using this information, and for future logical replication, which can
initialize more quickly using it.

Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.

Also mark xl_running_xact records as being relevant for async commit so the wal
writer writes them out soonish instead of possibly waiting a long time.
---
 src/backend/postmaster/bgwriter.c | 62 +++++++++++++++++++++++++++++++++++++++
 src/backend/storage/ipc/standby.c | 27 ++++++++++++++---
 src/include/storage/standby.h     |  2 +-
 3 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..13d57c5 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -71,6 +73,20 @@ int			BgWriterDelay = 200;
 #define HIBERNATE_FACTOR			50
 
 /*
+ * Interval in which standby snapshots are logged into the WAL stream, in
+ * milliseconds.
+ */
+#define LOG_SNAPSHOT_INTERVAL_MS 15000
+
+/*
+ * LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
+ * doing so too often or repeatedly if there has been no other write activity
+ * in the system.
+ */
+static TimestampTz last_snapshot_ts;
+static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
+
+/*
  * Flags set by interrupt handlers for later service in the main loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
@@ -142,6 +158,12 @@ BackgroundWriterMain(void)
 	CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
 
 	/*
+	 * We just started, assume there has been either a shutdown or
+	 * end-of-recovery snapshot.
+	 */
+	last_snapshot_ts = GetCurrentTimestamp();
+
+	/*
 	 * Create a memory context that we will do all our work in.  We do this so
 	 * that we can reset the context during error recovery and thereby avoid
 	 * possible memory leaks.  Formerly this code just ran in
@@ -276,6 +298,46 @@ BackgroundWriterMain(void)
 		}
 
 		/*
+		 * Log a new xl_running_xacts every now and then so replication can get
+		 * into a consistent state faster (think of suboverflowed snapshots)
+		 * and clean up resources (locks, KnownXids*) more frequently. The
+		 * costs of this are relatively low, so doing it 4 times
+		 * (LOG_SNAPSHOT_INTERVAL_MS) a minute seems fine.
+		 *
+		 * We assume the interval for writing xl_running_xacts is
+		 * significantly bigger than BgWriterDelay, so we don't complicate the
+		 * overall timeout handling but just assume we're going to get called
+		 * often enough even if hibernation mode is active. It's not that
+		 * important that LOG_SNAPSHOT_INTERVAL_MS is met strictly. To make sure
+		 * we're not waking the disk up unnecessarily on an idle system we
+		 * check whether there has been any WAL inserted since the last time
+		 * we've logged a running xacts.
+		 *
+		 * We do this logging in the bgwriter as it's the only process that's
+		 * run regularly and returns to its mainloop all the
+		 * time. E.g. Checkpointer, when active, is barely ever in its
+		 * mainloop and thus makes it hard to log regularly.
+		 */
+		if (XLogStandbyInfoActive() && !RecoveryInProgress())
+		{
+			TimestampTz timeout = 0;
+			TimestampTz now = GetCurrentTimestamp();
+			timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
+												  LOG_SNAPSHOT_INTERVAL_MS);
+
+			/*
+			 * only log if enough time has passed and some xlog record has been
+			 * inserted.
+			 */
+			if (now >= timeout &&
+				last_snapshot_lsn != GetXLogInsertRecPtr())
+			{
+				last_snapshot_lsn = LogStandbySnapshot();
+				last_snapshot_ts = now;
+			}
+		}
+
+		/*
 		 * Sleep until we are signaled or BgWriterDelay has elapsed.
 		 *
 		 * Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..97da1a0 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
 									   ProcSignalReason reason);
 static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
 
 
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
  * currently running xids, performed by StandbyReleaseOldLocks().
  * Zero xids should no longer be possible, but we may be replaying WAL
  * from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
  */
-void
+XLogRecPtr
 LogStandbySnapshot(void)
 {
+	XLogRecPtr recptr;
 	RunningTransactions running;
 	xl_standby_lock *locks;
 	int			nlocks;
@@ -876,9 +879,12 @@ LogStandbySnapshot(void)
 	 * record we write, because standby will open up when it sees this.
 	 */
 	running = GetRunningTransactionData();
-	LogCurrentRunningXacts(running);
+	recptr = LogCurrentRunningXacts(running);
+
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
+
+	return recptr;
 }
 
 /*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
  * is a contiguous chunk of memory and never exists fully until it is
  * assembled in WAL.
  */
-static void
+static XLogRecPtr
 LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 {
 	xl_running_xacts xlrec;
@@ -939,6 +945,19 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->oldestRunningXid,
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);
+
+	/*
+	 * Ensure running_xacts information is synced to disk not too far in the
+	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
+	 * so we let the wal writer do it during normal
+	 * operation. XLogSetAsyncXactLSN() conveniently will mark the LSN as
+	 * to-be-synced and nudge the WALWriter into action if sleeping. Check
+	 * XLogBackgroundFlush() for details why a record might not be flushed
+	 * without it.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
+	return recptr;
 }
 
 /*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
 extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
 extern void LogAccessExclusiveLockPrepare(void);
 
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
 
 #endif   /* STANDBY_H */
-- 
1.8.3.251.g1462b67
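
The throttling rule in the bgwriter hunk above (log only if the interval has
elapsed and some WAL has been inserted since the last snapshot) can be
looked at in isolation. This is a standalone sketch under simplifying
assumptions: time is a plain millisecond counter and the WAL position a bare
64-bit integer, and SnapshotThrottle, snapshot_due() and snapshot_logged()
are hypothetical names rather than anything in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same interval as the patch: 15 seconds, i.e. four times a minute. */
#define LOG_SNAPSHOT_INTERVAL_MS 15000

/* Hypothetical stand-ins for TimestampTz and XLogRecPtr bookkeeping. */
typedef struct
{
    int64_t     last_snapshot_ms;   /* when the last snapshot was logged */
    uint64_t    last_snapshot_lsn;  /* WAL position returned by that logging */
} SnapshotThrottle;

/*
 * Returns true when a new snapshot should be logged: the interval has
 * elapsed and the current insert position differs from the position
 * recorded for the last snapshot, so an idle system stays quiet.
 */
bool
snapshot_due(const SnapshotThrottle *t, int64_t now_ms, uint64_t insert_lsn)
{
    return now_ms >= t->last_snapshot_ms + LOG_SNAPSHOT_INTERVAL_MS &&
        insert_lsn != t->last_snapshot_lsn;
}

/* Record that a snapshot was logged at time now_ms, ending at snapshot_lsn. */
void
snapshot_logged(SnapshotThrottle *t, int64_t now_ms, uint64_t snapshot_lsn)
{
    assert(now_ms >= t->last_snapshot_ms);  /* time must not run backwards */
    t->last_snapshot_ms = now_ms;
    t->last_snapshot_lsn = snapshot_lsn;
}
```

Because the stored position is the one returned by the snapshot logging
itself, a second snapshot is only written once some other record has moved
the insert position.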

#77Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#75)
Re: lcr v5 - introduction of InvalidCommandId

On 2013-09-05 12:44:18 -0400, Robert Haas wrote:

On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

I'm still unconvinced we want this.

Ok, so the reason for the existence of this patch is that currently
there is no way to represent an "unset" CommandId. This is a problem for
the following patches because we need to log the cmin, cmax of catalog
rows and obviously there can be rows where cmax is unset.

For heap tuples, we solve this problem by using flag bits. Why not
adopt the same approach?

We can, but it makes the amount of data stored/logged slightly larger
and it seems to me to lead to less idiomatic code. Still, if there's
another -1 I'll go that way.
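
To illustrate the trade-off between the two approaches, here is a small
standalone sketch. It assumes the sentinel is the all-ones value (the patch
title describes InvalidCommandId as the new maximum for
CommandCounterIncrement); the two struct layouts and all function names are
hypothetical and are not what the patch logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Assumption for this sketch: the sentinel is the all-ones value. */
#define InvalidCommandId    (~(CommandId) 0)

/* Sentinel approach: an unset cmax is represented in-band. */
typedef struct
{
    CommandId   cmin;
    CommandId   cmax;
} CidPairSentinel;

/* Flag approach: validity lives in a separate field (hypothetical layout). */
#define CID_CMAX_VALID  0x01

typedef struct
{
    CommandId   cmin;
    CommandId   cmax;
    uint8_t     flags;
} CidPairFlagged;

bool
sentinel_has_cmax(const CidPairSentinel *p)
{
    return p->cmax != InvalidCommandId;
}

void
sentinel_set_cmax(CidPairSentinel *p, CommandId cmax)
{
    /* A real command id must never collide with the sentinel. */
    assert(cmax != InvalidCommandId);
    p->cmax = cmax;
}

bool
flagged_has_cmax(const CidPairFlagged *p)
{
    return (p->flags & CID_CMAX_VALID) != 0;
}

void
flagged_set_cmax(CidPairFlagged *p, CommandId cmax)
{
    p->cmax = cmax;
    p->flags |= CID_CMAX_VALID;
}
```

The flagged variant needs an extra field and two pieces of state that must
be kept in sync, while the sentinel variant gives up one of the 2^32
possible values.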

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#78Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#66)
1 attachment(s)
Re: lcr v5 - primary/candidate key in relcache

Hi Kevin,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:

0007 wal_decoding: Add information about a tables primary key to struct RelationData
* Could be used in the matview refresh code

I think you and Kevin should discuss whether this is actually the
right way to do this. ISTM that if logical replication and
materialized views end up selecting different approaches to this
problem, everybody loses.

The patch we're discussing here adds a new struct RelationData field
called 'rd_primary' (should possibly be renamed) which contains
information about the "best" candidate key available for a table.

From the header comments:
/*
* The 'best' primary or candidate key that has been found, only set
* correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
*
* Indexes are chosen in the following order:
* * Primary Key
* * oid index
* * the first (OID order) unique, immediate, non-partial and
* non-expression index over one or more NOT NULL'ed columns
*/
Oid rd_primary;
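
The priority order in that comment can be sketched as a standalone function.
This is only an illustration: IndexSummary is a hypothetical flattened view
of a pg_index row, choose_best_key() is not a function in the patch, and the
input array is assumed to already be in OID order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical flattened view of one index on the table. */
typedef struct
{
    Oid         indexrelid;
    bool        is_valid;
    bool        is_unique;
    bool        is_immediate;
    bool        is_partial;
    bool        is_primary;
    bool        is_oid_index;       /* single-column index on the oid column */
    bool        has_expression;
    bool        all_cols_not_null;
} IndexSummary;

/*
 * Pick the "best" key: primary key first, then an oid index, then the first
 * unique, immediate, non-partial, non-expression index over NOT NULL
 * columns. Returns InvalidOid if nothing qualifies.
 */
Oid
choose_best_key(const IndexSummary *idx, size_t n)
{
    Oid         pkey = InvalidOid;
    Oid         oididx = InvalidOid;
    Oid         candidate = InvalidOid;
    size_t      i;

    assert(idx != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        const IndexSummary *ix = &idx[i];

        /* Every kind of key must be valid, unique, immediate, non-partial. */
        if (!ix->is_valid || !ix->is_unique ||
            !ix->is_immediate || ix->is_partial)
            continue;

        if (ix->is_primary)
            pkey = ix->indexrelid;
        else if (ix->is_oid_index)
            oididx = ix->indexrelid;
        else if (candidate == InvalidOid &&
                 !ix->has_expression && ix->all_cols_not_null)
            candidate = ix->indexrelid;     /* keep only the first one */
    }

    if (pkey != InvalidOid)
        return pkey;
    if (oididx != InvalidOid)
        return oididx;
    return candidate;
}
```

A different priority list would only change the final three-way choice,
which is why the ordering question is worth settling before other callers
start relying on the field.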

I thought we could use that in matview.c:refresh_by_match_merge() to
select a more efficient diff if rd_primary has a valid index. In that
case you'd only need to compare that index's fields, which should result
in a more efficient plan.

Maybe it's also useful in other cases for you?

If it's relevant at all, would you like to have a different priority
list than the one above?

Regards,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0004-wal_decoding-Add-information-about-a-tables-primary-.patch (text/x-patch; charset=us-ascii)
From ee85b3bd8d8cc25fa547c004f6f6ea6bccef7c66 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/9] wal_decoding: Add information about a tables primary key
 to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather rd_primary value RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to lookup that index and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
 src/include/utils/rel.h            | 12 +++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c588c29 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex)
+					&& !OidIsValid(oidIndex)
+					&& !OidIsValid(candidateIndex))
+			{
+				int key;
+				bool found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16 attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		relation->rd_primary = candidateIndex;
+
 	heap_close(indrel, AccessShareLock);
 
 	/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 589c9a8..0281b4b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
 	TriggerDesc *trigdesc;		/* Trigger info, or NULL if rel has none */
 
 	/*
+	 * The 'best' primary or candidate key that has been found, only set
+	 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+	 *
+	 * Indexes are chosen in the following order:
+	 * * Primary Key
+	 * * oid index
+	 * * the first (OID order) unique, immediate, non-partial and
+	 *   non-expression index over one or more NOT NULL'ed columns
+	 */
+	Oid rd_primary;
+
+	/*
 	 * rd_options is set whenever rd_rel is loaded into the relcache entry.
 	 * Note that you can NOT look into rd_rel for this data.  NULL means "use
 	 * defaults".
-- 
1.8.3.251.g1462b67

#79Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#77)
Re: lcr v5 - introduction of InvalidCommandId

On Thu, Sep 5, 2013 at 12:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-09-05 12:44:18 -0400, Robert Haas wrote:

On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:

0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

I'm still unconvinced we want this.

Ok, so the reason for the existence of this patch is that currently
there is no way to represent an "unset" CommandId. This is a problem for
the following patches because we need to log the cmin, cmax of catalog
rows and obviously there can be rows where cmax is unset.

For heap tuples, we solve this problem by using flag bits. Why not
adopt the same approach?

We can, although it makes the amount of data stored/logged slightly larger
and it seems to lead to less idiomatic code to me, so if there's another
-1 I'll go that way.

OK. Consider me more of a -0 than a -1. Like I say, I don't really
want to block it; I just don't feel comfortable committing it unless a
few other people say something like "I don't see a problem with that".
Or maybe point me to relevant changeset extraction code that's going
to get messier.
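
To make the trade-off concrete, here is a minimal sketch of the two representations being debated: a separate validity flag versus an in-band sentinel value. The struct and function names are hypothetical and do not come from the patch series; the sentinel value assumes the all-ones definition discussed further down the thread.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Assumed sentinel: the all-ones value discussed later in the thread. */
#define InvalidCommandId (~(CommandId) 0)

/* Variant A (hypothetical): a separate flag records whether cmax is set. */
typedef struct CidPairFlagged
{
    CommandId   cmin;
    CommandId   cmax;
    bool        cmax_valid;
} CidPairFlagged;

/* Variant B (hypothetical): an unset cmax is encoded in the value itself. */
typedef struct CidPairSentinel
{
    CommandId   cmin;
    CommandId   cmax;
} CidPairSentinel;

static inline bool
flagged_has_cmax(const CidPairFlagged *p)
{
    return p->cmax_valid;
}

static inline bool
sentinel_has_cmax(const CidPairSentinel *p)
{
    return p->cmax != InvalidCommandId;
}
```

The flagged variant carries one extra member, so each stored or logged pair is larger, which is the cost Andres refers to. The sentinel variant gives up one usable CommandId value instead.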

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#79)
Re: lcr v5 - introduction of InvalidCommandId

Robert Haas <robertmhaas@gmail.com> writes:

OK. Consider me more of a -0 than a -1. Like I say, I don't really
want to block it; I just don't feel comfortable committing it unless a
few other people say something like "I don't see a problem with that".

FWIW, I've always thought it was a wart that there wasn't a recognized
InvalidCommandId value. It was never pressing to fix it before, but
if LCR needs it, let's do so. I definitely *don't* find it cleaner to
eat up another flag bit to avoid that. We don't have many to spare.

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

regards, tom lane


#81Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#80)
Re: lcr v5 - introduction of InvalidCommandId

Hi,

Thanks for weighing in.

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

OK. Consider me more of a -0 than a -1. Like I say, I don't really
want to block it; I just don't feel comfortable committing it unless a
few other people say something like "I don't see a problem with that".

FWIW, I've always thought it was a wart that there wasn't a recognized
InvalidCommandId value. It was never pressing to fix it before, but
if LCR needs it, let's do so.

Yes, it's a bit anomalous to the other types.

I definitely *don't* find it cleaner to
eat up another flag bit to avoid that. We don't have many to spare.

It wouldn't need to be a flag bit in any existing struct, so that's not
a problem.

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way. My argument for that
being valid is that CommandIds don't play any role outside of their own
transaction. Now, somebody could argue that SELECT cmin, cmax can be
done outside the transaction, but: Those values are already pretty much
meaningless today since cmin/cmax have been merged. They also don't
check whether the field is initialized at all.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#81)
Re: lcr v5 - introduction of InvalidCommandId

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way.

Oh. I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone. That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow. With the horsing
around we've been seeing from the gcc boys lately, I don't have a warm
feeling about whether they won't break that test someday on the grounds
that "overflow is undefined behavior".
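
Point (2) can be illustrated with a minimal standalone sketch, assuming the sentinel is the all-ones value mentioned above. The function name is hypothetical, and it returns false where the real code would raise an error. The counter is compared against the sentinel and stepped back, so it never wraps around to zero.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

#define FirstCommandId   ((CommandId) 0)
#define InvalidCommandId (~(CommandId) 0)

/*
 * Hypothetical stand-in for the counter increment: returns false instead
 * of reporting an error when the limit is reached.
 */
static inline bool
command_counter_increment(CommandId *current)
{
    *current += 1;
    if (*current == InvalidCommandId)
    {
        /* Step back so the counter keeps its last valid value. */
        *current -= 1;
        return false;
    }
    return true;
}
```

With this shape the largest usable CommandId is one below the sentinel, and the limit check fires before any wraparound happens.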

regards, tom lane


#83Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#81)
Re: lcr v5 - introduction of InvalidCommandId

On Thu, Sep 5, 2013 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way. My argument for that
being valid is that CommandIds don't play any role outside of their own
transaction.

Right. It seems like this should probably be noted in the
documentation under "5.4. System Columns" -- I just realized that it
isn't.

--
Peter Geoghegan


#84Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#82)
Re: lcr v5 - introduction of InvalidCommandId

On 2013-09-05 14:37:01 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way.

Oh. I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone. That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow. With the horsing
around we've been seeing from the gcc boys lately

Ok, I can do it that way. LCR obviously shouldn't care.

I don't have a warm
feeling about whether they won't break that test someday on the grounds
that "overflow is undefined behavior".

Unsigned overflow is pretty strictly defined, so I don't see much danger
there. Also, we'd feel the pain pretty definitely with xids...
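
For reference, a small sketch of the wraparound-style check this remark is about, with a hypothetical helper name. Because CommandId is an unsigned 32-bit type, incrementing the maximum value is defined by the C standard to wrap to zero, so comparing against FirstCommandId after the increment detects it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

#define FirstCommandId ((CommandId) 0)

/*
 * Hypothetical helper: increments the counter and reports whether it
 * wrapped around to FirstCommandId.
 */
static inline bool
increment_wrapped(CommandId *current)
{
    *current += 1;      /* unsigned arithmetic wraps modulo 2^32 */
    return *current == FirstCommandId;
}
```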

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#85Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#84)
1 attachment(s)
Re: lcr v5 - introduction of InvalidCommandId

On 2013-09-05 21:02:44 +0200, Andres Freund wrote:

On 2013-09-05 14:37:01 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way.

Oh. I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone. That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow. With the horsing
around we've been seeing from the gcc boys lately

Ok, I can do it that way. LCR obviously shouldn't care.

It doesn't care to the point that the patch already does exactly what
you propose. It's just my memory that remembered things differently.

So, a very slightly updated patch attached.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Introduce-InvalidCommandId.patch (text/x-patch; charset=us-ascii)
From 0592af4ae2e5a2bb1e4919560ab2768a18d88dd4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH] Introduce InvalidCommandId

Do so by reducing the limit of allowed CommandIds by one and declare ~0 as
InvalidCommandId.

This decreases the possible number of subtransactions by one which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction.
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h                   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("cannot have more than 2^32-1 commands in a transaction")));
+					 errmsg("cannot have more than 2^32-2 commands in a transaction")));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.3.251.g1462b67

#86Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#85)
Re: lcr v5 - introduction of InvalidCommandId

On Thu, Sep 5, 2013 at 3:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Oh. I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone. That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow. With the horsing
around we've been seeing from the gcc boys lately

Ok, I can do it that way. LCR obviously shouldn't care.

It doesn't care to the point that the patch already does exactly what
you propose. It's just my memory that remembered things differently.

So, a very slightly updated patch attached.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#86)
Re: lcr v5 - introduction of InvalidCommandId

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Sep 5, 2013 at 3:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:

So, a very slightly updated patch attached.

Committed.

Hmm ... shouldn't this patch adjust the error messages in
CommandCounterIncrement? We just took away one possible command.
It's pretty nitpicky, especially since many utility commands do
more than one CommandCounterIncrement, but still ...

regards, tom lane


#88Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#87)
Re: lcr v5 - introduction of InvalidCommandId

On 2013-09-09 18:43:51 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Sep 5, 2013 at 3:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:

So, a very slightly updated patch attached.

Committed.

Hmm ... shouldn't this patch adjust the error messages in
CommandCounterIncrement? We just took away one possible command.
It's pretty nitpicky, especially since many utility commands do
more than one CommandCounterIncrement, but still ...

Hm. You're talking about "cannot have more than 2^32-2 commands in a
transaction"? If so, the patch and the commit seem to have adjusted that?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#88)
Re: lcr v5 - introduction of InvalidCommandId

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-09-09 18:43:51 -0400, Tom Lane wrote:

Hmm ... shouldn't this patch adjust the error messages in
CommandCounterIncrement?

Hm. You're talking about "cannot have more than 2^32-2 commands in a
transaction"? If so, the patch and the commit seem to have adjusted that?

Oh! That's what I get for going on memory instead of re-reading the
commit. Sorry, never mind the noise.

regards, tom lane


#90Kevin Grittner
kgrittn@ymail.com
In reply to: Andres Freund (#78)
Re: lcr v5 - primary/candidate key in relcache

Andres Freund <andres@2ndquadrant.com> wrote:

Robert Haas wrote:

Andres Freund <andres@2ndquadrant.com> wrote:

0007 wal_decoding: Add information about a tables primary key to
  struct RelationData
* Could be used in the matview refresh code

I think you and Kevin should discuss whether this is actually the
right way to do this.  ISTM that if logical replication and
materialized views end up selecting different approaches to this
problem, everybody loses.

The patch we're discussing here adds a new struct RelationData field
called 'rd_primary' (should possibly be renamed) which contains
information about the "best" candidate key available for a table.

From the header comments:
     /*
     * The 'best' primary or candidate key that has been found, only set
     * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
     *
     * Indexes are chosen in the following order:
     * * Primary Key
     * * oid index
     * * the first (OID order) unique, immediate, non-partial and
     *  non-expression index over one or more NOT NULL'ed columns
     */
     Oid rd_primary;

I thought we could use that in matview.c:refresh_by_match_merge() to
select a more efficient diff if rd_primary has a valid index. In that
case you'd only need to compare that index's fields, which should result
in a more efficient plan.

Maybe it's also useful in other cases for you?

If it's relevant at all, would you like to have a different priority
list than the one above?
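
The priority list from the header comment can be sketched as a standalone function. This is only an illustration: the struct, its fields, and the function name are hypothetical simplifications, not the relcache code, and the input array is assumed to already be sorted by index OID.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

#define InvalidOid      ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/* Hypothetical, simplified description of one index on a table. */
typedef struct IndexInfoSketch
{
    Oid     indexrelid;
    bool    is_primary;
    bool    is_oid_index;
    bool    is_unique;
    bool    is_immediate;
    bool    is_partial;
    bool    has_expressions;
    bool    all_columns_not_null;
} IndexInfoSketch;

/*
 * Pick the "best" key: primary key first, then an oid index, then the
 * first suitable candidate index. Assumes indexes[] is in OID order.
 */
static inline Oid
choose_best_key(const IndexInfoSketch *indexes, size_t n)
{
    Oid     pkey = InvalidOid;
    Oid     oid_index = InvalidOid;
    Oid     candidate = InvalidOid;

    for (size_t i = 0; i < n; i++)
    {
        const IndexInfoSketch *idx = &indexes[i];

        if (idx->is_primary)
            pkey = idx->indexrelid;
        else if (idx->is_oid_index)
        {
            if (!OidIsValid(oid_index))
                oid_index = idx->indexrelid;
        }
        else if (!OidIsValid(candidate) &&
                 idx->is_unique && idx->is_immediate &&
                 !idx->is_partial && !idx->has_expressions &&
                 idx->all_columns_not_null)
            candidate = idx->indexrelid;
    }

    if (OidIsValid(pkey))
        return pkey;
    if (OidIsValid(oid_index))
        return oid_index;
    return candidate;       /* may be InvalidOid if nothing qualifies */
}
```

Note that the candidate rule takes the first qualifying index in OID order, which says nothing about which index would be cheapest to compare on.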

My first thought was that it was necessary to use all unique,
immediate, non-partial, non-expression indexes to avoid getting
errors on the UPDATE phase of the concurrent refresh for transient
duplicates; but then I remembered that I had to give up on that and
do it all with DELETE followed by INSERT, which eliminates that
risk.  As things now stand the *existence* of any unique,
non-partial, non-expression index (note that immediate is not
needed) is sufficient for correctness.  We could now even drop that,
I think, if we added a duplicate check at the end in the absence of
such an index.

The reason I left it comparing columns from *all* such indexes is
that it gives the optimizer the chance to pick the one that looks
fastest.  With the upcoming patch that can add some extra
"equality" comparisons in addition to the "identical" comparisons
the patch uses, so the mechanism you propose might be a worthwhile
optimization for some cases as long as it does a good job of
picking *the fastest* such index.  The above method of choosing an
index doesn't seem to necessarily ensure that.

Also, if you need to include the "immediate" test, it could not be
used for RMVC without "fallback" code if this mechanism didn't find
an appropriate index.  Of course, that would satisfy those who
would like to relax the requirement for a unique index on the MV to
be able to use RMVC.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
