logical changeset generation v4

Started by Andres Freundalmost 13 years ago62 messages

andres@2ndquadrant.com

almost 13 years ago

19 attachment(s)

Hi everyone,

Here is the newest version of logical changeset generation.

Changes since last time round:
* loads and loads of bugfixes
* crash/restart persistency of in-memory structures in a crash safe manner
* very large transaction support (spooling to disk)
* rebased onto the newest version of xlogreader

Overview over the patches:

Xlogreader (separate patch):
[01]: Centralize Assert* macros into c.h so its common between backend/frontend
[02]: Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
[03]: Split out xlog reading into its own module called xlogreader
[04]: Add pg_xlogdump contrib module

Those seem to be ready baring some infrastructure work around common
backend/frontend code for xlogdump.

Add capability to map from (tablespace, relfilenode) to pg_class.oid:
[05]: Add a new RELFILENODE syscache to fetch a pg_class entry via (reltablespace, relfilenode)
[06]: Add RelationMapFilenodeToOid function to relmapper.c
[07]: Add pg_relation_by_filenode to lookup up a relation by (tablespace, filenode)

Imo those are pretty solid although there are some doubts about the
correctness of [05]Add a new RELFILENODE syscache to fetch a pg_class entry via (reltablespace, relfilenode) which I think are all fixed in this version:

The fundamental problem of adding a (tablespace, relfilenode) syscache
is that no unique index exists in pg_class over (relfilenode,
reltablespace) because relfilenode is set to a '0' (aka InvalidOid) when
the table is either a shared table or a nailed table. This cannot really
be changed as pg_class.relfilenode is not authoritative for those and
can possibly not even accessed (different table, early startup). We also
don't want to rely on the null bitmap, so we can't set it to NULL.

The reason why I think it is safe to use the added RELFILENODE syscache
as I have in those patches is that when looking a (tablespace, filenode)
pair up none of those duplicat '0' values will get looked up as there is
no point in looking up an invalid relfilenode. Instead the shared/nailed
relfilenodes will have to get mapped via RelationMapFilenodeToOid.

The alternative here seems to be to invent an own attoptcache style but
given that the above syscache is fairly performance critical and should
do invalidations in a sensible manner that seems to be an unnecessary
amount of code.

Any opinions here?

[08]: wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

Its useful to represent values that are not a valid CommandId. Add such
a representation.
Imo this is straightforward and easy.

[09]: Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

For timetravel access to the catalog we need to be able to lookup (cmin,
cmax) pairs of catalog rows when were 'inside' that TX. This patch just
adapts the signature of the *Satisfies routines to expect a HeapTuple
instead of a HeapTupleHeader. The amount of changes for that is fairly
low as the HeapTupleSatisfiesVisibility macro already expected the
former.

It also makes sure the HeapTuple fields are setup in the few places that
didn't already do so.

[10]: wal_decoding: Allow walsender's to connect to a specific database

For logical decoding we need to be able access the schema of a database
- for that we need to be connected to a database. Thus allow recovery
connections to connect to a specific database.

This patch currently has the disadvantage that its not possible anymore
to connect to a database thats actually named "replication" as the
decision whether a connection goes to a database or not is made based
uppon dbname != replication.

Better ideas?

[11]: wal_decoding: Add alreadyLocked parameter to GetOldestXminNoLock

Pretty boring preparatory for being able to nail a certain xid as the
global horizon. I don't think there is much to be said about this
anymore, it already has been somewhat discussed.

[12]: wal_decodign: Log xl_running_xact's at a higher frequency than checkpoints are done

Make the bgwriter emit a xl_running_xacts record every 15s if there is
xlog activity in the system.
Imo this isn't too complicated and already beneficial for HS so it could
be applied separately.

[13]: copydir: make fsync_fname public

This should probably go to some other file, fd.[ch]? Otherwise its
pretty trivial.

[14]: wal decoding: Add information about a tables primary key to struct RelationData

Back when discussing the first prototype of BDR Heikki was concerned of
doing a search for the primary key during heap_delete. I agree that that
isn't really a good idea.
So remember the primary key (or a candidate key) when looking through
the available indexes in RelationGetIndexList().

I don't really like the name rd_primary as it also contains candidate
keys (i.e. indimmediate, inunique, noexpression, notnull), better
suggestions?

I don't think there is too much debatable in here, but there is no
independent benefit of applying it separately.

[15]: wal decoding: Introduce wal decoding via catalog timetravel

The heart of changeset generation.

Built out of several parts:

* snapshot building infrastructure
* transaction reassembly
* shared memory state for replication slots
* new wal_level=logical that catches more data
* (local) output plugin interface
* (external) walsender interface

[16]: wal decoding: Add a simple decoding module in contrib named 'test_decoding'

An example output plugin also used in regression tests

[17]: wal decoding: Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes

An application to receive changes over the walsender/replication=1
interface.

[18]: wal_decoding: Add test_logical_replication extension for easier testing of logical decoding

An extension that allows to use logical decoding from sql. This isn't
really suitable for production, high performance use but its usefor for
development and more importantly it makes it relatively easy to write
regression tests without new infrastructure.

I am starting to be happy about the state of this!

Open issues & questions:
1) testing infrastructure
2) Determination of replication slots
3) Options for output plugins
4) the general walsender interface
5) Additional catalog tables

1) Currently all the tests are in patch [18]wal_decoding: Add test_logical_replication extension for easier testing of logical decoding which is a contrib
module. There are two reasons for putting them there:

First the tests need wal_level=logical which isn't the case with the
current regression tests.

Second, I don't think the test_logical_replication functions should live
in core as they shouldn't be used for a production replication scenario
(causes longrunning transactions, requires polling) , but I have failed
to find a neat way to include a contrib extension in the plain
regression tests.

2) Currently the logical replication infrastructure assigns a 'slot-id'
when a new replica is setup. That slot id isn't really nice
(e.g. "id-321578-3"). It also requires that [18]wal_decoding: Add test_logical_replication extension for easier testing of logical decoding keeps state in a global
variable to make writing regression tests easy.

I think it would be better to make the user specify those replication
slot ids, but I am not really sure about it.

3) Currently no options can be passed to an output plugin. I am thinking
about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the now widely
used ('option' ['value'], ...) syntax and pass that to the output
plugin's initialization function.

4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);

-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';

-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

5) Currently its only allowed to access catalog tables, its fairly
trivial to extend this to additional tables if you can accept some
(noticeable but not too big) overhead for modifications on those tables.

I was thinking of making that an option for tables, that would be useful
for replication solutions configuration tables.

Further todo:
* don't reread so much WAL after a restart/crash
* better TOAST support, the current one can fail after A DROP TABLE
* only peg a new "catalog xmin" instead of the global xmin
* more docs about the internals
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the
outside. There have been improvements vs 3.1 but not enough
* abort too old replication slots

Puh.

The current git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

The xlogreader development happens xlogreader_4.

Input?

Greetings,

Andres Freund

PS: Thanks for the input & help so far!

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0010-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patchtext/x-patch; charset=us-asciiDownload

>From 12f4329b2c31eee6d2d93e42e0f52c411dab9d8d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 13 Nov 2012 12:54:36 +0100
Subject: [PATCH 10/19] wal_decoding: Allow walsender's to connect to a
 specific database

Currently the decision whether to connect to a database or not is made by
checking whether the passed "dbname" parameter is "replication". Unfortunately
this makes it impossible to connect a to a database named replication...

This is useful for future walsender commands which need database interaction.
---
 src/backend/postmaster/postmaster.c                |  7 ++++--
 .../libpqwalreceiver/libpqwalreceiver.c            |  4 ++--
 src/backend/replication/walsender.c                | 27 ++++++++++++++++++----
 src/backend/utils/init/postinit.c                  |  5 ++++
 src/bin/pg_basebackup/pg_basebackup.c              |  4 ++--
 src/bin/pg_basebackup/pg_receivexlog.c             |  4 ++--
 src/bin/pg_basebackup/receivelog.c                 |  4 ++--
 7 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 15c2320..53a3988 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1953,10 +1953,13 @@ retry1:
 	if (strlen(port->user_name) >= NAMEDATALEN)
 		port->user_name[NAMEDATALEN - 1] = '\0';
 
-	/* Walsender is not related to a particular database */
-	if (am_walsender)
+	/* Generic Walsender is not related to a particular database */
+	if (am_walsender && strcmp(port->database_name, "replication") == 0)
 		port->database_name[0] = '\0';
 
+	if (am_walsender)
+		elog(WARNING, "connecting to %s", port->database_name);
+
 	/*
 	 * Done putting stuff in TopMemoryContext.
 	 */
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 7374489..e46a060 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 						"the primary server: %s",
 						PQerrorMessage(streamConn))));
 	}
-	if (PQnfields(res) != 3 || PQntuples(res) != 1)
+	if (PQnfields(res) != 4 || PQntuples(res) != 1)
 	{
 		int			ntuples = PQntuples(res);
 		int			nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 		PQclear(res);
 		ereport(ERROR,
 				(errmsg("invalid response from primary server"),
-				 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+				 errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
 						   ntuples, nfields)));
 	}
 	primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ad7d1c9..2284d58 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,6 +46,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -237,10 +238,12 @@ IdentifySystem(void)
 	char		tli[11];
 	char		xpos[MAXFNAMELEN];
 	XLogRecPtr	logptr;
+	char*        dbname = NULL;
 
 	/*
-	 * Reply with a result set with one row, three columns. First col is
-	 * system ID, second is timeline ID, and third is current xlog location.
+	 * Reply with a result set with one row, four columns. First col is system
+	 * ID, second is timeline ID, third is current xlog location and the fourth
+	 * contains the database name if we are connected to one.
 	 */
 
 	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -259,9 +262,14 @@ IdentifySystem(void)
 
 	snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
 
+	if (MyDatabaseId != InvalidOid)
+		dbname = get_database_name(MyDatabaseId);
+	else
+		dbname = "(none)";
+
 	/* Send a RowDescription message */
 	pq_beginmessage(&buf, 'T');
-	pq_sendint(&buf, 3, 2);		/* 3 fields */
+	pq_sendint(&buf, 4, 2);		/* 4 fields */
 
 	/* first field */
 	pq_sendstring(&buf, "systemid");	/* col name */
@@ -289,17 +297,28 @@ IdentifySystem(void)
 	pq_sendint(&buf, -1, 2);
 	pq_sendint(&buf, 0, 4);
 	pq_sendint(&buf, 0, 2);
+
+	/* fourth field */
+	pq_sendstring(&buf, "dbname");
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
+	pq_sendint(&buf, TEXTOID, 4);
+	pq_sendint(&buf, -1, 2);
+	pq_sendint(&buf, 0, 4);
+	pq_sendint(&buf, 0, 2);
 	pq_endmessage(&buf);
 
 	/* Send a DataRow message */
 	pq_beginmessage(&buf, 'D');
-	pq_sendint(&buf, 3, 2);		/* # of columns */
+	pq_sendint(&buf, 4, 2);		/* # of columns */
 	pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
 	pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
 	pq_sendint(&buf, strlen(tli), 4);	/* col2 len */
 	pq_sendbytes(&buf, (char *) tli, strlen(tli));
 	pq_sendint(&buf, strlen(xpos), 4);	/* col3 len */
 	pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+	pq_sendint(&buf, strlen(dbname), 4);	/* col4 len */
+	pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
 
 	pq_endmessage(&buf);
 }
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7e21cea..2a93cff 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -728,7 +728,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 			ereport(FATAL,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("must be superuser or replication role to start walsender")));
+	}
 
+	if (am_walsender &&
+	    (in_dbname == NULL || in_dbname[0] == '\0') &&
+	    dboid == InvalidOid)
+	{
 		/* process any options passed in the startup packet */
 		if (MyProcPort != NULL)
 			process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 6631161..9d2fa6d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1236,11 +1236,11 @@ BaseBackup(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index b9ccb62..a0f3efc 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -237,11 +237,11 @@ StreamLog(void)
 				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
-	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
 	{
 		fprintf(stderr,
 				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-				progname, PQntuples(res), PQnfields(res), 1, 3);
+				progname, PQntuples(res), PQnfields(res), 1, 4);
 		disconnect_and_exit(1);
 	}
 	timeline = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index f4f883c..c9cb834 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -355,11 +355,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			PQclear(res);
 			return false;
 		}
-		if (PQnfields(res) != 3 || PQntuples(res) != 1)
+		if (PQnfields(res) != 4 || PQntuples(res) != 1)
 		{
 			fprintf(stderr,
 					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
-					progname, PQntuples(res), PQnfields(res), 1, 3);
+					progname, PQntuples(res), PQnfields(res), 1, 4);
 			PQclear(res);
 			return false;
 		}
-- 
1.7.12.289.g0ce9864.dirty

0011-wal_decoding-Add-alreadyLocked-parameter-to-GetOldes.patchtext/x-patch; charset=us-asciiDownload

>From 54cf27b505efcf5aeeb2b78638e88fab04e66b5b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 13 Dec 2012 20:47:57 +0100
Subject: [PATCH 11/19] wal_decoding: Add alreadyLocked parameter to
 GetOldestXminNoLock

This is useful because it allows to compute the current OldestXmin while
already holding the procarray lock which enables setting the own xmin horizon
safely.
---
 src/backend/access/transam/xlog.c     |  4 ++--
 src/backend/catalog/index.c           |  3 ++-
 src/backend/commands/analyze.c        |  2 +-
 src/backend/commands/vacuum.c         |  4 ++--
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/ipc/procarray.c   | 16 ++++++++--------
 src/include/storage/procarray.h       |  2 +-
 7 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 310a654..ab7f0e4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6823,7 +6823,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(true, false));
+		TruncateSUBTRANS(GetOldestXmin(true, false, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -7107,7 +7107,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(true, false));
+		TruncateSUBTRANS(GetOldestXmin(true, false, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a29c106..dbee4d5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2196,7 +2196,8 @@ IndexBuildHeapScan(Relation heapRelation,
 	{
 		snapshot = SnapshotAny;
 		/* okay to ignore lazy VACUUMs here */
-		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
+		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true,
+								   false);
 	}
 
 	scan = heap_beginscan_strat(heapRelation,	/* relation */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index ac16284..f5a6af7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1077,7 +1077,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, false);
 
 	/* Prepare for sampling block numbers */
 	BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2d3170a..158d0dc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -394,7 +394,7 @@ vacuum_set_xid_limits(int freeze_min_age,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(sharedRel, true);
+	*oldestXmin = GetOldestXmin(sharedRel, true, false);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
@@ -686,7 +686,7 @@ vac_update_datfrozenxid(void)
 	 * committed pg_class entries for new tables; see AddNewRelationTuple().
 	 * Se we cannot produce a wrong minimum by starting with this.
 	 */
-	newFrozenXid = GetOldestXmin(true, true);
+	newFrozenXid = GetOldestXmin(true, true, false);
 
 	/*
 	 * We must seqscan pg_class to find the minimum Xid, because there is no
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 16cf944..2b77369 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1089,7 +1089,7 @@ XLogWalRcvSendHSFeedback(void)
 	 * Make the expensive call to get the oldest xmin once we are certain
 	 * everything else has been checked.
 	 */
-	xmin = GetOldestXmin(true, false);
+	xmin = GetOldestXmin(true, false, false);
 
 	/*
 	 * Get epoch and adjust if nextXid and oldestXmin are different sides of
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..f59e792 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1100,7 +1100,7 @@ TransactionIdIsActive(TransactionId xid)
  * GetOldestXmin() move backwards, with no consequences for data integrity.
  */
 TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
@@ -1109,7 +1109,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	/* Cannot look for individual databases during recovery */
 	Assert(allDbs || !RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	if (!alreadyLocked)
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	/*
 	 * We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1164,7 +1165,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 		 */
 		TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
 
-		LWLockRelease(ProcArrayLock);
+		if (!alreadyLocked)
+			LWLockRelease(ProcArrayLock);
 
 		if (TransactionIdIsNormal(kaxmin) &&
 			TransactionIdPrecedes(kaxmin, result))
@@ -1172,10 +1174,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 	}
 	else
 	{
-		/*
-		 * No other information needed, so release the lock immediately.
-		 */
-		LWLockRelease(ProcArrayLock);
+		if (!alreadyLocked)
+			LWLockRelease(ProcArrayLock);
 
 		/*
 		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1249,7 +1249,7 @@ GetMaxSnapshotSubxidCount(void)
  *			older than this are known not running any more.
  *		RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
  *			running transactions, except those running LAZY VACUUM).  This is
- *			the same computation done by GetOldestXmin(true, true).
+ *			the same computation done by GetOldestXmin(true, true, ...).
  *
  * Note: this function should probably not be called with an argument that's
  * not statically allocated (see xip allocation below).
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..fe0bad7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -49,7 +49,7 @@ extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked);
 extern TransactionId GetOldestActiveTransactionId(void);
 
 extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
-- 
1.7.12.289.g0ce9864.dirty

0012-wal_decodign-Log-xl_running_xact-s-at-a-higher-frequ.patchtext/x-patch; charset=us-asciiDownload

>From a3fb76c6a0982e7115dd0909aaccce4572bb7551 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 10 Dec 2012 12:08:30 +0100
Subject: [PATCH 12/19] wal_decodign: Log xl_running_xact's at a higher
 frequency than checkpoints are done

Do so in the background writer which seems to be the best choice as its
regularly running and shouldn't be busy for too long without getting back into
its main loop.

Also mark xl_standby records as being relevant for async commit so the wal
writer writes them out soonish.

This might also be beneficial for HS as it would make it faster to hit a spot
where no (old) transactions are running anymroe.
---
 src/backend/postmaster/bgwriter.c | 47 +++++++++++++++++++++++++++++++++++++++
 src/backend/storage/ipc/standby.c | 22 +++++++++++++++---
 src/include/storage/standby.h     |  2 +-
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..2adb36f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -76,6 +78,10 @@ int			BgWriterDelay = 200;
 static volatile sig_atomic_t got_SIGHUP = false;
 static volatile sig_atomic_t shutdown_requested = false;
 
+static TimestampTz last_logged_snap_ts;
+static XLogRecPtr last_logged_snap_recptr = InvalidXLogRecPtr;
+static uint32 log_snap_interval_ms = 15000;
+
 /* Signal handlers */
 
 static void bg_quickdie(SIGNAL_ARGS);
@@ -142,6 +148,12 @@ BackgroundWriterMain(void)
 	CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
 
 	/*
+	 * We just started, assume there has been either a shutdown or
+	 * end-of-recovery snapshot.
+	 */
+	last_logged_snap_ts = GetCurrentTimestamp();
+
+	/*
 	 * Create a memory context that we will do all our work in.  We do this so
 	 * that we can reset the context during error recovery and thereby avoid
 	 * possible memory leaks.  Formerly this code just ran in
@@ -276,6 +288,41 @@ BackgroundWriterMain(void)
 		}
 
 		/*
+		 * Log a new xl_running_xacts every now and then so replication can get
+		 * into a consistent state faster and clean up resources more
+		 * frequently. The costs of this are relatively low, so doing it 4
+		 * times a minute seems fine.
+		 *
+		 * We assume the interval for writing xl_running_xacts is significantly
+		 * bigger than BgWriterDelay, so we don't complicate the overall
+		 * timeout handling but just assume we're going to get called often
+		 * enough even if hibernation mode is active. It's not that important
+		 * that log_snap_interval_ms is met strictly.
+		 *
+		 * We do this logging in the bgwriter as its the only process thats run
+		 * regularly and returns to its mainloop all the
+		 * time. E.g. Checkpointer, when active, is barely every in its
+		 * mainloop.
+		 */
+		if (XLogStandbyInfoActive() && !RecoveryInProgress())
+		{
+			TimestampTz timeout = 0;
+			timeout = TimestampTzPlusMilliseconds(last_logged_snap_ts,
+												  log_snap_interval_ms);
+
+			/*
+			 * only log if enough time has passed and some xlog record has been
+			 * inserted.
+			 */
+			if (GetCurrentTimestamp() >= timeout &&
+				last_logged_snap_recptr != GetXLogInsertRecPtr())
+			{
+				last_logged_snap_recptr = LogStandbySnapshot();
+				last_logged_snap_ts = GetCurrentTimestamp();
+			}
+		}
+
+		/*
 		 * Sleep until we are signaled or BgWriterDelay has elapsed.
 		 *
 		 * Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index a903f12..deb1850 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
 									   ProcSignalReason reason);
 static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
 
 
@@ -846,10 +846,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
  * currently running xids, performed by StandbyReleaseOldLocks().
  * Zero xids should no longer be possible, but we may be replaying WAL
  * from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
  */
-void
+XLogRecPtr
 LogStandbySnapshot(void)
 {
+	XLogRecPtr recptr;
 	RunningTransactions running;
 	xl_standby_lock *locks;
 	int			nlocks;
@@ -875,8 +878,11 @@ LogStandbySnapshot(void)
 	 */
 	running = GetRunningTransactionData();
 	LogCurrentRunningXacts(running);
+
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
+
+	return recptr;
 }
 
 /*
@@ -887,7 +893,7 @@ LogStandbySnapshot(void)
  * is a contiguous chunk of memory and never exists fully until it is
  * assembled in WAL.
  */
-static void
+static XLogRecPtr
 LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 {
 	xl_running_xacts xlrec;
@@ -937,6 +943,16 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->oldestRunningXid,
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);
+
+	/*
+	 * Ensure running xact information is synced to disk not too far in the
+	 * future, logical standby's need this soon after initialization. We don't
+	 * want to stall anything though, so we let the wal writer do it during
+	 * normal operation.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
+	return recptr;
 }
 
 /*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 168c14c..ab84584 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
 extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
 extern void LogAccessExclusiveLockPrepare(void);
 
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
 
 #endif   /* STANDBY_H */
-- 
1.7.12.289.g0ce9864.dirty

0001-Centralize-Assert-macros-into-c.h-so-its-common-betw.patchtext/x-patch; charset=us-asciiDownload

>From 4cec3fe09d714483e0bc05b53fc20501cffe951c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 8 Jan 2013 17:59:10 +0100
Subject: [PATCH 01/19] Centralize Assert* macros into c.h so its common
 between backend/frontend

c.h already had parts of the assert support (StaticAssert*) and its the shared
file between postgres.h and postgres_fe.h. This makes it easier to build
frontend programs which have to do the hack.
---
 src/include/c.h           | 65 +++++++++++++++++++++++++++++++++++++++++++++++
 src/include/postgres.h    | 54 ++-------------------------------------
 src/include/postgres_fe.h | 12 ---------
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 59af5b5..57664e8 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -694,6 +694,71 @@ typedef NameData *Name;
 
 
 /*
+ * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
+ * - plai  9/5/90
+ *
+ * It should _NOT_ be defined in releases or in benchmark copies
+ */
+
+/*
+ * Assert() can be used in both frontend and backend code. In frontend code it
+ * just calls the standard assert, if it's available. If use of assertions is
+ * not configured, it does nothing.
+ */
+#ifndef USE_ASSERT_CHECKING
+
+#define Assert(condition)
+#define AssertMacro(condition)	((void)true)
+#define AssertArg(condition)
+#define AssertState(condition)
+
+#elif defined FRONTEND
+
+#include <assert.h>
+#define Assert(p) assert(p)
+#define AssertMacro(p)	((void) assert(p))
+
+#else /* USE_ASSERT_CHECKING && FRONTEND */
+
+/*
+ * Trap
+ *		Generates an exception if the given condition is true.
+ */
+#define Trap(condition, errorType) \
+	do { \
+		if ((assert_enabled) && (condition)) \
+			ExceptionalCondition(CppAsString(condition), (errorType), \
+								 __FILE__, __LINE__); \
+	} while (0)
+
+/*
+ *	TrapMacro is the same as Trap but it's intended for use in macros:
+ *
+ *		#define foo(x) (AssertMacro(x != 0), bar(x))
+ *
+ *	Isn't CPP fun?
+ */
+#define TrapMacro(condition, errorType) \
+	((bool) ((! assert_enabled) || ! (condition) || \
+			 (ExceptionalCondition(CppAsString(condition), (errorType), \
+								   __FILE__, __LINE__), 0)))
+
+#define Assert(condition) \
+		Trap(!(condition), "FailedAssertion")
+
+#define AssertMacro(condition) \
+		((void) TrapMacro(!(condition), "FailedAssertion"))
+
+#define AssertArg(condition) \
+		Trap(!(condition), "BadArgument")
+
+#define AssertState(condition) \
+		Trap(!(condition), "BadState")
+
+#endif /* USE_ASSERT_CHECKING && !FRONTEND */
+
+
+/*
  * Macros to support compile-time assertion checks.
  *
  * If the "condition" (a compile-time-constant expression) evaluates to false,
diff --git a/src/include/postgres.h b/src/include/postgres.h
index b6e922f..bbe125a 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -25,7 +25,7 @@
  *	  -------	------------------------------------------------
  *		1)		variable-length datatypes (TOAST support)
  *		2)		datum type + support macros
- *		3)		exception handling definitions
+ *		3)		exception handling
  *
  *	 NOTES
  *
@@ -627,62 +627,12 @@ extern Datum Float8GetDatum(float8 X);
 
 
 /* ----------------------------------------------------------------
- *				Section 3:	exception handling definitions
- *							Assert, Trap, etc macros
+ *				Section 3:	exception handling backend support
  * ----------------------------------------------------------------
  */
 
 extern PGDLLIMPORT bool assert_enabled;
 
-/*
- * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
- * - plai  9/5/90
- *
- * It should _NOT_ be defined in releases or in benchmark copies
- */
-
-/*
- * Trap
- *		Generates an exception if the given condition is true.
- */
-#define Trap(condition, errorType) \
-	do { \
-		if ((assert_enabled) && (condition)) \
-			ExceptionalCondition(CppAsString(condition), (errorType), \
-								 __FILE__, __LINE__); \
-	} while (0)
-
-/*
- *	TrapMacro is the same as Trap but it's intended for use in macros:
- *
- *		#define foo(x) (AssertMacro(x != 0), bar(x))
- *
- *	Isn't CPP fun?
- */
-#define TrapMacro(condition, errorType) \
-	((bool) ((! assert_enabled) || ! (condition) || \
-			 (ExceptionalCondition(CppAsString(condition), (errorType), \
-								   __FILE__, __LINE__), 0)))
-
-#ifndef USE_ASSERT_CHECKING
-#define Assert(condition)
-#define AssertMacro(condition)	((void)true)
-#define AssertArg(condition)
-#define AssertState(condition)
-#else
-#define Assert(condition) \
-		Trap(!(condition), "FailedAssertion")
-
-#define AssertMacro(condition) \
-		((void) TrapMacro(!(condition), "FailedAssertion"))
-
-#define AssertArg(condition) \
-		Trap(!(condition), "BadArgument")
-
-#define AssertState(condition) \
-		Trap(!(condition), "BadState")
-#endif   /* USE_ASSERT_CHECKING */
-
 extern void ExceptionalCondition(const char *conditionName,
 					 const char *errorType,
 			 const char *fileName, int lineNumber) __attribute__((noreturn));
diff --git a/src/include/postgres_fe.h b/src/include/postgres_fe.h
index af31227..0f35ecc 100644
--- a/src/include/postgres_fe.h
+++ b/src/include/postgres_fe.h
@@ -24,16 +24,4 @@
 
 #include "c.h"
 
-/*
- * Assert() can be used in both frontend and backend code. In frontend code it
- * just calls the standard assert, if it's available. If use of assertions is
- * not configured, it does nothing.
- */
-#ifdef USE_ASSERT_CHECKING
-#include <assert.h>
-#define Assert(p) assert(p)
-#else
-#define Assert(p)
-#endif
-
 #endif   /* POSTGRES_FE_H */
-- 
1.7.12.289.g0ce9864.dirty

0002-Provide-a-common-malloc-wrappers-and-palloc-et-al.-e.patch.gzapplication/x-patch-gzipDownload

0003-Split-out-xlog-reading-into-its-own-module-called-xl.patch.gzapplication/x-patch-gzipDownload

0004-Add-pg_xlogdump-contrib-module.patchtext/x-patch; charset=us-asciiDownload

>From c414374faf290d6216dc5fb166b800b08b196fd2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 8 Jan 2013 18:27:12 +0100
Subject: [PATCH 04/19] Add pg_xlogdump contrib module

Authors: Andres Freund, Heikki Linnakangas
---
 contrib/Makefile                  |   1 +
 contrib/pg_xlogdump/Makefile      |  37 +++
 contrib/pg_xlogdump/compat.c      |  58 ++++
 contrib/pg_xlogdump/pg_xlogdump.c | 656 ++++++++++++++++++++++++++++++++++++++
 contrib/pg_xlogdump/tables.c      |  78 +++++
 doc/src/sgml/ref/allfiles.sgml    |   1 +
 doc/src/sgml/ref/pg_xlogdump.sgml |  76 +++++
 doc/src/sgml/reference.sgml       |   1 +
 src/backend/access/transam/rmgr.c |   1 +
 src/backend/catalog/catalog.c     |   2 +
 src/tools/msvc/Mkvcbuild.pm       |  16 +-
 11 files changed, 926 insertions(+), 1 deletion(-)
 create mode 100644 contrib/pg_xlogdump/Makefile
 create mode 100644 contrib/pg_xlogdump/compat.c
 create mode 100644 contrib/pg_xlogdump/pg_xlogdump.c
 create mode 100644 contrib/pg_xlogdump/tables.c
 create mode 100644 doc/src/sgml/ref/pg_xlogdump.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index fcd7c1e..5d290b8 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -39,6 +39,7 @@ SUBDIRS = \
 		pg_trgm		\
 		pg_upgrade	\
 		pg_upgrade_support \
+		pg_xlogdump	\
 		pgbench		\
 		pgcrypto	\
 		pgrowlocks	\
diff --git a/contrib/pg_xlogdump/Makefile b/contrib/pg_xlogdump/Makefile
new file mode 100644
index 0000000..1adef35
--- /dev/null
+++ b/contrib/pg_xlogdump/Makefile
@@ -0,0 +1,37 @@
+# contrib/pg_xlogdump/Makefile
+
+PGFILEDESC = "pg_xlogdump"
+PGAPPICON=win32
+
+PROGRAM = pg_xlogdump
+OBJS =  pg_xlogdump.o compat.o tables.o xlogreader.o $(RMGRDESCOBJS) \
+	$(WIN32RES)
+
+# XXX: Perhaps this should be done by a wildcard rule so that you don't need
+# to remember to add new rmgrdesc files to this list.
+RMGRDESCSOURCES = clogdesc.c dbasedesc.c gindesc.c gistdesc.c hashdesc.c \
+	heapdesc.c mxactdesc.c nbtdesc.c relmapdesc.c seqdesc.c smgrdesc.c \
+	spgdesc.c standbydesc.c tblspcdesc.c xactdesc.c xlogdesc.c
+
+RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
+
+EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_xlogdump
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+
+xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
+	rm -f $@ && $(LN_S) $< .
+
+$(RMGRDESCSOURCES): % : $(top_srcdir)/src/backend/access/rmgrdesc/%
+	rm -f $@ && $(LN_S) $< .
diff --git a/contrib/pg_xlogdump/compat.c b/contrib/pg_xlogdump/compat.c
new file mode 100644
index 0000000..80a83f6
--- /dev/null
+++ b/contrib/pg_xlogdump/compat.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * compat.c
+ *		Reimplementations of various backend functions.
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_xlogdump/compat.c
+ *
+ * This file contains client-side implementations for various backend
+ * functions that the rm_desc functions in *desc.c files rely on.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* ugly hack, same as in e.g pg_controldata */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include "catalog/catalog.h"
+#include "datatype/timestamp.h"
+#include "lib/stringinfo.h"
+#include "storage/relfilenode.h"
+#include "utils/timestamp.h"
+#include "utils/datetime.h"
+
+const char *
+timestamptz_to_str(TimestampTz dt)
+{
+	return "unimplemented-timestamp";
+}
+
+char *
+relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
+{
+	return pstrdup("unimplemented-relpathbackend");
+}
+
+/*
+ * Provide a hacked up compat layer for StringInfos so xlog desc functions can
+ * be linked/called.
+ */
+void
+appendStringInfo(StringInfo str, const char *fmt, ...)
+{
+	va_list		args;
+
+	va_start(args, fmt);
+	vprintf(fmt, args);
+	va_end(args);
+}
+
+void
+appendStringInfoString(StringInfo str, const char *string)
+{
+	appendStringInfo(str, "%s", string);
+}
diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
new file mode 100644
index 0000000..e68058f
--- /dev/null
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -0,0 +1,656 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_xlogdump.c - decode and display WAL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  contrib/pg_xlogdump/pg_xlogdump.c
+ *-------------------------------------------------------------------------
+ */
+
+/* ugly hack, same as in e.g pg_controldata */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <dirent.h>
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "access/rmgr.h"
+#include "access/transam.h"
+
+#include "catalog/catalog.h"
+
+#include "getopt_long.h"
+
+static const char *progname;
+
+typedef struct XLogDumpPrivateData
+{
+	TimeLineID	timeline;
+	char	   *inpath;
+	XLogRecPtr	startptr;
+	XLogRecPtr	endptr;
+
+	/* display options */
+	bool		bkp_details;
+	int			stop_after_records;
+	int			already_displayed_records;
+
+	/* filter options */
+	int         filter_by_rmgr;
+	TransactionId filter_by_xid;
+} XLogDumpPrivateData;
+
+static void fatal_error(const char *fmt, ...)
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
+
+static void fatal_error(const char *fmt, ...)
+{
+	va_list		args;
+	fflush(stdout);
+
+	fprintf(stderr, "%s: fatal_error: ", progname);
+	va_start(args, fmt);
+	vfprintf(stderr, fmt, args);
+	va_end(args);
+	fputc('\n', stderr);
+	exit(EXIT_FAILURE);
+}
+
+/*
+ * Check whether directory exists and whether we can open it. Keep errno set
+ * error reporting by the caller.
+ */
+static bool
+verify_directory(const char *directory)
+{
+	DIR *dir = opendir(directory);
+	if (dir == NULL)
+		return false;
+	closedir(dir);
+	return true;
+}
+
+static void
+split_path(const char *path, char **dir, char **fname)
+{
+	char *sep;
+
+	/* split filepath into directory & filename */
+	sep = strrchr(path, '/');
+
+	/* directory path */
+	if (sep != NULL)
+	{
+		/* windows doesn't have strndup */
+		*dir = strdup(path);
+		(*dir)[(sep - path) + 1] = '\0';
+		*fname = strdup(sep + 1);
+		}
+	/* local directory */
+	else
+	{
+		*dir = NULL;
+		*fname = strdup(path);
+	}
+}
+
+/*
+ * Try to find the file in several places:
+ * if directory == NULL:
+ *   fname
+ *   XLOGDIR / fname
+ *   $PGDATA / XLOGDIR / fname
+ * else
+ *   directory / fname
+ *   directory / XLOGDIR / fname
+ *
+ * return a read only fd
+ */
+static int
+fuzzy_open_file(const char *directory, const char *fname)
+{
+	int fd = -1;
+	char fpath[MAXPGPATH];
+
+	if (directory == NULL)
+	{
+		const char* datadir;
+
+		/* fname */
+		fd = open(fname, O_RDONLY | PG_BINARY, 0);
+		if (fd < 0 && errno != ENOENT)
+			return -1;
+		else if (fd > 0)
+			return fd;
+
+		/* XLOGDIR / fname */
+		snprintf(fpath, MAXPGPATH, "%s/%s",
+				 XLOGDIR, fname);
+		fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+		if (fd < 0 && errno != ENOENT)
+			return -1;
+		else if (fd > 0)
+			return fd;
+
+		datadir = getenv("PGDATA");
+		/* $PGDATA / XLOGDIR / fname */
+		if (datadir != NULL)
+		{
+			snprintf(fpath, MAXPGPATH, "%s/%s/%s",
+					 datadir, XLOGDIR, fname);
+			fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+			if (fd < 0 && errno != ENOENT)
+				return -1;
+			else if (fd > 0)
+				return fd;
+		}
+	}
+	else
+	{
+		/* directory / fname */
+		snprintf(fpath, MAXPGPATH, "%s/%s",
+				 directory, fname);
+		fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+		if (fd < 0 && errno != ENOENT)
+			return -1;
+		else if (fd > 0)
+			return fd;
+
+		/* directory / XLOGDIR / fname */
+		snprintf(fpath, MAXPGPATH, "%s/%s/%s",
+				 directory, XLOGDIR, fname);
+		fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+		if (fd < 0 && errno != ENOENT)
+			return -1;
+		else if (fd > 0)
+			return fd;
+	}
+	return -1;
+}
+
+/* this should probably be put in a general implementation */
+static void
+XLogDumpXLogRead(const char *directory, TimeLineID timeline_id,
+				 XLogRecPtr startptr, char *buf, Size count)
+{
+	char	   *p;
+	XLogRecPtr	recptr;
+	Size		nbytes;
+
+	static int	sendFile = -1;
+	static XLogSegNo sendSegNo = 0;
+	static uint32 sendOff = 0;
+
+	p = buf;
+	recptr = startptr;
+	nbytes = count;
+
+	while (nbytes > 0)
+	{
+		uint32		startoff;
+		int			segbytes;
+		int			readbytes;
+
+		startoff = recptr % XLogSegSize;
+
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		{
+			char		fname[MAXFNAMELEN];
+
+			/* Switch to another logfile segment */
+			if (sendFile >= 0)
+				close(sendFile);
+
+			XLByteToSeg(recptr, sendSegNo);
+
+			XLogFileName(fname, timeline_id, sendSegNo);
+
+			sendFile = fuzzy_open_file(directory, fname);
+
+			if (sendFile < 0)
+				fatal_error("could not find file \"%s\": %s",
+							fname, strerror(errno));
+			sendOff = 0;
+		}
+
+		/* Need to seek in the file? */
+		if (sendOff != startoff)
+		{
+			if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+			{
+				int		err = errno;
+				char	fname[MAXPGPATH];
+				XLogFileName(fname, timeline_id, sendSegNo);
+
+				fatal_error("could not seek in log segment %s to offset %u: %s",
+							fname, startoff, strerror(err));
+			}
+			sendOff = startoff;
+		}
+
+		/* How many bytes are within this segment? */
+		if (nbytes > (XLogSegSize - startoff))
+			segbytes = XLogSegSize - startoff;
+		else
+			segbytes = nbytes;
+
+		readbytes = read(sendFile, p, segbytes);
+		if (readbytes <= 0)
+		{
+			int		err = errno;
+			char	fname[MAXPGPATH];
+			XLogFileName(fname, timeline_id, sendSegNo);
+
+			fatal_error("could not read from log segment %s, offset %d, length %d: %s",
+						fname, sendOff, segbytes, strerror(err));
+		}
+
+		/* Update state for read */
+		recptr += readbytes;
+
+		sendOff += readbytes;
+		nbytes -= readbytes;
+		p += readbytes;
+	}
+}
+
+static int
+XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+				 char *readBuff, TimeLineID *curFileTLI)
+{
+	XLogDumpPrivateData *private = state->private_data;
+	int			count = XLOG_BLCKSZ;
+
+	if (private->endptr != InvalidXLogRecPtr)
+	{
+		if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+			count = XLOG_BLCKSZ;
+		else if (targetPagePtr + reqLen <= private->endptr)
+			count = private->endptr - targetPagePtr;
+		else
+			return -1;
+	}
+
+	XLogDumpXLogRead(private->inpath, private->timeline, targetPagePtr,
+					 readBuff, count);
+
+	return count;
+}
+
+static void
+XLogDumpDisplayRecord(XLogReaderState *state, XLogRecord *record)
+{
+	XLogDumpPrivateData *config = (XLogDumpPrivateData *) state->private_data;
+	const RmgrData *rmgr = &RmgrTable[record->xl_rmid];
+
+	if (config->filter_by_rmgr != -1 &&
+	    config->filter_by_rmgr != record->xl_rmid)
+		return;
+
+	if (TransactionIdIsValid(config->filter_by_xid) &&
+	    config->filter_by_xid != record->xl_xid)
+		return;
+
+	config->already_displayed_records++;
+
+	printf("xlog record: rmgr: %-11s, record_len: %6u, tot_len: %6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, bkp: %u%u%u%u, desc:",
+			rmgr->rm_name,
+			record->xl_len, record->xl_tot_len,
+			record->xl_xid,
+			(uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr,
+			(uint32) (record->xl_prev >> 32), (uint32) record->xl_prev,
+			!!(XLR_BKP_BLOCK(0) & record->xl_info),
+			!!(XLR_BKP_BLOCK(1) & record->xl_info),
+			!!(XLR_BKP_BLOCK(2) & record->xl_info),
+			!!(XLR_BKP_BLOCK(3) & record->xl_info));
+
+	/* the desc routine will printf the description directly to stdout */
+	rmgr->rm_desc(NULL, record->xl_info, XLogRecGetData(record));
+
+	putchar('\n');
+
+	if (config->bkp_details)
+	{
+		int		off;
+		char   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+		for (off = 0; off < XLR_MAX_BKP_BLOCKS; off++)
+		{
+			BkpBlock	bkpb;
+
+			if (!(XLR_BKP_BLOCK(off) & record->xl_info))
+				continue;
+
+			memcpy(&bkpb, blk, sizeof(BkpBlock));
+			blk += sizeof(BkpBlock);
+			blk += BLCKSZ - bkpb.hole_length;
+
+			printf("\tbackup bkp #%u; rel %u/%u/%u; fork: %s; block: %u; hole: offset: %u, length: %u\n",
+				   off, bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+				   forkNames[bkpb.fork], bkpb.block, bkpb.hole_offset, bkpb.hole_length);
+		}
+	}
+}
+
+static void
+usage(void)
+{
+	printf("%s: reads/writes postgres transaction logs for debugging.\n\n",
+		   progname);
+	printf("Usage:\n");
+	printf("  %s [OPTION] [STARTSEG [ENDSEG]] \n", progname);
+	printf("\nOptions:\n");
+	printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+	printf("  -e, --end RECPTR       read wal up to RECPTR\n");
+	printf("  -h, --help             show this help, then exit\n");
+	printf("  -n, --limit RECORDS    only display n records, abort afterwards\n");
+	printf("  -p, --path PATH        from where do we want to read? cwd/pg_xlog is the default\n");
+	printf("  -r, --rmgr RMGR        only show records generated by the rmgr RMGR\n");
+	printf("  -s, --start RECPTR     read wal in directory indicated by -p starting at RECPTR\n");
+	printf("  -t, --timeline TLI     which timeline do we want to read, defaults to 1\n");
+	printf("  -V, --version          output version information, then exit\n");
+	printf("  -x, --xid XID          only show records with transactionid XID\n");
+}
+
+int
+main(int argc, char **argv)
+{
+	uint32		xlogid;
+	uint32		xrecoff;
+	XLogReaderState *xlogreader_state;
+	XLogDumpPrivateData private;
+	XLogRecord *record;
+	XLogRecPtr	first_record;
+	char	   *errormsg;
+
+	static struct option long_options[] = {
+		{"bkp-details", no_argument, NULL, 'b'},
+		{"end", required_argument, NULL, 'e'},
+		{"help", no_argument, NULL, '?'},
+		{"limit", required_argument, NULL, 'n'},
+		{"path", required_argument, NULL, 'p'},
+		{"rmgr", required_argument, NULL, 'r'},
+		{"start", required_argument, NULL, 's'},
+		{"timeline", required_argument, NULL, 't'},
+		{"xid", required_argument, NULL, 'x'},
+		{"version", no_argument, NULL, 'V'},
+		{NULL, 0, NULL, 0}
+	};
+
+	int			option;
+	int			optindex = 0;
+
+	progname = get_progname(argv[0]);
+
+	memset(&private, 0, sizeof(XLogDumpPrivateData));
+
+	private.timeline = 1;
+	private.bkp_details = false;
+	private.startptr = InvalidXLogRecPtr;
+	private.endptr = InvalidXLogRecPtr;
+	private.stop_after_records = -1;
+	private.already_displayed_records = 0;
+	private.filter_by_rmgr = -1;
+	private.filter_by_xid = InvalidTransactionId;
+
+	if (argc <= 1)
+	{
+		fprintf(stderr, "%s: no arguments specified\n", progname);
+		goto bad_argument;
+	}
+
+	while ((option = getopt_long(argc, argv, "be:?n:p:r:s:t:Vx:",
+								 long_options, &optindex)) != -1)
+	{
+		switch (option)
+		{
+			case 'b':
+				private.bkp_details = true;
+				break;
+			case 'e':
+				if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2)
+				{
+					fprintf(stderr, "%s: could not parse parse --end %s\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+				private.endptr = (uint64)xlogid << 32 | xrecoff;
+				break;
+			case '?':
+				usage();
+				exit(EXIT_SUCCESS);
+				break;
+			case 'n':
+				if (sscanf(optarg, "%d", &private.stop_after_records) != 1)
+				{
+					fprintf(stderr, "%s: could not parse parse --limit %s\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+				break;
+			case 'p':
+				private.inpath = strdup(optarg);
+				break;
+			case 'r':
+			{
+				int i;
+				for (i = 0; i < RM_MAX_ID; i++)
+				{
+					if (strcmp(optarg, RmgrTable[i].rm_name) == 0)
+					{
+						private.filter_by_rmgr = i;
+						break;
+					}
+				}
+
+				if (private.filter_by_rmgr == -1)
+				{
+					fprintf(stderr, "%s: --rmgr %s does not exist\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+			}
+			break;
+			case 's':
+				if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2)
+				{
+					fprintf(stderr, "%s: could not parse parse --end %s\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+				else
+					private.startptr = (uint64)xlogid << 32 | xrecoff;
+				break;
+			case 't':
+				if (sscanf(optarg, "%d", &private.timeline) != 1)
+				{
+					fprintf(stderr, "%s: could not parse timeline --timeline %s\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+				break;
+			case 'V':
+				puts("pg_xlogdump (PostgreSQL) " PG_VERSION);
+				exit(EXIT_SUCCESS);
+				break;
+			case 'x':
+				if (sscanf(optarg, "%u", &private.filter_by_xid) != 1)
+				{
+					fprintf(stderr, "%s: could not parse --xid %s as a valid xid\n",
+							progname, optarg);
+					goto bad_argument;
+				}
+				break;
+			default:
+				goto bad_argument;
+		}
+	}
+
+	if ((optind + 2) < argc)
+	{
+		fprintf(stderr,
+				"%s: too many command-line arguments (first is \"%s\")\n",
+				progname, argv[optind + 2]);
+		goto bad_argument;
+	}
+
+	if (private.inpath != NULL)
+	{
+		/* validdate path points to directory */
+		if (!verify_directory(private.inpath))
+		{
+			fprintf(stderr,
+					"%s: --path %s is cannot be opened: %s\n",
+					progname, private.inpath, strerror(errno));
+			goto bad_argument;
+		}
+	}
+
+	/* parse files as start/end boundaries, extract path if not specified */
+	if (optind < argc)
+	{
+		char *directory = NULL;
+		char *fname = NULL;
+		int fd;
+		XLogSegNo segno;
+
+		split_path(argv[optind], &directory, &fname);
+
+		if (private.inpath == NULL && directory != NULL)
+		{
+			private.inpath = directory;
+
+			if (!verify_directory(private.inpath))
+				fatal_error("cannot open directory %s: %s",
+							private.inpath, strerror(errno));
+		}
+
+		fd = fuzzy_open_file(private.inpath, fname);
+		if (fd < 0)
+			fatal_error("could not open file %s", fname);
+		close(fd);
+
+		/* parse position from file */
+		XLogFromFileName(fname, &private.timeline, &segno);
+
+		if (XLogRecPtrIsInvalid(private.startptr))
+			XLogSegNoOffsetToRecPtr(segno, 0, private.startptr);
+		else if (!XLByteInSeg(private.startptr, segno))
+		{
+			fprintf(stderr,
+					"%s: --end %X/%X is not inside file \"%s\"\n",
+					progname,
+					(uint32)(private.startptr >> 32),
+					(uint32)private.startptr,
+					fname);
+			goto bad_argument;
+		}
+
+		/* no second file specified, set end position */
+		if (!(optind + 1 < argc) && XLogRecPtrIsInvalid(private.endptr))
+			XLogSegNoOffsetToRecPtr(segno + 1, 0, private.endptr);
+
+		/* parse ENDSEG if passed */
+		if (optind + 1 < argc)
+		{
+			XLogSegNo endsegno;
+
+			/* ignore directory, already have that */
+			split_path(argv[optind + 1], &directory, &fname);
+
+			fd = fuzzy_open_file(private.inpath, fname);
+			if (fd < 0)
+				fatal_error("could not open file %s", fname);
+			close(fd);
+
+			/* parse position from file */
+			XLogFromFileName(fname, &private.timeline, &endsegno);
+
+			if (endsegno < segno)
+				fatal_error("ENDSEG %s is before STARSEG %s",
+							argv[optind + 1], argv[optind]);
+
+			if (XLogRecPtrIsInvalid(private.endptr))
+				XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.endptr);
+
+			/* set segno to endsegno for check of --end */
+			segno = endsegno;
+		}
+
+
+		if (!XLByteInSeg(private.endptr, segno) &&
+			private.endptr != (segno + 1) * XLogSegSize)
+		{
+			fprintf(stderr,
+					"%s: --end %X/%X is not inside file \"%s\"\n",
+					progname,
+					(uint32)(private.endptr >> 32),
+					(uint32)private.endptr,
+					argv[argc -1]);
+			goto bad_argument;
+		}
+	}
+
+	/* we don't know what to print */
+	if (XLogRecPtrIsInvalid(private.startptr))
+	{
+		fprintf(stderr, "%s: no --start given in range mode.\n", progname);
+		goto bad_argument;
+	}
+
+	/* done with argument parsing, do the actual work */
+
+	/* we have everything we need, start reading */
+	xlogreader_state = XLogReaderAllocate(private.startptr,
+										  XLogDumpReadPage,
+										  &private);
+
+	/* first find a valid recptr to start from */
+	first_record = XLogFindNextRecord(xlogreader_state, private.startptr);
+
+	if (first_record == InvalidXLogRecPtr)
+		fatal_error("could not find a valid record after %X/%X",
+					(uint32) (private.startptr >> 32),
+					(uint32) private.startptr);
+
+	/*
+	 * Display a message that were skipping data if `from` wasn't a pointer
+	 * to the start of a record and also wasn't a pointer to the beginning
+	 * of a segment (e.g. we were used in file mode).
+	 */
+	if (first_record != private.startptr && (private.startptr % XLogSegSize) != 0)
+		printf("first record is after %X/%X, at %X/%X, skipping over %u bytes\n",
+			   (uint32) (private.startptr >> 32), (uint32) private.startptr,
+			   (uint32) (first_record >> 32), (uint32) first_record,
+			   (uint32) (first_record - private.startptr));
+
+	while ((record = XLogReadRecord(xlogreader_state, first_record, &errormsg)))
+	{
+		/* continue after the last record */
+		first_record = InvalidXLogRecPtr;
+		XLogDumpDisplayRecord(xlogreader_state, record);
+
+		/* check whether we printed enough */
+		if (private.stop_after_records > 0 &&
+			private.already_displayed_records >= private.stop_after_records)
+			break;
+	}
+
+	if (errormsg)
+		fatal_error("error in WAL record at %X/%X: %s\n",
+					(uint32)(xlogreader_state->ReadRecPtr >> 32),
+					(uint32)xlogreader_state->ReadRecPtr,
+					errormsg);
+
+	XLogReaderFree(xlogreader_state);
+
+	return EXIT_SUCCESS;
+bad_argument:
+	fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
+	return EXIT_FAILURE;
+}
diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
new file mode 100644
index 0000000..e947e0d
--- /dev/null
+++ b/contrib/pg_xlogdump/tables.c
@@ -0,0 +1,78 @@
+/*-------------------------------------------------------------------------
+ *
+ * tables.c
+ *		Support data for xlogdump.c
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_xlogdump/tables.c
+ *
+ * NOTES
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * rmgr.c
+ *
+ * Resource managers definition
+ *
+ * src/backend/access/transam/rmgr.c
+ */
+#include "postgres.h"
+
+#include "access/clog.h"
+#include "access/gin.h"
+#include "access/gist_private.h"
+#include "access/hash.h"
+#include "access/heapam_xlog.h"
+#include "access/multixact.h"
+#include "access/nbtree.h"
+#include "access/spgist.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+#include "commands/sequence.h"
+#include "commands/tablespace.h"
+#include "storage/standby.h"
+#include "utils/relmapper.h"
+#include "catalog/catalog.h"
+
+/*
+ * Table of fork names.
+ *
+ * needs to be synced with src/backend/catalog/catalog.c
+ */
+const char *forkNames[] = {
+	"main",						/* MAIN_FORKNUM */
+	"fsm",						/* FSM_FORKNUM */
+	"vm",						/* VISIBILITYMAP_FORKNUM */
+	"init"						/* INIT_FORKNUM */
+};
+
+/*
+ * RmgrTable linked only to functions available outside of the backend.
+ *
+ * needs to be synced with src/backend/access/transam/rmgr.c
+ */
+const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+	{"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
+	{"Transaction", NULL, xact_desc, NULL, NULL, NULL},
+	{"Storage", NULL, smgr_desc, NULL, NULL, NULL},
+	{"CLOG", NULL, clog_desc, NULL, NULL, NULL},
+	{"Database", NULL, dbase_desc, NULL, NULL, NULL},
+	{"Tablespace", NULL, tblspc_desc, NULL, NULL, NULL},
+	{"MultiXact", NULL, multixact_desc, NULL, NULL, NULL},
+	{"RelMap", NULL, relmap_desc, NULL, NULL, NULL},
+	{"Standby", NULL, standby_desc, NULL, NULL, NULL},
+	{"Heap2", NULL, heap2_desc, NULL, NULL, NULL},
+	{"Heap", NULL, heap_desc, NULL, NULL, NULL},
+	{"Btree", NULL, btree_desc, NULL, NULL, NULL},
+	{"Hash", NULL, hash_desc, NULL, NULL, NULL},
+	{"Gin", NULL, gin_desc, NULL, NULL, NULL},
+	{"Gist", NULL, gist_desc, NULL, NULL, NULL},
+	{"Sequence", NULL, seq_desc, NULL, NULL, NULL},
+	{"SPGist", NULL, spg_desc, NULL, NULL, NULL}
+};
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index df84054..49cb7ac 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgReceivexlog      SYSTEM "pg_receivexlog.sgml">
 <!ENTITY pgResetxlog        SYSTEM "pg_resetxlog.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
+<!ENTITY pgXlogdump         SYSTEM "pg_xlogdump.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">
 <!ENTITY postmaster         SYSTEM "postmaster.sgml">
 <!ENTITY psqlRef            SYSTEM "psql-ref.sgml">
diff --git a/doc/src/sgml/ref/pg_xlogdump.sgml b/doc/src/sgml/ref/pg_xlogdump.sgml
new file mode 100644
index 0000000..7a27c7b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_xlogdump.sgml
@@ -0,0 +1,76 @@
+<!--
+doc/src/sgml/ref/pg_xlogdump.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="APP-PGXLOGDUMP">
+ <refmeta>
+  <refentrytitle><application>pg_xlogdump</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_xlogdump</refname>
+  <refpurpose>Display the write-ahead log of a <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <indexterm zone="app-pgxlogdump">
+  <primary>pg_xlogdump</primary>
+ </indexterm>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_xlogdump</command>
+   <arg choice="opt"><option>-b</option></arg>
+   <arg choice="opt"><option>-e</option> <replaceable class="parameter">xlogrecptr</replaceable></arg>
+   <arg choice="opt"><option>-f</option> <replaceable class="parameter">filename</replaceable></arg>
+   <arg choice="opt"><option>-h</option></arg>
+   <arg choice="opt"><option>-p</option> <replaceable class="parameter">directory</replaceable></arg>
+   <arg choice="opt"><option>-s</option> <replaceable class="parameter">xlogrecptr</replaceable></arg>
+   <arg choice="opt"><option>-t</option> <replaceable class="parameter">timelineid</replaceable></arg>
+   <arg choice="opt"><option>-v</option></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="R1-APP-PGXLOGDUMP-1">
+  <title>Description</title>
+  <para>
+   <command>pg_xlogdump</command> display the write-ahead log (WAL) and is only
+   useful for debugging or educational purposes.
+  </para>
+
+  <para>
+   This utility can only be run by the user who installed the server, because
+   it requires read access to the data directory. It does not perform any
+   modifications.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options control the location and format of the
+    output.
+
+    <variablelist>
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">directory</replaceable></option></term>
+      <listitem>
+       <para>
+        Directory to find xlog files in.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can give wrong results when the server is running.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 0872168..fed1fdd 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -225,6 +225,7 @@
    &pgDumpall;
    &pgReceivexlog;
    &pgRestore;
+   &pgXlogdump;
    &psqlRef;
    &reindexdb;
    &vacuumdb;
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index cc210a7..4e94af1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "storage/standby.h"
 #include "utils/relmapper.h"
 
+/* Also update contrib/pg_xlogdump/tables.c if you add something here. */
 
 const RmgrData RmgrTable[RM_MAX_ID + 1] = {
 	{"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 9686486..04e0139 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -52,6 +52,8 @@
  * If you add a new entry, remember to update the errhint below, and the
  * documentation for pg_relation_size(). Also keep FORKNAMECHARS above
  * up-to-date.
+ *
+ * Also update contrib/pg_xlogdump/tables.c if you add something here.
  */
 const char *forkNames[] = {
 	"main",						/* MAIN_FORKNUM */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 47af367..7307af5 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -41,7 +41,7 @@ my $contrib_extraincludes =
 my $contrib_extrasource = {
 	'cube' => [ 'cubescan.l', 'cubeparse.y' ],
 	'seg'  => [ 'segscan.l',  'segparse.y' ] };
-my @contrib_excludes = ('pgcrypto', 'intagg', 'sepgsql');
+my @contrib_excludes = ('intagg', 'pgcrypto', 'pg_xlogdump', 'sepgsql');
 
 sub mkvcbuild
 {
@@ -411,6 +411,20 @@ sub mkvcbuild
 		'localtime.c');
 	$zic->AddReference($libpgport);
 
+	my $pgxlogdump = $solution->AddProject('pg_xlogdump', 'exe', 'contrib');
+	$pgxlogdump->{name} = 'pg_xlogdump';
+	$pgxlogdump->AddIncludeDir('src\backend');
+	$pgxlogdump->AddFiles('contrib\pg_xlogdump',
+		'compat.c', 'pg_xlogdump.c', 'tables.c');
+	$pgxlogdump->AddFile('src\backend\access\transam\xlogreader.c');
+	$pgxlogdump->AddFiles('src\backend\access\rmgrdesc',
+		'clogdesc.c', 'dbasedesc.c', 'gindesc.c', 'gistdesc.c', 'hashdesc.c',
+		'heapdesc.c', 'mxactdesc.c', 'nbtdesc.c', 'relmapdesc.c', 'seqdesc.c',
+		'smgrdesc.c', 'spgdesc.c', 'standbydesc.c', 'tblspcdesc.c',
+		'xactdesc.c', 'xlogdesc.c');
+	$pgxlogdump->AddReference($libpgport);
+	$pgxlogdump->AddDefine('FRONTEND');
+
 	if ($solution->{options}->{xml})
 	{
 		$contrib_extraincludes->{'pgxml'} = [
-- 
1.7.12.289.g0ce9864.dirty

0005-wal_decoding-Add-a-new-RELFILENODE-syscache-to-fetch.patchtext/x-patch; charset=us-asciiDownload

>From d411e69a0c9c05b7ffadf2d9fe6afa1e025377d5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 5 Apr 2012 11:09:59 +0200
Subject: [PATCH 05/19] wal_decoding: Add a new RELFILENODE syscache to fetch
 a pg_class entry via (reltablespace, relfilenode)

This cache is theoretically problematic because formally indexes used by
syscaches needs to be unique, this one is not. This is "just" because of
0/InvalidOid are stored in pg_class.relfilenode for nailed/shared catalog
relations. This syscache will never be queried for InvalidOid relfilenodes
however so it seems to be safe even if it bends the rules somewhat.

It might be nicer to add infrastructure to do this properly, like using a
partial index, its not clear what the best way to do this is though and the
benefit very well might not be worth the overhead.

Needs a CATVERSION bump.
---
 src/backend/utils/cache/syscache.c | 11 +++++++++++
 src/include/catalog/indexing.h     |  2 ++
 src/include/utils/syscache.h       |  1 +
 3 files changed, 14 insertions(+)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bfc3c86..b5fe64f 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -591,6 +591,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		64
 	},
+	{RelationRelationId,		/* RELFILENODE */
+		ClassTblspcRelfilenodeIndexId,
+		2,
+		{
+			Anum_pg_class_reltablespace,
+			Anum_pg_class_relfilenode,
+			0,
+			0
+		},
+		1024
+	},
 	{RelationRelationId,		/* RELNAMENSP */
 		ClassNameNspIndexId,
 		2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..2a3cd82 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_o
 #define ClassOidIndexId  2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId  2663
+DECLARE_INDEX(pg_class_tblspc_relfilenode_index, 3455, on pg_class using btree(reltablespace oid_ops, relfilenode oid_ops));
+#define ClassTblspcRelfilenodeIndexId  3455
 
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId 3164
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d1d8abe..2a14905 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -75,6 +75,7 @@ enum SysCacheIdentifier
 	PROCNAMEARGSNSP,
 	PROCOID,
 	RANGETYPE,
+	RELFILENODE,
 	RELNAMENSP,
 	RELOID,
 	RULERELNAME,
-- 
1.7.12.289.g0ce9864.dirty

0006-wal_decoding-Add-RelationMapFilenodeToOid-function-t.patchtext/x-patch; charset=us-asciiDownload

>From fea6b2e45cb2caf8d7a4c19f6031bc24b6e47d3b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 16 Sep 2012 23:51:08 +0200
Subject: [PATCH 06/19] wal_decoding: Add RelationMapFilenodeToOid function to
 relmapper.c

This function maps (reltablespace, relfilenode) to the table oid and thus acts
as a reverse of RelationMapOidToFilenode.
---
 src/backend/utils/cache/relmapper.c | 53 +++++++++++++++++++++++++++++++++++++
 src/include/utils/relmapper.h       |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 2c7d9f3..039aa29 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -180,6 +180,59 @@ RelationMapOidToFilenode(Oid relationId, bool shared)
 	return InvalidOid;
 }
 
+/* RelationMapFilenodeToOid
+ *
+ * Do the reverse of the normal direction of mapping done in
+ * RelationMapOidToFilenode.
+ *
+ * This is not supposed to be used during normal running but rather for
+ * information purposes when looking at the filesystem or the xlog.
+ *
+ * Returns InvalidOid if the OID is not know which can easily happen if the
+ * filenode is not of a relation that is nailed or shared or if it simply
+ * doesn't exists anywhere.
+ */
+Oid
+RelationMapFilenodeToOid(Oid filenode, bool shared)
+{
+	const RelMapFile *map;
+	int32		i;
+
+	/* If there are active updates, believe those over the main maps */
+	if (shared)
+	{
+		map = &active_shared_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &shared_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+	else
+	{
+		map = &active_local_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+		map = &local_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+				return map->mappings[i].mapoid;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * RelationMapUpdateMap
  *
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 8f0b438..071bc98 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -36,6 +36,8 @@ typedef struct xl_relmap_update
 
 extern Oid	RelationMapOidToFilenode(Oid relationId, bool shared);
 
+extern Oid	RelationMapFilenodeToOid(Oid relationId, bool shared);
+
 extern void RelationMapUpdateMap(Oid relationId, Oid fileNode, bool shared,
 					 bool immediate);
 
-- 
1.7.12.289.g0ce9864.dirty

0007-wal-decoding-Add-pg_relation_by_filenode-to-lookup-u.patchtext/x-patch; charset=us-asciiDownload

>From 29c11973ac071493bf0aa8bbaa41e0ac7c8b5ea2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 16 Sep 2012 23:53:23 +0200
Subject: [PATCH 07/19] wal decoding: Add pg_relation_by_filenode to lookup up
 a relation by (tablespace, filenode)

This requires the previously added RELFILENODE syscache and the added
RelationMapFilenodeToOid function added in previous two commits.
---
 doc/src/sgml/func.sgml         | 23 +++++++++++-
 src/backend/utils/adt/dbsize.c | 79 ++++++++++++++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.h  |  2 ++
 src/include/utils/builtins.h   |  1 +
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 35c7f75..091372d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15176,7 +15176,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 
    <para>
     The functions shown in <xref linkend="functions-admin-dblocation"> assist
-    in identifying the specific disk files associated with database objects.
+    in identifying the specific disk files associated with database objects or doing the reverse.
    </para>
 
    <indexterm>
@@ -15185,6 +15185,9 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
    <indexterm>
     <primary>pg_relation_filepath</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_relation_by_filenode</primary>
+   </indexterm>
 
    <table id="functions-admin-dblocation">
     <title>Database Object Location Functions</title>
@@ -15213,6 +15216,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
         File path name of the specified relation
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_relation_by_filenode(<parameter>tablespace</parameter> <type>oid</type>, <parameter>filenode</parameter> <type>oid</type>)</function></literal>
+        </entry>
+       <entry><type>regclass</type></entry>
+       <entry>
+        Find the associated relation of a filenode
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -15236,6 +15248,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     the relation.
    </para>
 
+   <para>
+    <function>pg_relation_by_filenode</> is the reverse of
+    <function>pg_relation_filenode</>. Given a <quote>tablespace</> OID and
+    a <quote>filenode</> it returns the associated relation. The default
+    tablespace for user tables can be replaced with 0. Check the
+    documentation of <function>pg_relation_filenode</> for an explanation why
+    this cannot always easily answered by querying <structname>pg_class</>.
+   </para>
+
   </sect2>
 
   <sect2 id="functions-admin-genfile">
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 89ad386..73c886a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -744,6 +744,85 @@ pg_relation_filenode(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Get the relation via (reltablespace, relfilenode)
+ *
+ * This is expected to be used when somebody wants to match an individual file
+ * on the filesystem back to its table. Thats not trivially possible via
+ * pg_class because that doesn't contain the relfilenodes of shared and nailed
+ * tables.
+ *
+ * We don't fail but return NULL if we cannot find a mapping.
+ *
+ * Instead of knowing DEFAULTTABLESPACE_OID you can pass 0.
+ */
+Datum
+pg_relation_by_filenode(PG_FUNCTION_ARGS)
+{
+	Oid			reltablespace = PG_GETARG_OID(0);
+	Oid			relfilenode = PG_GETARG_OID(1);
+	Oid			lookup_tablespace = reltablespace;
+	Oid         result = InvalidOid;
+	HeapTuple	tuple;
+
+	if (reltablespace == 0)
+		reltablespace = DEFAULTTABLESPACE_OID;
+
+	/* pg_class stores 0 instead of DEFAULTTABLESPACE_OID */
+	if (reltablespace == DEFAULTTABLESPACE_OID)
+		lookup_tablespace = 0;
+
+	tuple = SearchSysCache2(RELFILENODE,
+							lookup_tablespace,
+							relfilenode);
+
+	/* found it in the system catalog, not be a shared/nailed table */
+	if (HeapTupleIsValid(tuple))
+	{
+		result = HeapTupleHeaderGetOid(tuple->t_data);
+		ReleaseSysCache(tuple);
+	}
+	else
+	{
+		if (reltablespace == GLOBALTABLESPACE_OID)
+		{
+			result = RelationMapFilenodeToOid(relfilenode, true);
+		}
+		else
+		{
+			Form_pg_class relform;
+
+			result = RelationMapFilenodeToOid(relfilenode, false);
+
+			if (result != InvalidOid)
+			{
+				/* check that we found the correct relation */
+				tuple = SearchSysCache1(RELOID,
+									result);
+
+				if (!HeapTupleIsValid(tuple))
+				{
+					elog(ERROR, "Couldn't refind previously looked up relation with oid %u",
+						 result);
+				}
+
+				relform = (Form_pg_class) GETSTRUCT(tuple);
+
+				if (relform->reltablespace != reltablespace &&
+					relform->reltablespace != lookup_tablespace)
+					result = InvalidOid;
+
+				ReleaseSysCache(tuple);
+			}
+		}
+	}
+
+	if (!OidIsValid(result))
+		PG_RETURN_NULL();
+	else
+		PG_RETURN_OID(result);
+}
+
+/*
  * Get the pathname (relative to $PGDATA) of a relation
  *
  * See comments for pg_relation_filenode.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 010605d..d179e49 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3441,6 +3441,8 @@ DATA(insert OID = 2998 ( pg_indexes_size		PGNSP PGUID 12 1 0 0 0 f f f f t f v 1
 DESCR("disk space usage for all indexes attached to the specified table");
 DATA(insert OID = 2999 ( pg_relation_filenode	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 26 "2205" _null_ _null_ _null_ _null_ pg_relation_filenode _null_ _null_ _null_ ));
 DESCR("filenode identifier of relation");
+DATA(insert OID = 3454 ( pg_relation_by_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 2205 "26 26" _null_ _null_ _null_ _null_ pg_relation_by_filenode _null_ _null_ _null_ ));
+DESCR("filenode identifier of relation");
 DATA(insert OID = 3034 ( pg_relation_filepath	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "2205" _null_ _null_ _null_ _null_ pg_relation_filepath _null_ _null_ _null_ ));
 DESCR("file path of relation");
 
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 61d6aef..c5984ad 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -458,6 +458,7 @@ extern Datum pg_table_size(PG_FUNCTION_ARGS);
 extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
 extern Datum pg_relation_filepath(PG_FUNCTION_ARGS);
+extern Datum pg_relation_by_filenode(PG_FUNCTION_ARGS);
 
 /* genfile.c */
 extern bytea *read_binary_file(const char *filename,
-- 
1.7.12.289.g0ce9864.dirty

0008-wal_decoding-Introduce-InvalidCommandId-and-declare-.patchtext/x-patch; charset=us-asciiDownload

>From 321a38776fcd10df090f737b722a692b649f969c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 13 Nov 2012 12:18:07 +0100
Subject: [PATCH 08/19] wal_decoding: Introduce InvalidCommandId and declare
 that to be the new maximum for
 CommandCounterIncrement

This is useful to be able to represent a CommandId thats invalid. There was no
such value before.

This decreases the possible number of subtransactions by one which seems
unproblematic. Its also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction (spare timetravel
access, but thats new anyway).
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h                   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 81d2687..369d2b6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -745,12 +745,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("cannot have more than 2^32-1 commands in a transaction")));
+					 errmsg("cannot have more than 2^32-2 commands in a transaction")));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 57664e8..aba0049 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -367,6 +367,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.7.12.289.g0ce9864.dirty

0009-wal_decoding-Adjust-all-Satisfies-routines-to-take-a.patchtext/x-patch; charset=us-asciiDownload

>From 56f8f82a77b63079068dd5c29726ddfcdfb581c2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 12 Nov 2012 13:39:52 +0100
Subject: [PATCH 09/19] wal_decoding: Adjust all *Satisfies routines to take a
 HeapTuple instead of a HeapTupleHeader

For the regular satisfies routines this is needed in prepareation of logical
decoding. I changed the non-regular ones for consistency as well.

The naming between htup, tuple and similar is rather confused, I could not find
any consistent naming anywhere.

This is preparatory work for the logical decoding feature which needs to be
able to get to a valid relfilenode from when checking the visibility of a
tuple.
---
 contrib/pgrowlocks/pgrowlocks.c      |  2 +-
 src/backend/access/heap/heapam.c     | 13 ++++++----
 src/backend/access/heap/pruneheap.c  | 16 ++++++++++--
 src/backend/catalog/index.c          |  2 +-
 src/backend/commands/analyze.c       |  3 ++-
 src/backend/commands/cluster.c       |  2 +-
 src/backend/commands/vacuumlazy.c    |  3 ++-
 src/backend/storage/lmgr/predicate.c |  2 +-
 src/backend/utils/time/tqual.c       | 50 +++++++++++++++++++++++++++++-------
 src/include/utils/snapshot.h         |  4 +--
 src/include/utils/tqual.h            | 20 +++++++--------
 11 files changed, 83 insertions(+), 34 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 20beed2..8f9db55 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -120,7 +120,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesUpdate(tuple->t_data,
+		if (HeapTupleSatisfiesUpdate(tuple,
 									 GetCurrentCommandId(false),
 									 scan->rs_cbuf) == HeapTupleBeingUpdated)
 		{
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b19d1cf..ba9fd36 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -289,6 +289,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
+			loctup.t_tableOid = RelationGetRelid(scan->rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -1603,7 +1604,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = relation->rd_id;
+		heapTuple->t_tableOid = RelationGetRelid(relation);
 		heapTuple->t_self = *tid;
 
 		/*
@@ -1651,7 +1652,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 * transactions.
 		 */
 		if (all_dead && *all_dead &&
-			!HeapTupleIsSurelyDead(heapTuple->t_data, RecentGlobalXmin))
+			!HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
 			*all_dead = false;
 
 		/*
@@ -2447,12 +2448,13 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
+	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
 
 l1:
-	result = HeapTupleSatisfiesUpdate(tp.t_data, cid, buffer);
+	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
 
 	if (result == HeapTupleInvisible)
 	{
@@ -2817,6 +2819,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
 	Assert(ItemIdIsNormal(lp));
 
+	oldtup.t_tableOid = RelationGetRelid(relation);
 	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	oldtup.t_len = ItemIdGetLength(lp);
 	oldtup.t_self = *otid;
@@ -2829,7 +2832,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 */
 
 l2:
-	result = HeapTupleSatisfiesUpdate(oldtup.t_data, cid, buffer);
+	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
 
 	if (result == HeapTupleInvisible)
 	{
@@ -3531,7 +3534,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	tuple->t_tableOid = RelationGetRelid(relation);
 
 l3:
-	result = HeapTupleSatisfiesUpdate(tuple->t_data, cid, *buffer);
+	result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
 
 	if (result == HeapTupleInvisible)
 	{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 390585b..a0efe48 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -340,6 +340,9 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	OffsetNumber chainitems[MaxHeapTuplesPerPage];
 	int			nchain = 0,
 				i;
+	HeapTupleData tup;
+
+	tup.t_tableOid = RelationGetRelid(relation);
 
 	rootlp = PageGetItemId(dp, rootoffnum);
 
@@ -349,6 +352,11 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	if (ItemIdIsNormal(rootlp))
 	{
 		htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
+
+		tup.t_data = htup;
+		tup.t_len = ItemIdGetLength(rootlp);
+		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), rootoffnum);
+
 		if (HeapTupleHeaderIsHeapOnly(htup))
 		{
 			/*
@@ -369,7 +377,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			 * either here or while following a chain below.  Whichever path
 			 * gets there first will mark the tuple unused.
 			 */
-			if (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer)
+			if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
 				== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
 			{
 				heap_prune_record_unused(prstate, rootoffnum);
@@ -432,6 +440,10 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
 
+		tup.t_data = htup;
+		tup.t_len = ItemIdGetLength(lp);
+		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
+
 		/*
 		 * Check the tuple XMIN against prior XMAX, if any
 		 */
@@ -449,7 +461,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 */
 		tupdead = recent_dead = false;
 
-		switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer))
+		switch (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer))
 		{
 			case HEAPTUPLE_DEAD:
 				tupdead = true;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5892e44..a29c106 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2269,7 +2269,7 @@ IndexBuildHeapScan(Relation heapRelation,
 			 */
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-			switch (HeapTupleSatisfiesVacuum(heapTuple->t_data, OldestXmin,
+			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
 											 scan->rs_cbuf))
 			{
 				case HEAPTUPLE_DEAD:
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7a5eb42..ac16284 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1134,10 +1134,11 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
 
+			targtuple.t_tableOid = RelationGetRelid(onerel);
 			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 			targtuple.t_len = ItemIdGetLength(itemid);
 
-			switch (HeapTupleSatisfiesVacuum(targtuple.t_data,
+			switch (HeapTupleSatisfiesVacuum(&targtuple,
 											 OldestXmin,
 											 targbuffer))
 			{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 238781b..cb1a430 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -931,7 +931,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-		switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_DEAD:
 				/* Definitely dead */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 8eda663..62dda43 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -727,12 +727,13 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 
 			Assert(ItemIdIsNormal(itemid));
 
+			tuple.t_tableOid = RelationGetRelid(onerel);
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
 
 			tupgone = false;
 
-			switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 			{
 				case HEAPTUPLE_DEAD:
 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 90a9e2a..ee34afb 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3894,7 +3894,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
 	 * tuple is visible to us, while HeapTupleSatisfiesVacuum checks what else
 	 * is going on with it.
 	 */
-	htsvResult = HeapTupleSatisfiesVacuum(tuple->t_data, TransactionXmin, buffer);
+	htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
 	switch (htsvResult)
 	{
 		case HEAPTUPLE_LIVE:
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 51f0afd..2961822 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -163,8 +163,12 @@ HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
  *			 Xmax is not committed)))			that has not been committed
  */
 bool
-HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -326,8 +330,12 @@ HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  *
  */
 bool
-HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -471,7 +479,7 @@ HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  *		Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
  */
 bool
-HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesAny(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 {
 	return true;
 }
@@ -491,9 +499,13 @@ HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  * table.
  */
 bool
-HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 						Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -572,9 +584,13 @@ HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
  *	distinguish that case must test for it themselves.)
  */
 HTSU_Result
-HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
+HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 						 Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -739,9 +755,13 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  * for snapshot->xmax and the tuple's xmax.
  */
 bool
-HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 						Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
@@ -902,9 +922,13 @@ HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
  * can't see it.)
  */
 bool
-HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 					   Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
 	{
 		if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -1058,9 +1082,13 @@ HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  * even if we see that the deleting transaction has committed.
  */
 HTSV_Result
-HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
+HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 						 Buffer buffer)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	/*
 	 * Has inserting transaction committed?
 	 *
@@ -1233,8 +1261,12 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
  *	just whether or not the tuple is surely dead).
  */
 bool
-HeapTupleIsSurelyDead(HeapTupleHeader tuple, TransactionId OldestXmin)
+HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 {
+	HeapTupleHeader tuple = htup->t_data;
+	Assert(ItemPointerIsValid(&htup->t_self));
+	Assert(htup->t_tableOid != InvalidOid);
+
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
 	 * the tuple is definitely dead.  If it's marked neither committed nor
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index e747191..ed3f586 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -27,8 +27,8 @@ typedef struct SnapshotData *Snapshot;
  * The specific semantics of a snapshot are encoded by the "satisfies"
  * function.
  */
-typedef bool (*SnapshotSatisfiesFunc) (HeapTupleHeader tuple,
-										   Snapshot snapshot, Buffer buffer);
+typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
+									   Snapshot snapshot, Buffer buffer);
 
 typedef struct SnapshotData
 {
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 72a8ea4..5309ce3 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -52,7 +52,7 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
  *	if so, the indicated buffer is marked dirty.
  */
 #define HeapTupleSatisfiesVisibility(tuple, snapshot, buffer) \
-	((*(snapshot)->satisfies) ((tuple)->t_data, snapshot, buffer))
+	((*(snapshot)->satisfies) (tuple, snapshot, buffer))
 
 /* Result codes for HeapTupleSatisfiesVacuum */
 typedef enum
@@ -65,25 +65,25 @@ typedef enum
 } HTSV_Result;
 
 /* These are the "satisfies" test routines for the various snapshot types */
-extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesMVCC(HeapTuple htup,
 					   Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesNow(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesNow(HeapTuple htup,
 					  Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesSelf(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesSelf(HeapTuple htup,
 					   Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesAny(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesAny(HeapTuple htup,
 					  Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesToast(HeapTuple htup,
 						Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesDirty(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesDirty(HeapTuple htup,
 						Snapshot snapshot, Buffer buffer);
 
 /* Special "satisfies" routines with different APIs */
-extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,
+extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple htup,
 						 CommandId curcid, Buffer buffer);
-extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
+extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup,
 						 TransactionId OldestXmin, Buffer buffer);
-extern bool HeapTupleIsSurelyDead(HeapTupleHeader tuple,
+extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 					  TransactionId OldestXmin);
 
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
1.7.12.289.g0ce9864.dirty

0013-wal_decoding-copydir-make-fsync_fname-public.patchtext/x-patch; charset=us-asciiDownload

>From f026cb457c41b197dee9dca294d955851a64baf6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 9 Jan 2013 17:36:20 +0100
Subject: [PATCH 13/19] wal_decoding: copydir: make fsync_fname public

This probably should be somewhere else, its a generally useful function, not
really related to copying directories. fd.[ch]?
---
 src/backend/storage/file/copydir.c | 5 +----
 src/include/storage/copydir.h      | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 7f94f50..a86a35d 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -28,9 +28,6 @@
 #include "miscadmin.h"
 
 
-static void fsync_fname(char *fname, bool isdir);
-
-
 /*
  * copydir: copy a directory
  *
@@ -216,7 +213,7 @@ copy_file(char *fromfile, char *tofile)
  * Try to fsync directories but ignore errors that indicate the OS
  * just doesn't allow/require fsyncing directories.
  */
-static void
+void
 fsync_fname(char *fname, bool isdir)
 {
 	int			fd;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a087cce..3bccf3b 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -15,5 +15,6 @@
 
 extern void copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
+extern void fsync_fname(char *fname, bool isdir);
 
 #endif   /* COPYDIR_H */
-- 
1.7.12.289.g0ce9864.dirty

0014-wal-decoding-Add-information-about-a-tables-primary-.patchtext/x-patch; charset=us-asciiDownload

>From e42be6c2b152cc0d1de2db802e20c1e19eceb364 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 14 Jan 2013 12:16:54 +0100
Subject: [PATCH 14/19] wal decoding: Add information about a tables primary
 key to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather rd_primary value RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to lookup that index and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
 src/include/utils/rel.h            | 12 +++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 33fb858..aa110f0 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3365,7 +3365,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3422,17 +3424,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex)
+					&& !OidIsValid(oidIndex)
+					&& !OidIsValid(candidateIndex))
+			{
+				int key;
+				bool found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16 attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		relation->rd_primary = candidateIndex;
+
 	heap_close(indrel, AccessShareLock);
 
 	/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index bde5f17..930f621 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -121,6 +121,18 @@ typedef struct RelationData
 	TriggerDesc *trigdesc;		/* Trigger info, or NULL if rel has none */
 
 	/*
+	 * The 'best' primary or candidate key that has been found, only set
+	 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+	 *
+	 * Indexes are chosen in the following order:
+	 * * Primary Key
+	 * * oid index
+	 * * the first (OID order) unique, immediate, non-partial and
+	 *   non-expression index over one or more NOT NULL'ed columns
+	 */
+	Oid rd_primary;
+
+	/*
 	 * rd_options is set whenever rd_rel is loaded into the relcache entry.
 	 * Note that you can NOT look into rd_rel for this data.  NULL means "use
 	 * defaults".
-- 
1.7.12.289.g0ce9864.dirty

0015-wal-decoding-Introduce-wal-decoding-via-catalog-time.patch.gzapplication/x-patch-gzipDownload

����P�\ys�8����Hj�#Y�,Q��Mv�I���lg��R)D�����s|��n�7%��lm�s%�E�@?4�A���9;�}.:��&�`bYb`v����'�^W�}Kr��{.���c��!�cz����6���5}����\��������}��p��xS{�Cq�ng���x�yWg��ag���c����D�	#<d_��oO>��`�{��-��Lax��N��
}���{����<��7���������S������1Q��]�@��,8���W��|S��$��bs����m��/x���R`��m�
���,���y�!�`����w�"@�����vL�-��� �/���ar���,�8 ���4��&��x�}�y&0�)�|�����
g�Qw�aq�v���m##+�����`���S0�>,��'>����(d'�a����;n�5���Nt�_��7o>�F:��m��a���cY�(J~0-[�X_8l�x����va]K&\/��p����v|v�atr|6�>�:�_nG��or�G0�
"m���C;�&�g`(���F����e1�0�=���|�p@�h
,p��V�e��$��6��b:���
X��U<N����z�������-�d���Q�q�4T�����#0G&%'
���Z�D�!) SLd,�P�BF'��uY��5BQ�����`m�� �az(k�sD5�/��i4�h���i��}4<�,+����{AW�6��|�_��\
�������7S|@�,��4��.?��k;jH�^��������$M��v�-%���p�\f�Q���`4s��d���`����4z���e�t5�A������@!Y��|����l��H�S4"��;�y����/�A,���e���A4��z���T
�H��s(V�i[��`N>���mi�s��\4@��T�R�B����%���{�	�Y//�.m!�f9���l	�Y��
��!l��J"r���
z`���C���30���S��p�Y��M�[�6r�14v���6s-H�����Ga�����v]~���������� ��98���F������/;d�En�4u%��>��7S�����������=�^��)������i�Z5��.�)�m�(�������`���?���lV���)xJ`=����~;`���� �J��
Ji]��p�-f`G�Ie���nLz�YZ+���FR�n%)�/�7�����NX6`s���RE�RvK���L�+�t�S��L�~&�@���?���F�l��t������@�=������6���ZMwp0�����>�mm�~()�?(YP�@�Go;;���f�j�Cm�j��!���T����e�^5r����>(�����r��A�7)*��+�F:�����S1��c*I�6 �42��F-9������p�WJ���Q�t�����]����Q=�t��)�VX�Y��}��1�*,�k8�)����c��Y��zo���2U2�`e�`Y��d�������w��R~���+�����t+�}�tK"��������W����B�j�t�Q�@]$�n�����g�A�k�+�cy2��"�^��P�E���y5q
����Z��k���x���!�S,V�`g�/�*\e1e �l����T�J�Fk�w��u����BA�S��gjl���!��#
���fCc�!��!� X7���?Ce�Ng��?:|yq�<�8�_~�Z�.�bQa��#���_�&O"J��9T
C�DZ�'�`�I���L��X�5����xN�l�R���M��e���v���������G�S|�X?��Z�C
���1��Wh�#���\��yb�}1�^���qZ@���B��|��y�ka\�>f���,Z@�!rL��[���A'�`�F�W�]�a�IOW,�gfdx"G���9�;����G�
G`rGc�}����Zs�������6�h4l����e���������v8��@�lg��c�~�V���������C�^��^W����Zw���E �Vk�����9��UMLJ]R����.��-V���~�������0XB$S� V�e�``����L��b�.��3i����������x��7�]#���u<��"�O`Ur���Y��Q�$H���bv�*`J�l!������Q�l+�V���{��=��/�s���K5�#bs2��{�����7�4�x� ^Fa���2�OY*k�_J�0�J�F_�y�
���a(U����e\)�o�c���iB�b�q�;A3���)��%b'k<�;��xP��k|,��o>�c��d��$�{g�a��X�Q�aF�?4�L�R������Z�S�D_{�H����7�s������������#�>���<�����������xtqsz}���DU�w���+���"�T���6)��3!1L|.�g�����Q��Y��0���� ��>fx�c(R<����pS`=@�?]+�/��k��4����������IW��%A1�����%x�.Q*F�&�����R}�:���1�{���1e���������������DAf���h~��������������sI%��8i��+I�7��������/@t�.0�������/�0��WM�A�,mFy��[@S�y�b�qV<=JiB�OE��)��h���1>+�	��1�H��;�~���w�������K��~k+��y�g;
���|��Wy�n�����|$p H�(;}��@��X4����l�=�0!g%�q��B?�fW����]�O�o��3���{���������������	.nb������7��5������2���&�5�g��K	!��n1�!��Q��i�>h=�S+�i�����^�z"��Ogg��B!����I��dXd��"7����K[D�H��w�2�R|D������*g����QX!y�����
���� l+~+[.)�"5�)<��~�5'�\{N���g���S1�
��C c,�V�l�oY�qj+���"/������eSbC{`F��#�
=��"���C���m� a���
3����Z�/��p��*n�#����,]��@�`_��~�(d�?���H	W�`����mR�4�_@��� F������FUv���dU���l$F5N~#C*�tbq������iX��4�PE�z����O�����0�<rB�Q���7R�y�\�&�Fh8C-����`[��
lmp�x�q��Y��� ��@�S��iH�"p�'��8`�u,�c�}�e������1����+X����((��(�efN�"�B�i�"�@������~��~�
��T���8>&d�a;������uy0Xqd@�+b~}|���v�����b<���#�#��!|=)�!`����r5z�7����RN2��FSG�-h�(��F=��$������:�|3�&7#���~�{8������L[br�F^0{t`�f]��\��n��Vy�M7Pt,����Q3���'�gu��,��
J;H��\��n��������q��������C���u������T!q�����r`���xSD\
�3�YlTUl���,1�U�Ylbk#�f��^���N�q�*����(-�T����x�6�ueF	*�	�(��8��� f!5�Q�U
����J�r$_-�1��*m�"�yN�C�6�R( �u��
j���u)y�t����tU�D&��t��.��Lr#O�HTx�.>�����I�?��F�������j�aj$O����M�i���9^x��2K�3%�?�,Y���[���`x�.��c?�em������l�1������9=g��.��~��|��7*�"S�.v�F�S�)��'-R6�n���(A��j&w�:��A�
���%`w�?���.�T�@Y������9�$�^\��(��gM�+]6r[�;��;}��CWc/O([�z �4r��}
K���������b�5^�H���Dvx����G�dA�p���2�7�q�u�mt�/���E�
>��f���������b�j��~�����H_���9E2;z�^��[B�n,��-4�^#���"�+�#��:��Vb�*Y6�����z�h-�
-c����G����9�t��0.�������Q5�&�����U$�VmV�?U��"w��)FA�l���huTP
}�����G��:-�B���S�y�r�9��H�|t25��r)%���v�x��zn�����\OR�;=;�=M�>,.�������.�~��:�t�XwX�=M�����V
p��Y��d��0�:7�� �U����W��!.��*{[�34�����:?Z��l���K#fC��RL�)�\���18��U���3������Z���
\�����[�9{��TF-����k��v*����1��j�_U�����c��d�������]k��r�
�R1�?�B�����iq�cs~'X�� %�nb������=
�������Y���lN�N�+`���(��})�()T��d���t�4}�b������(r���(T���tu�^�T):y��DM(yu������b:�d�I��}0@�Q��e!�@5��4������u�Uc�y��L�]|w�X�\b��B�V��[<4N��^��uU��0��T�beM5�:ocZ��M�t��T>cj�6'��>��_o^3�M�)��B��2zb�����!yWJ��h+��}2u����L�g��C��P�������Wl������2�E��D?I]����K-��M[&+d]l,�,��B����G	@�US�MC�F~(�Z����E���gt)_�����"l�u�2UY1�����>�����0����t�a�y�DQ"�M��-R���|%|<g��o�gjh���'�wj;B*��H3B����aE��cm~���&���xemi��:b�����b���$��5����};3�����0X������ ��.��O�a\k�����"�s��4j�� *B�.�x�d����Q���
d2>'��s~s��U�)[u	3|��|R�����b|O���h�-n���Gf�x>2��������$@du�of�#��}ry��-AH�� {�W
f{
�K�O���~Eg~_������/�%	!�U�wu0T�H�#��o�*�`Rut����\i�^�����4����m�i�?S��O���}�d�k2�O�rq��b��CmRA��)�4R5�<�J�T����.�P�B,O��u�e5�']J9�5H��31	|��-�
�#heX���9B������O�O�����������:����>���*h�c8{������������t6�[�+�LP!�n����J91U�d�n�/s�E�J�Iy��[c�S�A�s���Q�?h���n��bM\7P�����8�L����W����L�q�"����.��R3���=�!��M/U����}*:��z��1����o������X�|����|)�Jd�#���S�~��G�i�R���e�!]�,��M[o��������>4'�g�Y�����q��a�?�*O�Jg��O�������)�;��O��Pji�d�����/A�D�R�=d/%�zn�����u5e�P�wV0;b���`m M�F��a�~�H�k�~�~P���"&��������/p�	���+��,�����>�X�����>��7������,K�TV����r�L��"�`���p��X����cu���70<!����P!
"���p~|����7�k#��E��"�L��X�e�`�m^�
�����G/%� �%�Z)�.����-��LI����o��6dF�'��=QC���� �q�_���������nA��b\����������������W��9���uj&Q0�R�`}Q����8��L��'/�?���6�'���[���T����$������w9�7	�V�g���J������CEa���B�xC���k�e0�jM�F~��[�9�[���BG�w��71^�_���x�YLy^��U����|�l,��*&���}��������Tl����1	���=i�j�����E��������ut*!�w�O�8�9Jfg���O���`o�}��u�������Um��+�-PK��TO0�y�#>[H �:�U[]�k
'�tn��-��}��T	RQ���2�x����~�h��y�������Ij�y������t��"��U�~� �@�����N��i���{5�x��6�f���>P�S�$�2�E�[#)�X�O��H�V����V�w�.����qr�����A	�����|U�Kcd���g�������C����k�*�8`�u����R��M|�C	�.�2�����g7��\�jD��4������iw��e�^]O�TOe�;�=��g7�v/V���~��h��rJ����M&��5C���D}�C47Bn����%W����F�m/ts?yJn��,��~��8;b��F�B��Prbj'=�_�S<�`I��U#:TaCX���� r�5e�#U��<Bb:��DqU�4��z�~/�
���-e��*����G$%T
)!����0�������+9��
�=jZ���)��$�\�}�b�:�:��.���}���CQU�>	C@�~����<&��
"$�u�%	�@�^.���F�h�W���hI��/�/�3�lx����!�~�!$�r�������N:1������9���-c���	7ZS���"H�=�VF�,�[W!��x5���B��.V�����o�R�����'�>GFc?#AJp�P����unD���
� �\��	�$N�r2[2m*�X�a0�}*qo�`�*~gUm�
1�B1Z�O�t1��X�#a$�m�N��'��|-��������>{��
�}��>��Q&�C�fl����7`Y�t����-�z5���^\�X
�y.�6��/����+c�j>��(!�����<�i���'��8R�l����
�C�NA�A��A����|c��qE	%KPQ}	'J��:A4!���T���^��E:�(A�������|�S�����v�d�gk��x��<��6j�2 ����J:��B�A��
|,M`�,mk#jW�E��|P�?�v@n���W$^��m��J�M��O����4"�������ATP�*��Q>j,���MC�)�P�)�.�Ar��.���R�R��"�p�N���x�����F|�_-�b��M��;
�G��'I��b��g�����&[����N��8�y^~���O[����H���:��f�>2�ggo���]D?��6��uu��j��5��������]����:u���<��8;g7�����
�4���!,��=	�����"�>��A�����W�HE=6����[d��2�1���S�ku��������h[>��8��o,t/Y�-��h<����<��bov�/��=cM
��n������$�������"�=@�@Ort3z*u4O.�
���3� ���8��u`maGa������#�5.�hl�����]�J�>U�o0N?6�����r�O������s��2$>�z9L�
�<{�jd�yNF���&O��:�!�d<%����&�����t@���UB��bw�F�U�"JRNPe7�Dup!�=�Gx��S��u��#*��,Ot�NN/�?�O���r�c(�k-�����8	lj �1U�F���2T��I]<����E��\n)�����oR��7�Y����~6��N��Y��_�|��)Q�P���/��(z��.PN�F$W�g����� ����'��!^Py:���}`���
�Xw��T��)��	���u�m����'>�~���qw����K9��wp�C4�r�����\��43�S�u��'vt�H�f)���FKXh����z%��@[I
/UF�����V�o��1�A��LD����)�zD �����`Y�
����^�:�+��i�F���'������&���r��5�����>%���p���F�;�V����9�Ti��j
8�MSp�
T��=������ig��^,���u�}�!\fN�'I�����Ee��
/;%:�=��W��FU
;vB�x���xD��������Z�{�bYo�� ��}�Z-2�\�x��b�U�<}�"m[��Q�p�?#I���K��������y���o�i@:���rs�
����]�NQ�+�����,�zQ���X�RTb�d���"���@�4��d�v���4A3�y����f7�+�fSK�Ve��DXipT��QH4�.i����U��u��]�K���U��NN�pe5Y��Zt�>>��(�nU�G�:�����i=|X���m�����pxt��?=�8���C;��#����z��]�z���,���j����l4���}�}��� �Y���/�:�/��E5>1�A���#%���f�6�MG��6?=S���Z��l�	�v=�������+$��E���]A��}�EWS�x�S.�
�����Kq���������:%��V0�	#��Sl3<tx�pv�G�x���r��D��W�I�#�����O�>a���R�%ds��d��w��U�#�X��g>�x�7H�p����4�b�A��������������Q���QS��F�N?�����>��j%�e�k�`�q��7��lT�8��}./��~��?0;?������l�ct6������\i;��H����.�2�L�Jv�M��SI�F;�b�J����O{�����5c��������*U�����m]��Q�]����X�,��
r�'�l�����?����m"Vi�:�rC����Es���/�����}LS��SX�4�R��=#��J���;Z��iY�����W�����Y���i�-H*�!��6�j]x3�#��>T����K���Nx�[g���������}K{����������w0���wz�|DjW�,�r��Et�Q���uf���eS�� �z��t��.�=��9�_+��6���^]2�.���2j5���]d�{oN���X�oG�tH���;�$,��w��kL'0������2,<���(��E����[�[��]���c����������H�o%��j������/k��^\�;9�*����Hwg�q`�K��Z��:P�����?o��.D V)#X��/�O5x�3HW��^7���i%t�J�rE�ZM���Y����V��b�O^��r��I�W�;Z.p1A��Q�Y�	;���UR�R��/-������gF18Q|��~�i!�yr�:}�:o����<���G�Cb������g��'����.N?��'����l�>�s�>�(�������Y����(2ZE0���\?0�
�Oj��0��/�d<]�����Xkg�9��	��Y+��������W	F�C
���w^m/�7�Z6�;��*�����p=X4Sn����.NP[������~]�12A�
�_ ��p:�=:�8��FK)������,+]j�l��*��w+�����DB:���b���V����b>i-�T��CGyG0
�����C��������T�)Vg������ta����?�����=+;A���N��8���g�<.�jzX�Uu����e�69�u/�������^���� �Z���5Em������8;�
+��a���/�����(��mA���_�Zo�N��1�������[�/Q�D�H�z�h�U����@O�1� �gpfcfh��?���y6c��+�T�j/��\�qe]8o��4a���������kj@~�~�:���C�)�lQ�����;�y�����U����'�uXY��.���=Y�M�n���g_�1�D�e�n���^������z����:na��w=nb&(7�����U����?������X+9�<��#�/t&��}����:)�&|��+���/FK%�@�
?,�e�Il*^C�X����+]}��2�������"��_$��?�G�Mg�������+����������c�����]H��?�l� ���[���w>[���C��y�D�G�#���e�<����G�L����K ���(g_��_>�&M�V���g�}�R��r�OQ�6���& ��b�����'��r�tO6�p_u�ZMi���WAC�
W�\v�
�v�oi��<�g����6��5��1�PB����"��j�A��Q��^;�fJ{�#������8����\���@���P�T�t���
������>r�*�B�YS���������E='�[�}���K�=�T=�eh[��b���
>7!�P2 ���?����U��Z���������������,�-_Y$A�4
�v�����1|G�uZ�]
�.�'i���+4L��P	F�}y2h�U�����Y����m���@A���k���+��b�Jw]vH�Y��F�G6E��U�c�^���e����52!��.������.��s�uS[��z�,�>�	}��D��ui+��w����;x1~�{���g����^���n,k8OU�5f��->n�8����u���~�xZ�}��?�����s��m��s�ra�j�p(im�8�iR?8v-5��$|G� �
H�k�fK�@��3I@��U����](2I��������{��Eg�����LKV����?.J�4����#\3�&$l��A<�������/�(��B��af:�������2�����~m?��r�T���9p���*�D�3��L���N<�w%�{~��z��������X���"��TA�U��l�'����s�?	��M��a�t(������.������c�=�������g���w����-�J������p���t0DCF��z��������S��_�~���z�X}N���VU�Fo�A
��������?����h���+���F��}�y��N!����"��]><����
��F�6�'�~�����72���4#�iC6SN��Q���5~E�P�8�x�w�7HB���.G�O�
P��*��P0��./_��9|���S�{�e��������` ���BW43��U�M���W���g�v#+��� xD�r�2N*s�e�nO*+j(�g���AT2��%����[[Q��*A��zp��~X�������?P
�S���8;F:@�v�Ae��bM+���LR���'b�{�D��� 9�����|���2�4��n�P����/�����������T���_g
��K�R&<���/.F��(�|�����L'���/�5�%�{�����u�����a|���/��a���&�,�q���n��t!�����NSqH3��a#���g���N�K;	
(�3O�!���(=�����D�g��@I�D�{��1@������uh��%��z��+%j��+�#������8������/L������zP�"��]tKZ0X+5���Y����
'mw��r��]m5���lB)~��OSD�����O���#Q�0����*��G�_G�c]?$7�hS�{q��������*�p����^��*�Z��}���h�{�	�����
S:�Z�? ���������Me��sM��z��}�	WRp���G�}����d
���6�/&dA���0��-DL�P�l���9�p�G�:L����!�g
;�J�(�'��Q�v,��0n��:��%i���sN���*��l
uK�c0��1VK&��
'�)}��k���Y}�6	�&a}�%2Q���J+�:�8�&t�F���K�;�H�&m�Dt3f��:ch�)�����oUM�	�}'�ph�vCnS��8v�����;4
�.���N���0f���s���x��j��4�<P���&�u���K)7��^n���r�{M����	�����������T�fv-^_��/v7��Gb7Xx&o4���M}u��&�a�`�fbn �Qu�]��8���;I�D8���>�\�3����(���4���A���sCmC_>����J-8j>��xp��=���#�w_.Ic�!���/1�/�-8J��������9=�NMU����7�����r�R�p�
m]����s�����)�5c5a-�]��\�2�K���X���8���c�)�12IG��8g�=�t�C�>�������j�_.e����<�5�}m�t�B�Bn�0�~`?Z�D������dkp+>C��n�kfFt�O��:����'zC���T�B�V����Z�5�U�xj
��d@�7�gKp �'�;9�%��9S�5%�/�8��:�J�G�������YT����N��G|�sf����9JN��_��jhi���5��O�ADH��A~US	n������%��[Y���lwT
Y)
���
e�!l�db2za3n���{5�1DA��4����uH��z�$�X�:���%��"��l�|�� M���l��-c:J���|	@�[h��CGk���JK�U���"�xn�Q�
�P�����,+0�.����y����Z�J
6��:��599�$��Xc<�D��3��ws�%o5u�� �
��Q�I/�����l{H�Q�����&ze��X��2�)���YK��-��R��'��9Mf7����������|������i�	��bC���]�ki/5I�E��O�C4�������qcrZ��P�N���{�{G�A����R�����m=���=�u�"1��(�M'�@{2�\�����-|Q��Y�U����4fK�s5����W@1&��2<ufq/9}V���|4����tQ�Iiw�'��8{��QjZ�
��4����0#|R:dKI���1��F�%��;��u�P'�5^�L&�����r��������������6�������2i�Sk�Y8�5P��x���
[L7P�����s���Z]0�nrL�N;<o�:��U���*{y>�*E��8�#�`��~T��x�6c��,�N9\���n����H�e������w��&�S��z�q5"I���F�:���x���(�C���\�������m}Y�y**d��XM����"�qi�t�T��e��lVW���4�R5���T�U���Dj��y�Q<�jx������H8�$�!��bt��)6���&��n�Q�ay�I�A���F5&���`�0�������cq:�i�������Z�)����i����r3��)o6�t�7�wpU6�U��l*�a>>���Y�t��uN)��q6�,p�d8{J��^9�A����1�y<Y ����mUo���2�>�L����;�T�m��yv$NkP-����	9$=�Y
�T 2��9�`�b�|��CM��21T���g���v��u�(C.�OHZv��P���3]�J'�[dM����|���<�P@��g���W��}�{��?�};�����B����5	b&���P���#�Q;�+E�N?N"�`���C�x00�s�����\�3�d�k��&y,%�����K�M�����:���9[5�������M^����R�����	�d�Ov��t�(�=n`o6�,�uV �:�5~�����w�Es�u$�>���m��8ca�^eCX�����fj7J�2k���}a��)_����5�:�����`4AB5$���!�X�l�(���"V���!�>G���Y���n���I��*�����(�)��M�@*��Qm���mR� 8j�HRr��;�xQ���v������R���@=+�iV�n��*���t\[]���+f<y��jf =w��U	H�����c,y���"��&����^��*���l-��)�pj��H�b�q���hp2J�g��T��s�h�FM$b��@[�	�:�1H!<���a����0�Y`j����Mb5�3���{�����]��K����f�6�W����_���r&p	���;q<$���v�(�5;~�wr���n�+Gg�\�-��-��s#��3N"A���^�;��~����y�������j����~x2����.���~	9<�:���D��:��{wtq�'���N���ZBIuL>C}-��U6
GkYIV�=�����:>���(0���8��d�V��jK��.��ck�C�<�vZ��V�T}�:<����*���i���3�q`sB��X�����6�z�g�E��,J3:��M��63X�����5�'3�w/7e�:�v ��x���U�C�>�_3R�� SR���G�X�9�����f��Z��}^4h)�}�&����C��^T������_���e��k:��g;#;Z����R2��&
xs��o]T)S����0��t�pR�8<|��Z�)|���������~�x~zr�;�x�����d��w�$/.��5��~����f�	Add�!@�Qc��0%Jv������
��tUS/K<�B��,���X��+�J�v�6\o���Uy�#�0~W[��J:�!�{�i~m\N���'�n�!�T��<~���3�����w����-��
��|2+A	��*��zD4HPg��������Y�������w:��<{��jt�t=kH_j%=�M�
{������q���w`�}T�/�1���0���K�4�����/�������]���~��4OpsP�b<0�J��bhD����TX�ho�&
��VGP���$�o	�B�L�&���gh������t����np�z��3�D��S����"c������u\�Hb"#�.��9Wm��x�=Y '��%"��>��i�����X>H���D�������q��\�����*��9�����F}������L�V���IvC�J�)3��\����L��f0?�s����$E�'�o=���}�����Lj���O�+��n�q���O]T�����"Q��n��{��I�q5���c=+W����k�t�����T�������T�u�Q����E{qqf�.���DB����6����J���X�|�\P�i�s��wjU��9&�z�o�;����?�KC�j���Q�!�Y�xy��;|�j�g�c�m����y~�T�?K�j	��e���UQ0(�o�
.����-���Z���tH\���u�c�	���A�1�&��66�m?;C�����J���$�L������wh~=�t��a�����	���k��P��
[i�Z{�(��CYsha�W'���Un~Q�	��O�q�������A�fL!��=����)���+5��[mjsT��]�����v����l����!�R���|0�[�����������,�����Z���.�#BS2�T�sp�[�+7������������\��J���w�&�����f������v��q�r�b�
l��=��a��Q���M-��_P�,om�mL�^��t����~~�wb {^���%��yi�B)"	rF�2��Q|������8I�k����0�gP�a\Yx��Q2T�������G�� 9J���b�]%�G�.f9�J��h�aY�z
�#Q������?���m6�_jNg�u��a��eC��p�3C�,���WC��o����_hV= ��z����jy"J�+�`������������k�
������bR��8���z�oz�6���
�4=�K����>������1Fm��Q�V�:������a���i\jq�����GZ���E��H��K��=��#b�x���0;Pw��0�l:M'-��d��o'��5��n��h��]~*�#�c
:Y����%������A�a2�����w���	�e?����h0��y��qb����6%��ir����H����u��tWT�y�X�m��5)�%��z�vl�`��d�0�������Bo0EnX�s�$U�������Z��y��	T�^'����r�4B�c��4�rD����������g)%-��|F9}e�����������PK�eO���%T�AEc�V��������P��|��T�d�j�n2�F�N�P���
'����IO��H�xZ�I�<Q��V(}.|Z�5v�8?���4��-71f�PQ�v�B�D�`^��x7N�x'��n�������A.�mB~X@%[��-����J��Ai���/4��X'K( ���kQg�@(�/�U<���V�j�:�m+��[��<��+���k�4V���������kK���
�[�'�gm���v�a�,�^�]��0b����������dp"�4�0;F�iJne+�A����4+��|����aj��w����J���r�$q��5"��s�4�����:�p��-��Q���OBK3��P�M���F�����s�����q,V�ec����n��I�i�@r�J�
fYL[*t���]����P�e�m	�q����	�	�kI7Ap��7t���Vr��c-�-�)t�:�>mae+b+����dr��b�X��|�p/4����������
]a6b��p���������wD��%���#�;�~�xw���V)#1�$�3#�9��<w<a$t,�pJ���������X��(��"�@�t�{O&l���v�}�.���<�4���`.�-���_��������T0C��@.2�qZ�#��D�.�;o%�|��=�pp���[q%��;����G���p23����Vl�����/p#��t�wZ�}U��Y���"u�U��+|�Y�� �7��Wd��d!Cs:�HGE�o
"��-��?t{�	��=t�����L�A~������wQ���b	[~��=qQ���\��	_Ee�Q����|�*�8�;{������zrqxq�<�\������1�x_�e_��:I���{�W��,���'����oW=��"�����@�[�A�v����`s#��r'������T����aT�f��J4�7Xm:��P����tW>v���u\���D��>��2g-�(O������]���:d����:�_1�$�!�r����D����aa�����i!S����c_RJ��.��	s���l�|�(B5t[����pX0Y�(q��c����'�u�^���Q�����U�t{g�bD�����;��%^��N�(�~f!�}N�Q�����\@7������
�q���n� ��S�XF�V�(/N�����79�5�����_�,|�:{*-�U<?5���W�Q�[��"�2�S[m�Oz��?�y8�NL�{�H��D�j0���3��'���;&	3��t��gCX�.�L@S|y7JD{]�,I��0Is��6oG���:��AM�]6�^*��!�h(�G�p:��$I�e��$z^������s���� �
�O�i|���x���ux0G��gG����@��_�����\rc���=cX~�0P#:ZI�<�����mDa��5��|��������W���8���E�<02TQYef$�r��q�:��K��8i04�]�^����U�@YVqx#�Z9�W@	~M�C����pV�Nr������VF#��a=ue5abz��&cQG�8�'���D.#��t2O�=��R�R�{g�?o |:�`�M��A"�%�Z�YSS�K��Zp6�4-3�g��(+���H�N@�N����ky7f��&C�ri�'Z���M���c���Qd}u���]�����5?�9j�����~o7��4��R:��R��z�t��W��V
�``g'�^qn9�?���EX��Q@ts�Ye�I�*��h��%�j��-���nI��t7���29������yn�U=�lw
������#�NB���9����XD�2�Mr��m3I��&�Fl�H]5�\���d|EPq�v�{�;�c�N���^R��������[96j�V�k�R�f��oS���
X6�(�����Ig:I����1_t�
)4�f,&�tf7�|�H��b����X���T2�/P�"��/��W����n��Hy�����$�%��\tkt����V�~L�X�]�����Ad�
_Z1A��WR�
I;i/�z��{6����l�^�������,F-���K��d%y��^�����6�[(����_������#�Z�	��a3�a�c ��xq��_�:F��a.���$2��V�&�-ZA�7\TS�B�����e�f�,r��m�e�-�`Y�2�-���x[\$Y��]f#g���������n���t��y	�x�����u+p������v>����',��p���������s_Mly�X�(o�6�
r�����SP��W�~V� bI�`���i��$Y��K�X�;��_�~���w���[g�;�*�����0��+u4����A���?HZ���?���N(�D��
����j�?-
��%��R���<+E@r
k�vC��V2��VEn9
���������M`�E� �i�J�I6�<-�8���2�����T�+T��<���;���o�D��wo�;m��t��3n��h	���Y52?Z_(�>�#��(��yV�z������*e��_���M@@�qvWs�.-����	�1��t�BY=e9*Y��_9��<%�
����.xkT%~��HF7��I������R8K|�y7#�����w�c�r71p��?\�(���!p����GH����������K������j�c���!n����S�}�/i�/X�����w�E�Z8���.��&��iR��N���
��X���t[8����~*MK�3�4y
2�@�p9����_���~N�w���]��^q��m��������k�
qv�.�n�CU��W����)m.����N��~��_�<4B��M=���3r��0/x���<03�u����~=l����2K����U���6*D�a.����^Q����B��GR%]��-�,���X�'���Ps�H���=��`JH���b�������~ ��_�i�xSI��Zh��epU������ ����h.Q��eF*�1B������0O%,pc%,;S%|ldt �u0�C��5���������"�nBlU0�w���{�����%;S�<3�����S~w�]��5���K�����[;m��?��ke@���:���W���x�N
#���c�������[D��KN�
j��-���h�V	"Z�-�#T%����+���LI$�z�e��� �=Z
w�[�����, �;[�:~�HP�IJ6��+�r�B������F�f>��T�����g���f{���`���0���`4��5L7F,�D��N%���`�!�*#�����-���

�E�K������bT3;�[`T�8~�{��c�Nh6L6��� )|t��*f�������\���J��a��u�������U���s��zy�71�^��.<�����������a6� ��2�I�F�\C���	{ ]��)�G����e��'�b���L%� ��\��&����V���<�^�I��x�G?��������xHK#�������z
�6�
i�)�"�� �#�J���!���A��5�F��-�r��sq�s�G���K:���#�8T��`X�
j��)Ij���r���t�0s�����Z��TG�h5�XZ:%x�\H���)�M����KU]�9M0���HRVz���N������-b(�8�^�[)�����Vb0�-��y
SEl�4^zG���������JuJ�	P�8���+0T�2�GH/�����NY�`!�p��b5k�MC�2�.���v����c�~/2q�����`��@hO�����-�O���4C�x��b}�b���P�1m�G<m��)S�����?lo��fuU��������P��+��a[!�����L��kh��0�������E�
s��]+Vx�.8Lh�g��yI�������P��MB+��U/xD�����o/�Z3a'�������F�i�S�@����e��x!��:��vJ�W���%�n�.�=!11��Wg���|�-������3�
�N��\�������:)fB,�s�O�w?�r�I	��
�
Li	����z;��������l���;�����q,.��]�N&7I2t��M��s�A��|�l�4k���2=�<�+@=��d����)W#v�g�<SbF[�g��I�tH�7Vm�����u�_��n8���ar�,bt4�|�j�W������;��^�(M$'�a]�^��]hk�AY��)��q�C��'�0d��x��Kj�d:��z�?������B�4��\��2�P��T�h��JE-P@,9����x�/
����Y����r���_�k�q����Wz��XR\{l(G��T.�Mp=I�|�2�z���V@��^�|����h���Wit~��=�;ZE9Q��!�M���(�~i�F����Z���&������Y����k�����Q�&p�k8���B�c�|�������qD�Q����n&�c�.)0������d��E`,�`<��A��|��
	��u�x�����W�ZC�3�G�z�
�������n�xB�h�O�����t8���
�}Q����q�����j��Z��g�v����%��d���0���}m���%Gl1��?��_�S;��K���_���`���_���Y���M�Yz�*2!e���M�:k�/P�)�W�����c����K:��.0Z^Tna��������������(�%G7��I��&���x�Q�9��j���Pj��<P4rQ��\Ar�ru�$����:b���dkxkUEj%rb����\�O�PGf(���i��IVg��FU����lV�K��6�1,}����q�v���\�7v���i|I�f��zD�=��t�����'����N�������-��H�������g�J{^�����������qk�?��y�x����N����������IC���@��tZQ`CgGD�'b4��&�l�ik���2�}��L�6�����vb=Bq^�
���2�25�7�s����{�p%���
5�I?��f���_�+�4Q�%{���H���1��zy2���P���U��������M�?M70���O�_`T����r|�&��/)���)2%�x��]�^����z����-3�a���Ty���v�r��`�p���:���[<
�V���P���U��
��Q9p�9���JHo:f�W(=9�d�C<�cgE��Z"���ce@�L���,��lN����u��V$���&��y0O�6����CIh��V�*H�-M��y+�&w����X�kp�9a���9R~�����)��� "�����L%@*�+Z?����6J����M��/����������l�Xo��*n\�D�Y�ngIhc +B�*<\\�.�/h���O�����r�����!s��G8[rK��`�������9���g��g�������"^T�j&��a�z������fN��M��e��
����+e-���d��/�<0��;I��x�|�e�>�	E��,6�R�����iO��r��+�S�$t�E��w?G&��j���*]���[	V�wr=�uNX�e&�Q�b�V�#N}2��@�t����Aj�.p���s�����B�Y��9�k3M�,���Z�[|?��L�'�;�t�d��>K������M�p�>h���nG��������&���m���lg��o�u����w�����S��1^�/}�1.�cW�>��NM�s�����q�w��������o������*|�y�'��Y\Ag�����j���
��
�j~7rON�]x�<����yjy��LN,�����I^��G�|J�����2�!�����sm�q�7�$���_��q5F���\�1���Z�W����S)=�������$���������|W0��%��3����������R8V�!��@+�4��	�����"R�!�I���y�P�Cr����Ez��|��K��t��z(50	�q�xy��m?�l�.2N��.�e�u���P��*��U����gQMs�B��nH�%it�s�#�q�������c7�K\[�3�)^�;GV_ZhM��?h&��)��C�y����2��he/������ a�����A�9Ozd�|k���u�r)�43�M�)MC���o{3Wy�x������7�a'����>X��n���Y{���1����7�Ok���+��	�o���QRQ>"�x92����*���f��G3a�TI�}�C�K�A��X���%���6�������G�Ixv�`�=��o��S����r��O�S�����F���|�5@�r�/�B�,r�������R�o��rD�b���..d0Z����0���lz�o	��
��6�� ��h'��W�����M�3�k�s\�d�F�"���n��5�?�M��O`�p�D%\���;|�;O4gmb>���rw�
LxI���r|k����|C���v��CC������|F��O,7c�>��g��I�jC��[����UMT�:&)��d�Y�!��A�������b���	t����Vj�����.F���[�����W1p������N��@�����"9-9��
�����4/����R��tU-7�;�D�X��A2i#~��I�o��qaD�����,��~�T����?�q���uj����c�=e$�M�������e�Q���(��|G&%[�P���3GP��$��#��K�S��TT�����{Bf�	�)��6�����G���
�Dy-��(U�Z�����}_�������j8��v����5��;Z��U�����s�&����-��<��o#����B�mB�h�;������1++'�!9����GSAG(|�.-��z���Z��F����q_0��F�2o��U7��6�~���X��'�E�����GuguR������|�H���k5�.�E��Z�
OW1�[�n:�\�$�@f�F���dr<I��NK���7,���jb��:"��4����L��5�]�������DA�}������
qV��)����0]��C]�b��Zn�����~�j���5��z��X����q����1�''�<b��$�[/x/�����=L����q�%��y�.]���HfJk�'���[�
bY��:U��
)u^�oD���T�Ak&�fo'�PIB�������O�����0>jM����6������od���ac~l[��oS�w(��89H��m��h���v���
�t:�7��i}	��EQ%������e�e 1������X��k�qO�nl#�"�4�]��Oa���
:��~�"YH��Ey���]
��W������,���	)�V����e�Yx�Y�^^=�����+����Qu:k3CQR���93LR��N���`5{��*�/��g�Z���sBq���J3���d������%�@v������'M�����~��M�|-���0����n��e-��I�Y�%�W�YVQn����i%��Vu��,����	�=��l��� ���"zP�#��30cj�t���w,T�r�����I4r*wS��*�������lP<�����\�fK�@~���?�QX(,��s�I�E���n����x
��^�V�*�y���D������ ���h��}��@>��l��Vo���f�9�������������`V\
��XR��a����y`6�[�59������,yS�Yo]��}��Z�.����-�V�
�w(��:�>���������$a�0U��)����k������v���(f����st�I���L1'{�W��Y	����J+�V��R�65zm���4�n�l�?�v�r��nlR'9�D��I?�y�������b!�������s���.��7����������V9y���n�<��yZ����~��"��C�{�]��{;c��,��2�������"�](:�X�+����]���xPBl������v�<S����L�BY�!�qj�p���/���~��S��;O1�N\>��y���):�+�I�{be����b�QPV+����p�~
{��"�/���������{J��maGuC�����\��5Ksm!���A����j���n�9�W�d����k�,|Gg�bc��pJ�u��]	=�����`����9Wm�.���T��?
��r�RV�R���_�pA{��{F����p�o��]x��7o��J�� ��@i]�*v�W��""�q6�x<�'���C)h�n`b��;n!w
�4��Gd��I���J�p����F_[*l��u���mQT�6���)#��rH5j�[��Y����Q
����� �j^�JF�}�!���9����A8���G��������FI���j�����������g_�R��Fh?+z�����I��������]�
�u�w�����!Zd�\J}Y
9I��K��c��_40��n�V2=}PyD�I�|_t_��k
�+����x�8-��_%k����]��`�U���n@�AP\�<!ot�����3W�N�`����J�B����@Rl���.T�R`N���{��@k#�9�o���vo�#�29����eCD�-�"Q��r�BJ�\g9���������:�/"-
�c?�(�~��/�����z�2�B��
���%`q�?�.�`\O��kN��K]�u��T`)�w�]�<w����$X#-������������l��I����8�L���)g��<(�3�L@�_����P��TFQ�oo�����r{����*Y�VT��rN���}$���I	�[���D�>�	�_�R����,'�rNL�Y?%��<<O�����P)J�S�R
wZ,��.��q2��f�����N�FI7�Ep�L;w��3�(�R�2�C��M/�����kY���&?K�9�-LW��<��K-�W�,��^8�����dl��3�d���Vd�Y
Ho�������L!6���j���]�!=��U����g�b\������A��5��l������~2T����m����\�5d�jV�L�N�M����ElLZ}t��R�N#?tY)�G��N�Hy��%=��uR��:	�k��L����{�J)��|:��\��!m�|w�K�bGx.�y��1l\8p�%X������
���$���Ew�^�^/(���G&$�K�������������3j����_m����_�^��b�����Y���>��[k��G�����;�d���e�?��{�_m����=+�`��Q�H���K8{�Z�S	%,Dx��Kv�����xKw�"c���{K5x��I�x�}G��� W���]���E�>������������`�,�^���q:Ij��z���
��:3@ZT���t�`% �i-�;Y7����`Z�HR[�@b�� ��Y ��CT`6:�t��J��O��P�b$t9�F F����-C3�-�S�zTV!�L���������(7*��#hY��:XrL�N�/jK�s�U����#9C�N�B��%.
�o��;���(�/N��@}��N8xy�C����
�E]+�[*���E��B8&��4@v��=1�(���2W��(���X�P��L���
Be2�6�F�@H��0L2Tm�\SAZ��E�a�xr�����J#A�9��a�r����=�1]��i����$�Pv�	�2~GD���R���mD.a��C���vf�����(�2�kU��WN��&�]���n�N[
�j�X[�[������f[��D-[]�����_�����D4	G��.
��������������>����N?���5O���w�7�'{��W�r�:<�pq��������	~Q�h��9�o^��N��9]�?����3��@�3f���/����W��n�,�W��jT."�L���;*�H�	!|�]S.#���e�snP��)�����-��i"|O;�1��v�CS��I������5^}%i$|���A�N^����/�5���UV�aS��Q�|�������n��Z&�PC{]�����TEp��*�C_*����	_lV����3{q� �Es�{���,v����]���>8�3�B���#Q�7�/�	x��t-������b�`^qP�.�5u;��s�J�R����p����AMH2)����p^�5=1D�
����5���E<l����2:W��������Q��2d5�$�3��a����L Q�r��gP��*�������/���w���=^��=d�����@������I �p���mO\���� k0�9����JE��vl��+���ka��?)��x#�r3H5�y��k=�*�aOG�!�Om����y���?K�]�*�8|� �s�?hJD&��[[(b��"+P��%��_9�����R�x���D��(b����G��U/����-DIT���L�g��t�	U	�k���������1t���nq�)��B�{l�h����~S���y�U��u��qc���w�������-������4h2����`�E�t���`��~B��8{�Ta��0q�t
�������xvy�0�e�R���^���>CGtF@�"�
��������Sl����(�������7�x�#�;��&����>�����1F.�A	tGwr=��H�Bl;�����1�����T{��������JB���F|\We�V�m����w�3n���3�p�$��S��^�p1�����Gc�0��W�Z�(�LYt`���7P�Z"�����e�K���~
ao�������soBJ���{���)���l�6���=je4ji�+����y�Z��(�K@����R�$��+�F&�D. KE�Q�@PXc+(rA����G��*h��SFU�>'����I��#�������])s��������"n�����4���',�X�9�#���*��T~�H�NB� �Q/�BEx����#��f�������>�tZ���x"��?���[1��������|b5�'^I����P,q(T7�$Q���E�J��jeu���BG�������jc�u��
,����JB���,�
���lI�} � ��3pEF�T�L�jA��U7d	6����~�~j_���3!�Gyi�'KH`��T��^��lt�O)J�k��S�J\�G��8����)q����c-���t����2�����]�(�.�?���a����w��jQ�\��-�@���7c.�`�bE����l��p�D��tZ
z
I����P����r������YTm�Po��������<�a	O�@��Ir�~Xt�{��6����t� �(Ji�i����.�r	���H`��&:'�E���P[;���5����,�;�0�Y����]��,�~�F��
��;t��'�_L*J�ED5��XI70��~z���(D���	��:�t�jE*�#��Kb
��J����T���
���lr�����u�4��c5�����
0M���1!^?�Z��h)<t/�!
7�LpT9F���@U�J���@%��1���*H�&��m
W ������%����8k��A���1g}u���
��\�S�@S��`~#�4�
�]��_��w,����aP��$��&4m2Q���0&i����L�(�8[�K�s����3��J&�����f��SZ9��� Gd���N[.�k�7�!���������^�mD/x;W�X�d�A�i��a��0`7OT�����hM���g���5]C�z$:8�$�
���r�&\P����7���H������O8������n���?�c�]U��)����$�!�w�0��O��V�-�3��`�(�|1-~I�h����
������d�F�d0�||US5>����1��	� �`tx�Y;���qn$
��S�CnFgc����8�;<i�==�������!MPB���������k�W�{j�
��BjWy������	
m�P��BR K��,��l�*�(��^���gW��s@%gD'��*DVdw�^(�B�o���&`&��&��7��yS�Md�����D�EJ/#��\���[[��<YF��h��4��_lo?�&��&_6��>P��������6��[��v}���g���w����Zm�3 Q��!�Ao=&�`>c�2�_2�R.5��&���NN/�����p�V�r��[��GTi���M�#Ve��<u����[Ey��t�[�01QD����(-�If��r�|�����\��\r�g0J(�KD��1#X�������a���G�k�dB'��K�$vM$R�X=E[|?���t�
�e�*�����$,�d�g����E@��p*��>��x1sN@�GH����t��A�H4	f��1���{�����y�H�k���/�S	�M��'�� �����f��-��L1R95��NNz��D�$u�|�;rG��&w�l�|�������OV��F�7����M�3�pb��
��8�������T�C!G��l;a��O�c@Y�WE��y�Q��C1Y:��������S����Z�h�)-�������"J��i��kPM/g�T��V�I|���+u��C���P!:���X����m���$#����L���������D:�taGqv��-1]����,./y�GU+�����^�)�8�	�2�
t��NK�.��F�r1�B_��::6�Z�����"E9�s�o�����,0TJB��`�������6h#w�������\2���LQ����
rb�`����6Nz���"�/x&m�QY7�3�A�P<V��8��4�e��Qj=��4�,��$!��1�%�C�9:s��P��M�^��H�����E�G��
f��PE��E%�]�
�[���"�D	������������:��'�*py=�9S�#��j<Eh}`��HL��@�nQ�3-�5rOQ;�Ce�E�(���m1!�[SB�Fm����I��i�H��W������%$�wu
�N;c����E�|��Q�ey�:��uSc�]J��LH�~EN#�1i5�h/21�D��k�Ng�n*�|���1}���H����<"s|'T�b�tB{�qUm�/FsGI�r�v��|�y�&C��R�[5�a�/$K"wU"J�p=��H<������q��sD��*&�����pS�0�Xl����d���Z���8�[����8�]D�^	����v�Fiz��x���3����F}�;7����[VR8����*�6�4��u�M���:�����N1�&@�/��(z�I2�*Q|�;X������yry��p��^&����*������a�?��_�8������`�6�64�_���'��O�@C�����y�MT���-9j�P`<���0=�_".y��+����V���������H��~E���_3�����A�w�.��{/8��^;�=S^Fd�B{@Z��&�p�y�K�����K����^)�n��|	�8���o�Y/}Y�L���C�����X���$���0�6iO���������	z�u��|�����VmY%�{���psv*^�����WR`�[�x����]����3��F�C2�y�������=)8�,W?�p�v��pS�H��*��N1c�^�An�~=wU��a��-���)G�n���]��y}���&&�U���6�����0?�O[j���e���w6_%����i�
h�@)�6�f6�p�����|�i'+��+���M�Q���T���w�p84�J���@
X7	��b;��USM���:����v�-�A��8�����������+[4�~(w*:}�pJ����@���d�l`T`&����&���&?d�YE�f����n��P���	���xf�9�Q��|�5$��6d-���U��7w�{��v_��-�+c#���Oa����D���e����x��:T��W����7{G�{o��g{������v:2U�����R�e
���/'�/��4��;
��n>c�>W��n��q�
gE��H"��W��W���F�%n'Mz�t(�%<��{�.����DD�Lj�s�\�<`/�tq�S���y����Q����i�����o���b��+����U��8R�� ���@����7[�$���Lf'�����3��`OT[�
�Z>a�8a0�u��e��J���I�% � �2MwZ/j���5n�R�sC�"�a� d��I�� �
��p.\����mb�(��a
�9w�X<��)g�1in�!R��r��5).�=����G��\`��ZJL��ot��;�b`��O�<�$I��zU�Q�������|���jdDw����+s�����8p������A�v�r���p��i��w"����?��#�B�W��GYrZ�EB��������������K��X��j��xN�)�k�
.KW�:����5�������b�r�lok�.��o�����1Z�(��u��#qN����DJIa���H&��>y_�.��S��&���D5�8���_4/[r���O.�W��������B�{������c,�������<'��������D��[B5�o>�..��/�D�i����A�^���`t�}Q���:�-�-l���Ey=�S�8ECv��x�����>wq�I�����Z�p�t@r��-`e[�U��!���v�V�C���Iq5M��Y9����6�Se��}�m�K��Yq�UR{g_��&0�R�B_�t�.gKP�#�)/�����:�\�WD�:3��jK���8^�/��qB��N���"�ag�$�d(nd
��#Dk�Bq��H��2����E#v���S�D4A�y/e�}�^��hKt���i�����S��T���@���"&�Im9�=�W����g7I�=�������V�@�qr�m�b��>�����(B��s��#�!���I���)e'�Js���pL~V'�]�J�T����%c|A�Nc���U�%�.*��K�b��Q��6��k1�uLN8�g�l�v�i��p�a�U4���������Xr����^��R��_�JT��j�����*'���B���^#p)8(�k�LUSF�>����MT��\��������HL�zT�a���D�����}g^`P���?��[����l�3��bw76����:S~����Jy�F�x[g,�IH���U�
��������	9��T$�q(g�
��� ���Q~����W7^C�d��i�G?��`��@��x��T�n��`�vp���?���O�.@~���hhws:B�k[jx��������"Kh��Q0�)�^$�V�����6�:�!F��mz�r��\�j�Rn��������l9���EY���X^�#������?��1�;�2YmH_D�>�-}��[�n��e��(%�DtT#��%��"�A=�]�a	������
��`��������w�/F�������Fv��V��	y��{	}
Wm�;���jayo�7��y�+C��^]�l�)�����l'���Wcn-��j�����
��U��[���]z:��D�;���|"��
�R\�)B?{���o���S�W|��O=\?���|�cX�*g_!��.�A�5��������"6 \P��	[�q%n!��-��0��V0������I����.���������kj6�J/,F������V�dN������|��bo_9dE�.1��l�x��l@�^�Aq���t�6���dQ@�G������E�e/�a>���P���Ef����������8��o�_b��!��Y�{�����	S+�Pm�?��zB)��r��� �S��a�d��/�(A?h�D y�����@�2Vr!v�S$�
�j�I��(<\C�����pc��u�S�����A��>���������K������A��:)�_�'���K�5��;�`1��� ���{	s��l��DxW,�!3��!���P�h�$�X���B�wW�A�����<Tw���'���\���#EM{��N�avC��)��9�
�@��L���T�)"�"��&Gr��$b���EbIp��]�Y>)�@�tF�����!�3vj�e=p���(�����>���g�=����{����!X�)���QF�~X:0"����cF���j~��G�d,�$���R�W���Q��;��nS0 X��$h�M�A#:�&��
�%�(�4��,z�a�P����K��('m��&
��w����eQy�"x])��"&RN��H%g�$z��Y��<	jQ�9����L�7��P�����tW��k�t���Ac!���������*a��w�6V4�Doa�9��V�����g��@���p���8�J����#�C�o����%b�����q{)�kd�V��e��6Z��$��wZ�������NS���U#>N)�${RM���R�����c}h�N�����	�JRgy�%/g

;�s����@�R
���8�
~X]Q���w���M�L�;�����C��W���-��t4�$Q?_��Kr��,5���f�V�C��'q	s����O<\����3�{�~�T��Zo%l���k��9�\�<'7�W��)�����M$z37��g�����Q R5�[�u���)[9�(�x�Y����m7b��XG������WMu�Y����Wcl9��R��-I"����2M�C���>���L��W�TV�jjN���
LsB���D3/���B`�\�8r�[�G��!�����v�a7�l�:�7E����%&�P:���[���{�b�������x-]";zK��~p6�P	���N�AU��z�%�*Cdt���&��;�X~���D��=��vo���DZcI�,v�.!Zg-9�&':0W[P"��T_���u�'Y4B��]!�W���=W�����$Mh1j��(N��[�nWv�HS���|���p��-�3%�S��]M��,�w[�[��0���WK��}fR�-�+~��?=hZ�����?�N�P��8�.P��8
x7�-���������,����5��
C�U�1�����>r�
�6��M���{4�!�OJ'���6�1\s4;:E����r�!M����a&��7Z���F>����{Q�� �d�\��]�����b��t&��n�~�~y��OG����������_�Z�9Mu
�V3MsS���M�U���::�pp��:����;���N�0��W��9/���j|����a�i�/����zJ�3T��5��������Yo� �N�����q:��/��/�v�
0k&{����?sVf��Q�v���i��;H�f>��1�9����]�@�����U2�}�*���8o]��	F��
�%����-���A��~���G�M4d���,��I����wXo��������������x������Z�NNO��P�qaQ^��;���+8H�?��/W{H)��Bz��
H����QK�����[��J�'i
���~��4XU���im��s:��N�e��i��OQ��%j�9Jg�����K�=�aat$��r!9�Y}:c��t�6�&���u#����=��L���r� �
%����|8��C���Z���������=G~*��wBC�M��:����b\-pAP�%2����t�v_�1S�� dz��)]�MP�����O��m���n���|�KJ�L��9MV}D�U�QagR����F�Qp����Y��L
"H����q�`�'�q~�o������[$�G���StH)�V!RT��0�6�c��#��2����x�t0����)1m�9i����rHG�s����B�����P��X�/�\ca��<(��e��c{�v_���w� HQ�{c�_7��|���m$���QS�H����:�B�
�Ti)U��IdU��r4x����i����C�����4�����)������p�/�����#\���R�������U��m����E^;b\��j3Z�+������r;A�������2�f+��e�6�����3`f�bJ QTY���d����R�'��,f<����)^������ud���+O�J�C����}�n1�������e�gViH�-_Rt������)1�2��w�x�"-\<FhnD�Q<�&���B[�}�q|�N"JP�s��	m���H7?�#��^���	�*��R��g(\
-o�3�f�}s������������O~&e\~�����y����5�R�l5'����1�\��Mb��kr��x5��6G0�q#�'��,�`<y��HmP{+Fr�A������o~@	@����wf��	��!���n��p�U�&�D�A���W���\)_�����I�6{�(>���L4yF,
���g��@F0"H�QY9~L��?����F����tT ��W�B�M:�ND>�M<<�)�c�F����[s��"�l,� vH�������8�Y�	���U�k�dV`�D�����x������<�#e<tv�`�nh�Q�����g6"���k����#��_d[�M��Z:�41�t�.�c�f5n{�����J�u���w�D!��c�H��)X<���u�*�E��T���1����M����������sz�JQ�1j���f����"�:U�Jz�V���l�3����O��e�U����-yQ��e�b����j8�#{�h�y�D��Q���%�L��N7����m6Tyn���KF����G8O����5�*TPR���������C�?s�3H�-����������������wROb����0�G������>���;%�I|��4p������~��.�y�$�����t�MB]�x=�Wy���XHm����]xh�{��Fk���mw�2���F��!b-w�����j�"5!�y��sKU�Y
������t 8O����3�E��������Y�����vW������)h�<���������:���d�2���J�j�:�"_'�AR���v�����T9v���3M�kZ�	��u�C1K��;9T�Y���?JD:@�d�.����m}�CV-�"b�$lU8��!�-�5�@)/����r���;��j�l��46��e�\�a���/GJ�b�C��Z's�`�a��~�I�[����A�M�o���ms-l��{L��y�l��y0{n�<<{rM�����B"�s���~ |���s���S�O�q��N�
?{3-�n{oN��Nk=*�H
T��qSQ���gT}��LO��m���7O�lo���v�_28�����N�C�����t���/Al�@�C������q��2����e�tYsY�������� ��0�$*���m��v���P�������)Dy�3��t�^��v�����0��].x��W�7������!��;=C��R��:9;��i�`k����9C@x��E��%���):����X��ub&���4,B��jqU�)N?�*����lr;|���������9c
p�f��b*8l�1�����(+��s��X�9�n3�+��t��h��4�9�Z�}��?����.���B��?�>��]6u�J�F����.�NJg���9���
�WT���J�K��7��f�4p���iP%�������{)����G����7e��������"�5�z����vZ'�����[hR�vL��=-�����3��'=���W����w��%m�FL%�!=�XB*~���~�&i�����j4uw���u������V6����[��m36n����]
�S �V��L�����u?�15�Q�����CE��(�����R���������z&N������z�(�(vDSJL������M;InTV\[���]T����������O{&{��m��H���S�vK���_�`�������n���~e ����V�q�Dh�E���W�PymYdj���������ws$��|z����[tB}���3:������i�O+��|�V�f�EK3��L�~6��r28�[�YG����\���:6�+*������s���m��m�c6�	+�T��!MR�ae#��LE��`��"(P����V�e2��l���^�����q��]�������eAZ:L�������/��3���'<�����3���|K�kv��@���C�~��2�at���@�[�vT�P��?/�*;�,�W�����?)B ,����j�.��H���xP�,���}e�\��
��
�����*<J��x����s�<B����6z��X-�&��'�3�G�|Mw����2[���S��������F������u1�o+%d��a+�	v*��t��+^D.�HK1l ��oi���M��w-n��
��������,��-D�n��,��+�����L�;ol�yR���&��-�ox��+R�(�w����&��	$��fj'���<w�2����v���j����D
���3$�l��"� �G���L��$�����A*T�s��

������C,��#8AEP�kw���@gfP�yoZ��
��K��ld��O��rU��Rb-�Ana��1���Ql"�&/�0������(� ���*���!>��3{����
(&���K+�!l\��L\��b�����^<��t����<7D]H��H�
����:�(��xx��t{}�|���SB��|y�z��$�J�|:��~!�T����~��q�&J��g��\r�eS�9'	��+`�'��?q��g��,�r_�S��J��!�<�b����T��T�p�{��K�~���.AB���kH��z���.o����XP�qk�[� ��.����<�5,#C���]������*��.���t�?���Y�F��:�� 9���(%O*\h�G��`�%��G��S�)��b��4�4$(�� 8@%)Y0����E��������B������FU�%�W�� ��a��G%�7r��kG�2��;�������p_�w,�U�Hl�7��,Y~�L!<���E�8 [B����>W��b��82�hi�ob=���������
e��H~�����Kw
�N�<�@�u��E������U��3S�����g�f�P���RP����YaG����]�����z�3�)$�����%�������l�h�x�2��E[^���M%����V�A-wy�_#��nG��IzAB�m��EVI9��hb��
�H'����h��&�I+'�&��,T#
@y>$�3az�������c���������a�K'O���nb�w�A���6�|�v3�>J�hP�����}�)vK�R�J`���uZ�\���u
�LAF J��;md�
����8�fs@A_��vGP)1�����*���|e�b�~�"U">���G.b�pDr�FGo�������R��H�(�p�f�`xn��D��f�z��X6Iz����0�x�������z�`
�?:T�-@���_��4��P	�
��8��)����.q�R�ah��.���
��8����<��\��NgJ��.x���i���
KjqB��
��+]����23�dU:���|��� h�������u�|����Y�]<�D"�,�(
i�0�q�+C�&,|���!A����u�)��,�e�l�W������:���o88��������.	n����&g�U�����b������	����W��"����oB����8�]���%M���7Z<#���p�������Ui S,~{}-�(�/<��nHwU���Q�`�cv:n�u
D�F��U�M^�W��j�y�jt|�=�k	|>������u��;XbX^�f�����\�E�R{\e>,���k�k8l������2���f�)����.8<�{��<z,}	��������E��q*�\?�{��������h�����vskk��t}�4sr�&�s����~��v0��A-��rRW���F�p9_S[�����
7�a����(Ap��CX��,��Rz}�b���E���T���v�s�u�D���
���G�i��
�ZS������'���}�H���-��#�[��x���H%<	�pFH���i��I\�Gg$d��:�wx��L��L/�q�>V����y"V�u���`Vzk��
I��u���5to���I�{��L&��~7�F�����{�����]���~x�R2Y�i�����Z�i�U]���}<��$��l:E���������:����,`��I2t6�T��T~tq���|4���j8(T�_��D_m�/���>m�������Bv�1\/��{0����P���0b� A��`23��yr��G��@d/�z�A�A~U[a�)��P'������8�M���� �&��}����rXX�i�J�5��if�c��iJ.4�uT��R��0����Z]��9��"#8sP	�����	��4���a����W�d��0_n�&z-Zq����0���FJ��)Dm��Dt�k��sq�n�H{w��AJ.�)~�P��t�	���$���O.�'�.�<5e�yd��l��i�������e�c������E��!�Q����xJ�;DgZN:�Ft�����HO�(��<��;��l��^�*�� �<h�l��������e�����D?|<?=9�~<{�zsx�w����E�����9���>�O4$SuAx�����K�&]8d6T
���[���@�u9���'�d?P��~?�wbz��r{�c��c�tZka�Z�&���1��I��,<��Z�)����WEGW-1U]�*�.1����S[%�+�tS����e[���$*���j��f����N��A|��<D+:;�V��0�s;'���]�~��Pi\��o�����\�F#sx�wr��
s�M�'���3���bNFD�SFC��N�m��
����1��$�����"�����i�sA`�m!C*q��JG�
1��
�=�]�P����l~,����������Xm>���2
�
7i�4*�����������c�<<��s9���\��i���@�aS��1M�bN��I�7�j��$�r�D��0���V�YJg4����Qb�5���x�t#J0��$=O���"��-�P]S�,�Id�{�$�<"=c���@�ql����
�q��W�����:X�OS^��;�o7_/4�j/���u���<�����������K�"��r<�$��o���[��F*M��o����
|��c���U)&�/{����sq��[_�����o!��y��E�'P�qA9�np���8*��%S�Yl.���[���$+p
� �`��r$���(A%WI��
y�t���~��+*��
em7+��i�����	�q�2�������t����]h�b�,�
w����.��e���i:&H[~���W��s�$l���P��8��J��~����x��I)Os4��X�	���nL�q��P�����r��B1�*�=�>��nD<��Zg�

��]: g&���r��Q��]����"l��+��D8�W�Tn��Q�����6�#�k��!���?"n��������)���h#����kK(b��*�ib��2c_�/��J��'��,����+kb�|�0���pc�L�aG���_Z���$�d��3�xj�lwh�D��o]��I����[4�rh���^��BY�V�u-l�
r��Q�K���!�o�����^VT�[rFm�cUsERml\!��f>�l�^d%S������84���<��1W�m��$�/�/����n�q7���z�������|�-�������G���q�Y����W�AX�o�'-X�id���p~m���h�_��~8����|������pO[o(�F��yvt��wyxz��n�:9�l�,	o�J����e��7��"����M���z|��������7�#�����w�����;��������B���[�������}�$`��Stjj��MG�����R�%���Z��0;d�:����b���I�!��f�+	����bV���1��X+A[�T�&��_,�����z	?T��mo��N��z�[Yf��������?"�
yR�'���(�y]:���l����|a����g4��;�';80{�<:������	O�����o�W]���yt5?|�L�%]?e�d���c*u�P�w��[��MU��n����`K�#�U�x�j��Zl���W�Q���r�\Z����h"��j�Q8���r����&���Z���g���D\`��D�W�T�v[&�<5�|��{NR�VVsTF����-T����~4cff�S>`����~�.SJ<�;@`�<�<y���x�H�����H5��Ox/�����"���CY�������?�������w�<y	������x+Y��0U��q���t<!��	Sju��n���D�>��o��PK��UO�H���K^�'�e�����UPJ)�m�UQv����8S�vqh��;�`���d����#~��=���s����a��~��"V)��;;/�t�����'����|����zS�%qG>�	oo���� ���E(���9�6S�����n��2��z���%��b�|v��Z�Y�e�R��%n�u���_`
+K�4C�f�m��_;��I�1Nu����/�@��V<�i���L�}���Z����S���%�\]����<�%�>�����O��e���pkw�Vgrk���Y��
��
[�`���Q����.��a���u��r��>���aJ���8�u\C�o 
{��.�^�.\�����Q�O��:������8�d���.y4N��v�B��F��,��w���0��������$^�Q0�����
k��~�*K�����&��
F���������5��:�OQ<����p��J/;���z2��U����r\�9��S,9��l���	��a�}��?���ip��"]�l,B���m���g�{����`%%�p��&�8����g&'W�V���-�U/P����$����}C,(���>	�<- �GR���)�;��k�a�R���n[��dB���b�O���}�Q�7+5DxX]�����rj�����$����*K����_[��7yX{H��.�������D����������h���:�����.C��a�+����x����O���
�^I-+�@�GY�BC5����N��e�����A������ln}� ����������8D�Fl�����`��=d��7������d8G_�F�l���7C/��'�3w�������?{��X���h��fH���d	���� ��?~mV�i�M�\:�}�'���cc
~FW�a��p�Wy����#���F�;���&��[�����6*�/���1�������M-c�}&*���~�1L���{t�|�X������l�k���*Z���P}����h��g��������@�zFL���y��n1��4�����p�J��8�im�G�bE�g[[��/��W�/�����&��S�&[�c��Xr
�C������e0�D����8C������+v��p��}4��f#�*���a/#��#���ui	O,z��=�;n5O8���IV�+3&w��'��������v�A�C8g���kRTp:�I�wQ��2���U��6�Z�"����NO�&���d"��>"�$��l����%\�*�R��V��{GG�����A�������������j�(���9��2b)��)�	�~_��eV���0B�:�����<a��r9�{���g}�ud���Hu[&f��oV�59;�%O`��f/kO6�,wx:�y��'d�(��ct�	���$�:-m�5AtF������`P��K)%�k��B��#������D.� �5&�uN��%�
��@f$pX�sK�@��
��z���)C2;s�BPB���yt�F)�(���%7�o��SB�_i������H =����~IT$=�IzI���vqR%�#�U��	�I=��k�`z����]UX��^��fR���.� a�]Z����t'��:R�k�`fL-
<x*��:
@�7/�I���������b(3�{��Z�K��=T&��9�fX�m���O<�[��P"7�]e�e�yW�����2��J��J<���|�:U�-"�#r<a3�\,9:\��%b��q��9���Ko�e3gX�]W�3w���CZ�g���8�]�g��f�(
5,W������d��P0�e��6�u�j�XD��1�,D����?Y�bA����EAh61���1���NeO�ji_;W�Q�}�8�?������L�����'��h��Ie?���5���l�.�3h?A���>����Q|c���a7i�O�$C)�����|���p�eq��e~�fJ��MF�I����D�A�l�K3:�����1��im�@��&�����<E��q�lQ��-}g~��������Pj�2�<u�Qe(>V��x����8�B�Q�=|�������6����12Y�M�1���1U
<jP	�/�v�����U�@!�T���h��XC���f[*w�P�,/([���TX�����#�����Y���g��7���<�G@E�����!����Y�o���-��^�E�����;>&���}K����A��4��t1��r1~�%�b >�6+|,�NG#���ic�j5q@L�3�e�����CT[����m�/�:�_c�v�����.";�~j����3"E�8C q�Qi��J�y���h
���C��#�~/F��(�|��T{�7^�����E|�-sbZ:j�kXg6$����2�[c6�������(�R�!��a<�w�>��W�T��C��g�to���i$�J�e�����tR�C_A����	��dB��0�����1�l�$m����6x���J�Y[��[+�� �'��hR��@E7���bK�;���C)"�0��{�o��9���<���/�����{6��P+��L��X�3
�h���F����nf'��Q��P"�Qw	���Xj�� 1��!�"UH�_������)��#��HI�K�YJ�K�;�J=��E�gO�
��P�(�N|�F���^KJ8���!"?�)��fp*����v�J�b���I���	�!sJ���X�@B��)�-�e
n�
Gh��X�r�J���v0���w��b�Ev�Q6�=���L	���G�Fkq���FMqm�����a2A�"���o�a��e(+:m�_"�#�>NP� �i���N�*eHh�P����y�������~��h����yk�e���H��`:�����4�FU����M'����Y1�j�������k�[��!i=8�����.u�
I:��m��.��S}u���A�����A
E�4�=����GKZ����6�0��]���d|���v���N�d��EA��n�t'�H�����*Q���;~�6M����pJW�[�v��~v����L�9���@�e0Kt�l[&
��C�Dr	��<B��vs��Yl��N���r��Z�o�w���4�W3�J���W�s��
�����7�m�==o�\6��?�]^�����@�qk�cao��s>;�����S	L��fC�M7���>)tf`���Q<�4��!D�	X�-�s������W�t�:EMT�
����������olip�
����u��P���X����������C���w�>bpsxsq��k���������������ys��<�������� ��N��x<MN���K��?R^�����dRX�zdd���~8y�M��_B����N������rH���5�e�8&:�h�O�SQX]Tm!����E�$@��/�mC
B��]��rh�E-�%�i������wT�6�/5�)�:�|,Y�qGGdY��<�`}#W�wVX��e��m�����D�D��7[���5U|��<�������sM��j�n���,.���#;N}6E�
�X	�S]���r��J���w���ot�����������<��c��,����������"DD���M:�I���l�2�)7C�hH������_�xZ����*�#�[�e�������Dt�t����8��^�7�yi���h}N[��HXS��>�!5(��fo<�7��
#�u$,����CRJ��!����w�.�.;�$]�[*�R�XT�0�?3��5[��Z�yqA��l30��N}��=WD��m�K�]�B�[���Bpi����l���r}��i�����O���r�
7[f��4���p�~��-�jU���[�t�o��7uK-��5�k5�H��1���<V�zVx�0S����uh%���(�gL:��c��ukjW%8�D�\<���mP=���x0�p*�F�p�8�#y$v���@0�5a��32
x�$cR7F_iv_n=Fo���;;�?W0�������]�:V�$����(���!J���U�����*��:�t9�;#4q+r��pL�}�w��s�N��"v���u�TDN���"��!�q��N�dh�:���;�ysbe��D���L�i�:�Tdl�NPx��U�'IE�s�|���Q)��F�!I;uA"�P�z��y����l"�<�����*"�D���X�����[��T�t�D���B���|I�������E]�)���@!��q�	�����M���BG�U���R�N������#CCux���t�xH����Hg:r������i��kFk�~�s��q�\���Q�(F%|H
��9P���F��ZY�M��h�Q�����n����b:q>9����S��D2l��#E���z[6�N��+�OP?#K�Y��Dgc��6�����#������Gq60��5���M��Y�)�:r������|�x�N�0���c��#@��"rk�6�F#�;(N�_l
�?��E�{�@8:9haLEDN?8X�d' |O�S�jpNp�%��i���(N��~��UH���%2��-����l��Vc��:X�AT�9�
�o�2VB���B	�}�eN<��_�	yf;o�@����O�<e��g��/	�������)�+���M��|��)>�D-��E��'X>J:i/e�bG�9h����YHg%�Z����K��^�~o^��6��%A�ag:����$�g^z�:�b�]P���!z~{T����9���
l��$9�5vd`�Z��#:2`��P�������V�G��,�w�`����-_���`�������p��Y��X}�����"��{hLiV\����bI����T@��<����������
_}N�Q�
��=���Uf�&a�2o�����r����86�����k�FY���i���/�	�tK�]�Z�L�v�tK����T������+�Dr;�����v(.�3����\�9q������Az�s����:_�5;��*�.m��Y������c���6e{��1)�%;J;�����a������X�^
�)�HF!���m�#�!_%�����O v�
����v��I����iw����u�*��u�BDP�����1��j3�N�/qg:x8mV���$>��U��8���C����Y��cpe ��j��nf�.��������A<�q"g������#kJY�H��4�q�3����:)����>���1ZG����3�����o�R��e����������;_�,������Kw��6�y�=�[	T��&q�0���7��E������)����p>
�I$�w�d�*%����e���B�?�?����j�b�$�K���U�l:��#�;��fr����'�[B����U�U�%lx	�!W����G�Mz�����&X�'���W:�w�"������A;�S��v.lr���H0�y�{}������F\F����D3lgi8�C������I�=�W��������}[�������@�>1� [��/+���}�d1X���<}\���)UCv�TfK�p����������N��F[X�%�}�a<DQ�C���;#I����U?k�}�G��y6���U&������O�P:J��>��/?��J}��D$�{xd�!��u���"�5[�6L��\���s�X^8q��S���w\�NbB��wFSv�p,��!#�s�Ra���}���'�e}{[�h�C`
������$;�����l:�$���F�X:H�1�������s��Y	��2;Tjc��}�����q�����A}���o{�>�)Mr�0.oH20�C��v�%���l��N�q���E~�����G��UE+���+W{B�tx�/H��N��SK�.
�M���;��w�"q�2Wu�M�M&2t��N}����n�ctT�!�4�=�g�>�x���1/0^�����!g|�>���$��������b��'���Dj��4�*�=��S���U��E69�jD�w��5@������Z)�G�Mh|Ob��P�/Y#�|��I� �s�k����MF_�}��!�p+/�b���1��h��$�?�����tb��
Pv�kBjm����HG>�$��!���iC��)/��d[�3-^5�Z��qP��%D�N�`��5=��r��j���LZ��)�@S��[V
�'sM�-u��'�����-����g��
�W��W������Pa[]����!�6-�����N:GUE:qdq��
7P���e���[��YU5����)�����2+��"��B���Z<�}����+�����t���I{����F����v���|����Z�2����9a>�?l�S���$6w$l/��$fv
j�����(���I�A���xB���*N%!�BKP���������\��a,t������ �8�5���E;�1[�V���>Q�Df���m�������O���v#�������������`Gv.�'�8I��N�����w��T��a(�B$�C�`l�i�kpH~?��:j��<R�N���$0bjc���kIe����A}NF�B�G�P�����{uR u��56s�m�mj�5�4UJ,@R����6�&Y�"PBE���l�<k4�>�Iv�+���(��P)$O�c��u��'W3��Ra����)�q�6r��L���dU��Q
�%�e��}�Y�n�p�Jb)3b�G5��N\��\6�;��{`_���~*�=V��Dk��J|H��6^��^S����D�����S����Jy�y�<�=n4���N���Y+iUP��VR��R���*���P���r�*���4��{��X������������o?����d��W�4���5� ?N����r�u"��\�:A,QIE0�e���Y�G�b'���
O�Q��!��+��������r�X�t��t0D�����{Q���N8DY������	j��"�h��������[P�6[L�	=��6����I/��zd�F%O�h9i���R�/��sO�������5�R��a��4�HY�uX�6�@����u�(������N���<�������WI����qBT��Z�����y�whK�h������ z�^"���?k�#e�'����	�I�,�)�j�*�#��Y��.�F�gH!����OM��?���������.��u�|��bg��x��x��\����\U�9(G�>��.Q����[4����A��[rg�v�I�TZ2�;��`���7;SBG'�~�H��v����$]NA���]���$��_��
~�7�+�^�~x��eR�)��Jr��G���'���I��$���wi=���]�^��jR�v���u��;PP`��A����{��&Z�`g`������#�fp��R���s
[�px����>-�&7��x�(g�]���TOIM�R��"����#�����;O���v&�F��9*5Nx��#��H�{�P���"V����������c������]�����`k7���!W��t��A����w(�d#'=��L������s���j.�$^�z�>Z��`$h��;\B�e�<��H�QA(�����M��_NP<����*�B/��"I��I��x��������A<���,��7���������?�F�+X��c�����?��h�
z��7�tD7����Y�����A�vx�������	��h��M�8�����eM5������G���[!G�y�k/\���G�Y��N;����r�\bU�����5�����~N������/��j5��>��s��������������RZ����I��"���5���2�D�so��Kd��,�8������k))�����?	�H��
N��A,����Q�x����jW����K4����q���g�4��@������K_��A!��w	������W�m�W���jY#;N����(��C��.T���������iZb�Y����KkK�X�@R�=�������ux������6�*Jg�DY�����vw��=���qEt0{�#���� �	ZT���������w,q��������j��Q`�?�-��������y|�KYk�|J-�����z���~�C	4�������'�GD�Q�6�n�ia���i�����!�S���g�g�}�5Yl�D�d<D����P{P�@�SA��%���9��$�T���������*e!CS�%�I�/��a���}��}��G[�����������>$>P�D�
�9�Qp=�;���:i-#���;�� S�o1n�����L��TZrN�m�;#CB�,����3���2�]�QdT����������H���z|xy�<P;��J>5��O�]����u���p.����K`s=JI�:R6U���=���q����������������X�*�#���Z�]�3���C��b�>r/cV
�K��*��h=����`h>
������u��!����G6�.Cru�:�ptyh�v���cq����Eh�?k��>Kv���2?KX����N��$�MA��%��?����z�o���8��*Z�#����2�S.1si��9��/
u�cH:��*`Vf]�{�S����BG�/���:�d��|����f��.t� ���9�^���l���l���D�;��+�<�d��.t��s)7N���|h��x��;��������[�p��m�v�UR`�SL����1I�v�{OAj��DDV&���{D)`3�Q~��\5�/{��A$������'&�������8���z�����yQ!���{��\dW�>!qYpZ��<���n��"����j��7m%(
�J
G�1���cQ�]�3�R&'���+�$Qa��
�d+��&\���U\�3J����`qA�L�������w
w]��%;�D,KU�!���X�������nm��YD8`���,�����>�8��q��AP&��<G/���q6(nP���{��T0k���@��i�T��&t��C�'`NQE�lP�:m��#�m��'l������	�5�c���4��6'x�[�.qO�������O��G�3�+� 9�]��:�Y���B�8&��1�/���j��v�����L��!����`'T���}t�8c��c�SY_�ST�
�
nC�;�pO�Ep2T�o�s.5���Uc@�_���m�L��?��n�-�[i��2a���=�z���;Pa�H������q�%�g�|L[��x�������,�+�^{��~����GW�I���:����K��F�;���.��r~���xe9"ka�m���b��q�y�:�8����V/y���n4���/���u��<�4Ys���&�� ����vD�~�1J��W��l����3>u�O/1U�����v>�+N|?WHl�>g��mzE�LV$J=o�V�C��������
��VNl��bk���Z�������*"k���~����F#~�y���@e%f�*��9<}�|��K�P�Z(v�����{'�Z�'oO����[�R�^�9j.-m�>	� ��r�<��z�n-�J�4?�@z���,[�����,m3|��<���`_��/������m9L�� �Ju�o�4�D=���~�������7����^X$���~���Z���
�h^/i~���:��0o�L���h=
����C��(
�i��c�����Z���e�A[pr����(jQ\%���?��N����-P1D��K���dz�a��r����M	�}��WW�������]���,A,�Mt����1cFu�~J�?����\V�Ob�H�{��_F�(����f�+n��.U��0_]�,6��S�hyD��
���������j!���}��p��Gx����~�<��O�{�F����;I'k]ZA)q2E���1y1=��Cp�����3�J�3��6*����;������.��?���e���F���!�f7����C�u3������<t��
�QQ�m�p�	v�a�{��q���C}M��7hC�a������5*\;E������w9n��������������N����V�lwg��Y��u�m�|�l���s$�"V��hX�������#�����������5��r"0���u��p �@��j��i4�� 3w�^�O����F�&�.s�fW�%H_T[-
N�1�����4R�%+A�v�������t���Y.V�}{=t]�J�T�|��,{�-�b�����~�~��k���D���/ �������0��|x�w;�!�D�a��i���XFU�Q�������^6;���[[/�:������K��f�w���q����<o�w�I��v4uN~t����e��M�'S�{����$\s��j��	d���y�@Y�c�6�0i��������3����5���]� /X���#���v��>����/��l��j�.:�M�#����U�,��;�{�A����'���}q�%^�I�	�����s\�*y�T6���RQH2n��4��C|@��+�#�m�
�q1���#��+�%���je�`3�c��m����A<��5�������1����5�����`�o?����x2���W���K��lE4��U%}��:}L��3�9"��4?��#=,�F����wa9������'O�1z�"�{;������
���P�-�P�?/�	���0J�u�+��]���d�?�������)��Hi���S��R�n�
�|�� y��bd�J�EV��{��=m?y��m/z[�_>	���P�*vRNb���e�&��i.Kk�A=��Ri���#��`�@�K�)�l��%�L�i���0�D�K��U6��a��V����w��h x������W�;�p����{��gk��~-�d�������O����������D���*	�v���*��<EeQ�_9�B��O������/�t���/��i�8���|o����C>�Mx�����,���0����y������I��~mj��L-�)0����:������0ds�nW�����d�������hFMO��L��J�a��������A���
�Xu����&a�M�����c��p���/��3���������
� c�pT���������q��U/S3��S�g�	�~������������u�����t{�Bg��`���[?�����}��3������A].���U��%{��_�%"Rf�;�R���
����G(b�=�N�u����=}��@�9��
��c��(�#A�9��=������[���%�}��i��]>�'!;���[g�
���x
��	81����'�k�%Do)uN���PM$��KG�SK$�YbUkj!^KJ�v���`��3��S����5r�{]#�zn$L��\#��x�e��pI���"���I7�A7�q�c*�k�]D���)���LL�*�)����8e]%bj�Wa��{��k����2
&�[rd<�\�Z��2��dc�rL������!$Z?3x3�G]�{�J��������KL[�84� �EI<����Lb�U~�����&Mf#�{�4��q;��KR����Gr�B������`�M�VB4,�A��f��kI�J��[���bt�������V�&�����T��w�r4EU����0�k�K��^�
�bP�6��\�9g
>��d1��-%����n����@�}"\Iy���>[	��*'�,�9M�uc2!��Hm`p�������'4������~o7��4���	��;?���G��w��#*B
�m�?P�=6KM*X��u���V,\Q�V|�[�

k-����P�J]b�Aj�Q�+�x�]����z����M3#�z�
IB�`����B�`J��hC�B��8���_�k}��)�y���Rtr��u���p��peq����0����GlF{�VE����DZ,�F�"�{��s��']%�S�������^o(�*�u
9�*y���vj����K����%�t+�wLQYq�����,8in����v��]�q?O����w�s{Z��Y����%��l�H
�%�=�y�"�v�#��f,9���)����4��F����~8���h����\pJ�6�<����`b�_Z
�`X���b���J�\�������|!�_���d��%���
a��:m��I�$0x��C���A	&u�"f�E��`L�=a���C<��&HJ�V�����Z:S�}����m�����t�/�M��R:Dr��������;����n�Y5������`w��H����uZ�� z^�����g�v��������E]�?�!��o1@^e���2:cQ���^�I�������������OlzZxa�~
�D��e���KO� �4�����<(e!M���P�Y����C��I�ET�zY
�(
����n�	�v3IK�1*��m���+Y���'�]����jT�tk�� <]��w�8QhDU�]��	�,��)]W�&ky��a���:F)�`��I��ZApC��A���9��Jo�<����-1��sCW�77r���c�M�@���s5^~:����9�Pq�<����]�g�����.�7����-�mEk���6.G&�K�\aM������NoKH�-�����IRi�D�?T�� �{*��y��f��&B��E��Z������Y�vzO��,f����
�/�	{!hnnIc6����]���2��H�El|� �w8���<=?h�����-�T�k��" Ga!l�	|a?�6�� ���AL)��/���_s���_�-[�r~SZ�B����?[��_>�g��oB*�P�K���:���������wM	z����pv�w�,�<j^6A���5Z
�M�~��$�#��pB��_Y����]E����s��@	����:����j�J�.0���5���f���\��8�5�������h���q��M�z�.��B���\n8�����=��6���	����~��C1{
L�$e
���D�����z�2�p�V!���LO
���\����d��4�I7!�|�)����5k,���I����ur4!�X��W��5���������O�����{���&�U{�,w�-�����
�5��p��?���e�u`�Dk0���$�p��`.LM�T�6s�b.�h~
N��O�H�������2UO Ch��D��!qKRS+�z-@�
/�(�����kf/�Y��9��9R�K�3��:�c���j��)�H�nNL������!a���I��IF$�m=���)^������n�
~
������k��t!�rg0bP����}b������������]�(���7�upjD0@�8p�0uq;�
b��B�n:O�s�&�Y��]~b����T���<�e��~���90pD30}�	h�m�'KUq`c����6��I�$h�-
�k�yKJ�u��J�X�G��#A�5���m���?E�z�Yw�&��j*��b�5�`	9a�L>\�KKC�w��&��,{�	�")J���
]C5����2�a�v��G��7��Y��T��;d#�!���E�x�!���.�O��'t�����D�Am�p�5u���Q�m��U�E�)�g�����r����J�����8V_O�I��������Z��>���&����!`�J'aF(�]6���t�t��M�J"�~������� :�}���i1�(������0n\���PP
��z�'�A���'�Av/�pq���U����m���0����g{,���kV������Zi��zNteM��������~�x�-�(�o�%�������U�{���o��z���F$�T�S�����9/��.�+��i)����i,F:�wb�i��� *Z�HC�G������Z�%��/��������3�	q�!������x4�j�
}�6����>t��u�83p
�{4�����
�X��ry-��_�\�R���H��w�t��e�>1J����)�G�{b�lW�����S��!��?��t�b���&x��W��6��������u�nF����������nb�A��\'��"^P�C��h*L+�zbL}�tR��,f�r�Z�(�������(��D�^(�����!����<�i$�ZM4�c
��uT�.���f�I�&�Be���o�8��J+ �=�*��������[�fM1���]}���L'0��/���B�z;�$k��m���z�C��a�rvS���`�0�����4W��}>Q���NA1�c��v��5m,�H�����\R��������~_���-�1�l�	��
A����s��`J���^r<y���/��f�l#�;%��Hq����H�v�������N�y���6�S��j^|��i�����z��������;+��/�z�W��1~�H��_���LR���n����;�&��X�L��^��pU{$}�_������Q�P�U�
0���P]�u$:����g����f�BV��n���ii,��[���,��2������Y�[Z����W��&e�#�����.*��1���7
B2}
����
Uk��4���a�)���k	�,�,���Z`�_j����NP��P�A���W_Z��8�:�U�8;S��p�9.��t+-�V�����g[/�>_,��n��:}��G8����j!"��)��wl������{����{'{go>�&|��"nyUf�J[|�C���F;�+�����e;i� aEw�%�A�E�5�n���G%CL9��y���}4���jE1�E>�g�C�P�o�3�pw�fKf��p��W���n(�u�'��$����!x8F�#g(����/fy***Lo���X*i��.���~%��I|������^�?�&�m6��%O�u�`��1n��,h��D��vx6���S��O�v��)�[oD�8�y�]t��%L3��H��:�w��*
r�������yr�H����4����im�_���FEp��S�E���=2���9
���<�g2�]��g7��o��z��!�I�9���/�u@2\�~��C�PX�(GTV��9L��d��t2U�~jf]�U1N�t���*f<�M��WK�W�Q���E]�/��v�Y����IY����.E�7Be �v���b��h��_�f�L��@��)�[���ohnD6�L�p��b���[r�Zd^X��W�<e��%-SL�1���,NlX�z�_�2��
>��d�U��	�>���U�K;a�������4+\�x\~����w��\ey�j���)�'�+(��h�����e��r���TH�����W�>��9�2��%;�)��kS��R��
���W�JS}4aZ��u���N��s�����mE{�T@9l���WaO���
d�Z��*�eN��N�8�\MY���Z��/�9�<�U��z�JE
���P��b�4�N����,vv� �)>VT�t�Z��|��m� ~�|���{�HE����M�e&R�Uh���w�B(��$#v����Q�������|l�{S��XQ�|w�c���~���/L+��8��,����xsNj����E���o����Qr��X�d����5��X��������w.Y�S�:�l�c���=H�\�����v�M>&�<L�*�D��>�1^;�j����� ��/��[����M���d�%�v�"�/��a�������F$�7|������Y��_�!M	�x6E48��+��
���"	\���5��R{����������I���	��ubbrc��_���<��N'L����d��L@:�����:x4���y(�tS�������~���gN��Z�H<���2D'��l(7Mr�D+�tRw�dB��J����rMFx8�3<�����1����&4��N�(�h�dL8 5�dB��F2�b���3h�����I*~z3�a�Q����<DEo�6������`Bw��C1(��'��
�v�DX��V��W%��~�,���.�"P�h�B��Z�7�i05i��A���
�����[��z���Sm���q���P,o�p��*�9s��:�5+�DAi�L�y��<�n�=�;{B��,�e�����b���+H�>��9�p���hn$���vMTv��1PZ����<Ii$�;��i��������/(��3J2����}|��q�bH��2��1e�'���9�$r�rR\~�?������u�O�t3�0H��4�#��q��������B�4F�����f�w��45��h,Xk*���k�9��)����h�O-<��O�u�.��_U9��^L7���e�t�^��x�Z�d����:�� �9AMM��]��W�W������U��9�+��L��GL��sZ������q��1��	}U�"�
��z!����m6�_���G���n}�;�h�`�����o{���XXA�P�s[x���m%��O��v��v�"@����,�aw:������[������z�}���Ra�LI����v�u��(�C
0���8ix�&�q�v���|�"��Q)=��U�E��	y��"�s+z�������W
�7�}s�GL����+�B�J�D�V����wb���������u��/�.B����������K����P�]_���K;W����)��>14h����*q�+a�t���F����y8L@g��Ig:I����"��%|�q���epU<��[Y�v��}������g;��b1���k�� ��-\��[�Br����K�Y
0�����^�/�KT����xo���8����
�PK���+-�f������t��|���`*����w�6Oo��$�����[/:������VF��uy����N�q�����������u�(d'�P}��s��d�BVY��u'N����n����5JK�'^�wDlc�fl����2�b=�+��7�)���sS��Zi
��)� �����xA,��������o��)Hw�wycKr�C�h��MJ2��ye����D���-D�r�tP1��gi%q��$wR�j#bm���NS��m��!Lf>lD{���[6�|���r�S���t��4�+��?3S��Z)�s*��#;��#���1L����r:b��j$�s!�v�M�b��_#��k�[��h"fA�w�A�
������!���������6����A�/�J���:�PD�s�K?'4-	y���*�UX���oKg�1�3���C��_L��\{�cf�	N4�9ZJ�������q��&��bPf��4��F�U���(�.z���re��	��s��>�o����$6�
�f�0F>Aq���5��m�"�
���~P,\x��������������N����'�n:*���s�lt�

0017-wal-decoding-Introduce-pg_receivellog-the-pg_receive.patchtext/x-patch; charset=us-asciiDownload

>From ae9927102528953815568e3381978390af3521bd Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 11 Nov 2012 13:07:51 +0100
Subject: [PATCH 17/19] wal decoding: Introduce pg_receivellog, the
 pg_receivexlog equivalent for logical changes

---
 src/bin/pg_basebackup/Makefile         |   7 +-
 src/bin/pg_basebackup/pg_receivellog.c | 822 +++++++++++++++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.c     |   3 +-
 src/bin/pg_basebackup/streamutil.h     |   1 +
 4 files changed, 830 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_basebackup/pg_receivellog.c

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a707c93..301a93b 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -20,7 +20,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 
 OBJS=receivelog.o streamutil.o $(WIN32RES)
 
-all: pg_basebackup pg_receivexlog
+all: pg_basebackup pg_receivexlog pg_receivellog
 
 pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -28,6 +28,9 @@ pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
 pg_receivexlog: pg_receivexlog.o $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) pg_receivexlog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
+pg_receivellog: pg_receivellog.o $(OBJS) | submake-libpq submake-libpgport
+	$(CC) $(CFLAGS) pg_receivellog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
 	$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
@@ -40,4 +43,4 @@ uninstall:
 	rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
 
 clean distclean maintainer-clean:
-	rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o
+	rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
new file mode 100644
index 0000000..65c7ca1
--- /dev/null
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -0,0 +1,822 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_receivellog.c - receive streaming logical log data and write it
+ *					  to a local file.
+ *
+ * Author: Magnus Hagander <magnus@hagander.net>
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/pg_receivellog.c
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's so much
+ * backend-only stuff in the XLOG include files we need.  But we need a
+ * frontend-ish environment otherwise.	Hence this ugly hack.
+ */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include "port/palloc.h"
+#include "libpq-fe.h"
+#include "libpq/pqsignal.h"
+#include "access/xlog_internal.h"
+#include "utils/datetime.h"
+#include "utils/timestamp.h"
+
+#include "receivelog.h"
+#include "streamutil.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "getopt_long.h"
+
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
+/* Global options */
+static char	   *outfile = NULL;
+static int	        outfd = -1;
+static int			verbose = 0;
+static int			noloop = 0;
+static int			standby_message_timeout = 10 * 1000;		/* 10 sec = default */
+static volatile bool time_to_abort = false;
+static const char *plugin = "test_decoding";
+static const char *slot = NULL;
+static const char *free_slot = NULL;
+static XLogRecPtr	startpos;
+
+
+static void usage(void);
+static void StreamLog();
+
+static void
+usage(void)
+{
+	printf(_("%s receives PostgreSQL streaming transaction logs.\n\n"),
+		   progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -f, --file=FILE        receive log into this file. - for stdout\n"));
+	printf(_("  -n, --no-loop          do not loop on connection lost\n"));
+	printf(_("  -v, --verbose          output verbose messages\n"));
+	printf(_("  -V, --version          output version information, then exit\n"));
+	printf(_("  -?, --help             show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -d, --database=DBNAME  database to connect to\n"));
+	printf(_("  -h, --host=HOSTNAME    database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT        database server port number\n"));
+	printf(_("  -U, --username=NAME    connect as specified database user\n"));
+	printf(_("  -w, --no-password      never prompt for password\n"));
+	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
+	printf(_("\nReplication options:\n"));
+	printf(_("  -P, --plugin=PLUGIN    use output plugin PLUGIN (defaults to test_decoding)\n"));
+	printf(_("  -s, --status-interval=INTERVAL\n"
+			 "                         time between status packets sent to server (in seconds)\n"));
+	printf(_("  -S, --slot=SLOT        use existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("  -F, --free-slot=SLOT   free existing replication slot then exit\n"));
+
+	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+
+/*
+ * Local version of GetCurrentTimestamp(), since we are not linked with
+ * backend code. The protocol always uses integer timestamps, regardless of
+ * server setting.
+ */
+static int64
+localGetCurrentTimestamp(void)
+{
+	int64 result;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	result = (int64) tp.tv_sec -
+		((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
+
+	result = (result * USECS_PER_SEC) + tp.tv_usec;
+
+	return result;
+}
+
+/*
+ * Local version of TimestampDifference(), since we are not linked with
+ * backend code.
+ */
+static void
+localTimestampDifference(int64 start_time, int64 stop_time,
+						 long *secs, int *microsecs)
+{
+	int64 diff = stop_time - start_time;
+
+	if (diff <= 0)
+	{
+		*secs = 0;
+		*microsecs = 0;
+	}
+	else
+	{
+		*secs = (long) (diff / USECS_PER_SEC);
+		*microsecs = (int) (diff % USECS_PER_SEC);
+	}
+}
+
+/*
+ * Local version of TimestampDifferenceExceeds(), since we are not
+ * linked with backend code.
+ */
+static bool
+localTimestampDifferenceExceeds(int64 start_time,
+								int64 stop_time,
+								int msec)
+{
+	int64 diff = stop_time - start_time;
+
+	return (diff >= msec * INT64CONST(1000));
+}
+
+/*
+ * Converts an int64 to network byte order.
+ */
+static void
+sendint64(int64 i, char *buf)
+{
+	uint32		n32;
+
+	/* High order half first, since we're doing MSB-first */
+	n32 = (uint32) (i >> 32);
+	n32 = htonl(n32);
+	memcpy(&buf[0], &n32, 4);
+
+	/* Now the low order half */
+	n32 = (uint32) i;
+	n32 = htonl(n32);
+	memcpy(&buf[4], &n32, 4);
+}
+
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+static int64
+recvint64(char *buf)
+{
+	int64		result;
+	uint32		h32;
+	uint32		l32;
+
+	memcpy(&h32, buf, 4);
+	memcpy(&l32, buf + 4, 4);
+	h32 = ntohl(h32);
+	l32 = ntohl(l32);
+
+	result = h32;
+	result <<= 32;
+	result |= l32;
+
+	return result;
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ */
+static bool
+sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool replyRequested)
+{
+	char		replybuf[1 + 8 + 8 + 8 + 8 + 1];
+	int			len = 0;
+
+	if (blockpos == startpos)
+		return true;
+
+	if (verbose)
+		fprintf(stderr,
+				_("%s: confirming flush up to %X/%X (slot %s)\n"),
+				progname, (uint32) (blockpos >> 32), (uint32) blockpos,
+				slot);
+
+	replybuf[len] = 'r';
+	len += 1;
+	sendint64(blockpos, &replybuf[len]);			/* write */
+	len += 8;
+	sendint64(blockpos, &replybuf[len]);	/* flush */
+	len += 8;
+	sendint64(InvalidXLogRecPtr, &replybuf[len]);	/* apply */
+	len += 8;
+	sendint64(now, &replybuf[len]);					/* sendTime */
+	len += 8;
+	replybuf[len] = replyRequested ? 1 : 0;			/* replyRequested */
+	len += 1;
+
+	startpos = blockpos;
+
+	if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
+	{
+		fprintf(stderr, _("%s: could not send feedback packet: %s"),
+				progname, PQerrorMessage(conn));
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Start the log streaming
+ */
+static void
+StreamLog(void)
+{
+	PGresult   *res;
+	char		query[256];
+	uint32		hi,
+				lo;
+	char	   *copybuf = NULL;
+	int64		last_status = -1;
+	XLogRecPtr	logoff = InvalidXLogRecPtr;
+
+	/*
+	 * Connect in replication mode to the server
+	 */
+	conn = GetConnection();
+	if (!conn)
+		/* Error message already written in GetConnection() */
+		return;
+
+	/*
+	 * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
+	 * position.
+	 */
+	res = PQexec(conn, "IDENTIFY_SYSTEM");
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+				progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
+		disconnect_and_exit(1);
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) != 4)
+	{
+		fprintf(stderr,
+				_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
+				progname, PQntuples(res), PQnfields(res), 1, 4);
+		disconnect_and_exit(1);
+	}
+	PQclear(res);
+
+	/*
+	 * init a replication slot
+	 */
+	if (verbose)
+		fprintf(stderr,
+				_("%s: init replication slot\n"),
+				progname);
+
+	if (slot == NULL)
+	{
+		snprintf(query, sizeof(query), "INIT_LOGICAL_REPLICATION '%s'",
+				 plugin);
+
+		res = PQexec(conn, query);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+				progname, query, PQerrorMessage(conn));
+			goto error;
+		}
+
+		if (PQntuples(res) != 1 || PQnfields(res) != 4)
+		{
+			fprintf(stderr,
+					_("%s: could not init logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+					progname, PQntuples(res), PQnfields(res), 1, 4);
+			goto error;
+		}
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			fprintf(stderr,
+					_("%s: could not parse log location \"%s\"\n"),
+					progname, PQgetvalue(res, 0, 1));
+			goto error;
+		}
+		startpos = ((uint64) hi) << 32 | lo;
+
+		slot = strdup(PQgetvalue(res, 0, 0));
+		PQclear(res);
+	}
+
+	/*
+	 * Start the replication
+	 */
+	if (verbose)
+		fprintf(stderr,
+				_("%s: starting log streaming at %X/%X (slot %s)\n"),
+				progname, (uint32) (startpos >> 32), (uint32) startpos,
+				slot);
+
+	/* Initiate the replication stream at specified location */
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	res = PQexec(conn, query);
+	if (PQresultStatus(res) != PGRES_COPY_BOTH)
+	{
+		fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
+				progname, query, PQresultErrorMessage(res));
+		PQclear(res);
+		goto error;
+	}
+	PQclear(res);
+
+	if (verbose)
+		fprintf(stderr,
+				_("%s: initiated streaming\n"),
+				progname);
+
+	while (!time_to_abort)
+	{
+		int			r;
+		int			bytes_left;
+		int			bytes_written;
+		int64		now;
+		int         hdr_len;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		/*
+		 * Potentially send a status message to the master
+		 */
+		now = localGetCurrentTimestamp();
+		if (standby_message_timeout > 0 &&
+			localTimestampDifferenceExceeds(last_status, now,
+											standby_message_timeout))
+		{
+			/* Time to send feedback! */
+			if (!sendFeedback(conn, logoff, now, false))
+				goto error;
+
+			last_status = now;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 1);
+		if (r == 0)
+		{
+			/*
+			 * In async mode, and no data available. We block on reading but
+			 * not more than the specified timeout, so that we can send a
+			 * response back to the client.
+			 */
+			fd_set		input_mask;
+			struct timeval timeout;
+			struct timeval *timeoutptr;
+
+			FD_ZERO(&input_mask);
+			FD_SET(PQsocket(conn), &input_mask);
+			if (standby_message_timeout)
+			{
+				int64       targettime;
+				long		secs;
+				int			usecs;
+
+				targettime = last_status + (standby_message_timeout - 1) *
+					((int64) 1000);
+				localTimestampDifference(now,
+										 targettime,
+										 &secs,
+										 &usecs);
+				if (secs <= 0)
+					timeout.tv_sec = 1; /* Always sleep at least 1 sec */
+				else
+					timeout.tv_sec = secs;
+				timeout.tv_usec = usecs;
+				timeoutptr = &timeout;
+			}
+			else
+				timeoutptr = NULL;
+
+			r = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
+			if (r == 0 || (r < 0 && errno == EINTR))
+			{
+				/*
+				 * Got a timeout or signal. Continue the loop and either
+				 * deliver a status packet to the server or just go back into
+				 * blocking.
+				 */
+				continue;
+			}
+			else if (r < 0)
+			{
+				fprintf(stderr, _("%s: select() failed: %s\n"),
+						progname, strerror(errno));
+				goto error;
+			}
+			/* Else there is actually data on the socket */
+			if (PQconsumeInput(conn) == 0)
+			{
+				fprintf(stderr,
+						_("%s: could not receive data from WAL stream: %s"),
+						progname, PQerrorMessage(conn));
+				goto error;
+			}
+			continue;
+		}
+		if (r == -1)
+			/* End of copy stream */
+			break;
+		if (r == -2)
+		{
+			fprintf(stderr, _("%s: could not read COPY data: %s"),
+					progname, PQerrorMessage(conn));
+			goto error;
+		}
+
+		/* Check the message type. */
+		if (copybuf[0] == 'k')
+		{
+			int		pos;
+			bool	replyRequested;
+
+			/*
+			 * Parse the keepalive message, enclosed in the CopyData message.
+			 * We just check if the server requested a reply, and ignore the
+			 * rest.
+			 */
+			pos = 1;	/* skip msgtype 'k' */
+			pos += 8;	/* skip walEnd */
+			pos += 8;	/* skip sendTime */
+
+			if (r < pos + 1)
+			{
+				fprintf(stderr, _("%s: streaming header too small: %d\n"),
+						progname, r);
+				goto error;
+			}
+			replyRequested = copybuf[pos];
+
+			/* If the server requested an immediate reply, send one. */
+			if (replyRequested)
+			{
+				now = localGetCurrentTimestamp();
+				if (!sendFeedback(conn, logoff, now, false))
+					goto error;
+				last_status = now;
+			}
+			continue;
+		}
+		else if (copybuf[0] != 'w')
+		{
+			fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
+					progname, copybuf[0]);
+			goto error;
+		}
+
+
+		/*
+		 * Read the header of the XLogData message, enclosed in the CopyData
+		 * message. We only need the WAL location field (dataStart), the rest
+		 * of the header is ignored.
+		 */
+		hdr_len = 1;	/* msgtype 'w' */
+		hdr_len += 8;	/* dataStart */
+		hdr_len += 8;	/* walEnd */
+		hdr_len += 8;	/* sendTime */
+		if (r < hdr_len + 1)
+		{
+			fprintf(stderr, _("%s: streaming header too small: %d\n"),
+					progname, r);
+			goto error;
+		}
+
+		/* Extract WAL location for this block */
+		{
+			XLogRecPtr temp = recvint64(&copybuf[1]);
+			logoff = Max(temp, logoff);
+		}
+
+		if (outfd == -1 && strcmp(outfile, "-") == 0)
+		{
+			outfd = 1;
+		}
+		else if (outfd == -1)
+		{
+			outfd = open(outfile, O_CREAT|O_APPEND|O_WRONLY|PG_BINARY,
+						 S_IRUSR | S_IWUSR);
+			if (outfd == -1)
+			{
+				fprintf(stderr,
+						_("%s: could not open log file \"%s\": %s\n"),
+						progname, outfile, strerror(errno));
+				goto error;
+			}
+		}
+
+		bytes_left = r - hdr_len;
+		bytes_written = 0;
+
+
+		while (bytes_left)
+		{
+			int ret;
+
+			ret = write(outfd,
+						copybuf + hdr_len + bytes_written,
+						bytes_left);
+
+			if (ret < 0)
+			{
+				fprintf(stderr,
+						_("%s: could not write %u bytes to log file \"%s\": %s\n"),
+						progname, bytes_left, outfile,
+						strerror(errno));
+				goto error;
+			}
+
+			/* Write was successful, advance our position */
+			bytes_written += ret;
+			bytes_left -= ret;
+		}
+
+		if (write(outfd, "\n", 1) != 1)
+		{
+			fprintf(stderr,
+					_("%s: could not write %u bytes to log file \"%s\": %s\n"),
+					progname, 1, outfile,
+					strerror(errno));
+			goto error;
+		}
+	}
+
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		fprintf(stderr,
+				_("%s: unexpected termination of replication stream: %s"),
+				progname, PQresultErrorMessage(res));
+		goto error;
+	}
+	PQclear(res);
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+
+	if (outfd != -1 && close(outfd) != 0)
+		fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+				progname, outfile, strerror(errno));
+	outfd = -1;
+error:
+	PQfinish(conn);
+}
+
+/*
+ * When sigint is called, just tell the system to exit at the next possible
+ * moment.
+ */
+#ifndef WIN32
+
+static void
+sigint_handler(int signum)
+{
+	time_to_abort = true;
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+	static struct option long_options[] = {
+/* general options */
+		{"file", required_argument, NULL, 'f'},
+		{"no-loop", no_argument, NULL, 'n'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"version", no_argument, NULL, 'V'},
+		{"help", no_argument, NULL, '?'},
+/* connnection options */
+		{"database", required_argument, NULL, 'd'},
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+/* replication options */
+		{"free-slot", required_argument, NULL, 'F'},
+		{"plugin", required_argument, NULL, 'P'},
+		{"status-interval", required_argument, NULL, 's'},
+		{"slot", required_argument, NULL, 'S'},
+		{"startpos", required_argument, NULL, 'I'},
+		{NULL, 0, NULL, 0}
+	};
+	int			c;
+	int			option_index;
+	uint32		hi, lo;
+
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivellog"));
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		else if (strcmp(argv[1], "-V") == 0 ||
+				 strcmp(argv[1], "--version") == 0)
+		{
+			puts("pg_receivellog (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:F:",
+							long_options, &option_index)) != -1)
+	{
+		switch (c)
+		{
+/* general options */
+			case 'f':
+				outfile = pg_strdup(optarg);
+				break;
+			case 'n':
+				noloop = 1;
+				break;
+			case 'v':
+				verbose++;
+				break;
+/* connnection options */
+			case 'd':
+				dbname = pg_strdup(optarg);
+				break;
+			case 'h':
+				dbhost = pg_strdup(optarg);
+				break;
+			case 'p':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				dbport = pg_strdup(optarg);
+				break;
+			case 'U':
+				dbuser = pg_strdup(optarg);
+				break;
+			case 'w':
+				dbgetpassword = -1;
+				break;
+			case 'W':
+				dbgetpassword = 1;
+				break;
+/* replication options */
+			case 'F':
+				free_slot = pg_strdup(optarg);
+				break;
+			case 'P':
+				plugin = pg_strdup(optarg);
+				break;
+			case 's':
+				standby_message_timeout = atoi(optarg) * 1000;
+				if (standby_message_timeout < 0)
+				{
+					fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				break;
+			case 'S':
+				slot = pg_strdup(optarg);
+				break;
+			case 'I':
+				if (sscanf(optarg, "%X/%X", &hi, &lo) != 2)
+				{
+					fprintf(stderr,
+							_("%s: could not parse start position \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				startpos = ((uint64) hi) << 32 | lo;
+				break;
+			default:
+				/*
+				 * getopt_long already emitted a complaint
+				 */
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+						progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Any non-option arguments?
+	 */
+	if (optind < argc)
+	{
+		fprintf(stderr,
+				_("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/*
+	 * Required arguments
+	 */
+	if (free_slot == NULL && outfile == NULL)
+	{
+		fprintf(stderr, _("%s: no target file specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (free_slot == NULL && dbname == NULL)
+	{
+		fprintf(stderr, _("%s: no database specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if ((slot != NULL && startpos == InvalidXLogRecPtr) ||
+		(slot == NULL && startpos != InvalidXLogRecPtr))
+	{
+		fprintf(stderr, _("%s: --slot and --startpos need to be specified together\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+	}
+#ifndef WIN32
+	pqsignal(SIGINT, sigint_handler);
+#endif
+
+	if (free_slot != NULL)
+	{
+		PGresult   *res;
+		char		query[256];
+
+		conn = GetConnection();
+		if (!conn)
+			/* Error message already written in GetConnection() */
+			exit(1);
+
+		snprintf(query, sizeof(query), "FREE_LOGICAL_REPLICATION '%s'",
+				 free_slot);
+		res = PQexec(conn, query);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+				progname, query, PQerrorMessage(conn));
+			exit(1);
+		}
+
+		if (PQntuples(res) != 0 || PQnfields(res) != 0)
+		{
+			fprintf(stderr,
+					_("%s: could not stop logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+					progname, PQntuples(res), PQnfields(res), 0, 0);
+			exit(1);
+		}
+
+		PQclear(res);
+		PQfinish(conn);
+		exit(0);
+	}
+
+	while (true)
+	{
+		StreamLog();
+		if (time_to_abort)
+		{
+			/*
+			 * We've been Ctrl-C'ed. That's not an error, so exit without an
+			 * errorcode.
+			 */
+			exit(0);
+		}
+		else if (noloop)
+		{
+			fprintf(stderr, _("%s: disconnected.\n"), progname);
+			exit(1);
+		}
+		else
+		{
+			fprintf(stderr,
+					/* translator: check source for value for %d */
+					_("%s: disconnected. Waiting %d seconds to try again.\n"),
+					progname, RECONNECT_SLEEP_TIME);
+			pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+		}
+	}
+}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 184b459..d030c0d 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -22,6 +22,7 @@ const char *progname;
 char	   *dbhost = NULL;
 char	   *dbuser = NULL;
 char	   *dbport = NULL;
+char	   *dbname = NULL;
 int			dbgetpassword = 0;	/* 0=auto, -1=never, 1=always */
 static char *dbpassword = NULL;
 PGconn	   *conn = NULL;
@@ -54,7 +55,7 @@ GetConnection(void)
 	values = pg_malloc0((argcount + 1) * sizeof(*values));
 
 	keywords[0] = "dbname";
-	values[0] = "replication";
+	values[0] = dbname == NULL ? "replication" : dbname;
 	keywords[1] = "replication";
 	values[1] = "true";
 	keywords[2] = "fallback_application_name";
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 4f5ff91..c6948ad 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -4,6 +4,7 @@ extern const char *progname;
 extern char *dbhost;
 extern char *dbuser;
 extern char *dbport;
+extern char *dbname;
 extern int	dbgetpassword;
 
 /* Connection kept global so we can disconnect easily */
-- 
1.7.12.289.g0ce9864.dirty

0018-wal_decoding-Add-test_logical_replication-extension-.patchtext/x-patch; charset=us-asciiDownload

>From 7f1e6c2aaeb17f2ee33fc88de5e06a6057634751 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Fri, 11 Jan 2013 14:36:48 +0530
Subject: [PATCH 18/19] wal_decoding: Add test_logical_replication extension
 for easier testing of logical decoding

This extension provides three functions for manipulating replication slots:
* init_logical_replication - initiate a replication slot and wait for consistent state
* start_logical_replication - return all changes since the last call up to now, without blocking
* free_logical_replication - free the logical slot again

Those are pretty direct synonyms for the replication connection commands.

Due to questions about how to integrate logical replication tests this module
also contains the current tests of logical replication itself.

Author: Abhijit Menon-Sen
---
 contrib/Makefile                                   |   1 +
 contrib/test_logical_replication/Makefile          |  20 ++
 contrib/test_logical_replication/expected/ddl.out  | 250 +++++++++++++++
 contrib/test_logical_replication/sql/ddl.sql       | 154 +++++++++
 .../test_logical_replication--1.0.sql              |  14 +
 .../test_logical_replication.c                     | 352 +++++++++++++++++++++
 .../test_logical_replication.control               |   5 +
 7 files changed, 796 insertions(+)
 create mode 100644 contrib/test_logical_replication/Makefile
 create mode 100644 contrib/test_logical_replication/expected/ddl.out
 create mode 100644 contrib/test_logical_replication/sql/ddl.sql
 create mode 100644 contrib/test_logical_replication/test_logical_replication--1.0.sql
 create mode 100644 contrib/test_logical_replication/test_logical_replication.c
 create mode 100644 contrib/test_logical_replication/test_logical_replication.control

diff --git a/contrib/Makefile b/contrib/Makefile
index 432e915..1cc30fe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,6 +50,7 @@ SUBDIRS = \
 		tcn		\
 		test_parser	\
 		test_decoding	\
+		test_logical_replication \
 		tsearch2	\
 		unaccent	\
 		vacuumlo	\
diff --git a/contrib/test_logical_replication/Makefile b/contrib/test_logical_replication/Makefile
new file mode 100644
index 0000000..7ebbc44
--- /dev/null
+++ b/contrib/test_logical_replication/Makefile
@@ -0,0 +1,20 @@
+MODULE_big = test_logical_replication
+OBJS = test_logical_replication.o
+
+EXTENSION = test_logical_replication
+DATA = test_logical_replication--1.0.sql
+
+REGRESS = ddl
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_logical_replication
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+test_logical_replication.o: test_logical_replication.c
diff --git a/contrib/test_logical_replication/expected/ddl.out b/contrib/test_logical_replication/expected/ddl.out
new file mode 100644
index 0000000..226f8f8
--- /dev/null
+++ b/contrib/test_logical_replication/expected/ddl.out
@@ -0,0 +1,250 @@
+CREATE EXTENSION test_logical_replication;
+-- predictability
+SET synchronous_commit = on;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('test_decoding');
+ ?column? 
+----------
+ init
+(1 row)
+
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+ALTER TABLE replication_example ADD COLUMN bar int;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+-- collect all changes
+SELECT data FROM start_logical_replication('now');
+                                                                                                                                         data                                                                                                                                         
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example_id_seq": INSERT: sequence_name[name]:replication_example_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:1 text[varchar]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:7 somedata[int4]:3 text[varchar]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:8 somedata[int4]:3 text[varchar]:2
+ table "replication_example": INSERT: id[int4]:9 somedata[int4]:3 text[varchar]:3
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+ COMMIT
+(31 rows)
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('now');
+ count 
+-------
+    12
+(1 row)
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+-- show changes
+SELECT data FROM start_logical_replication('now');
+                                                                                                                                data                                                                                                                                
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:12 somedata[int4]:6 somenum[int4]:1
+ table "replication_example": INSERT: id[int4]:13 somedata[int4]:6 somenum[int4]:2 zaphod1[int4]:1
+ table "replication_example": INSERT: id[int4]:14 somedata[int4]:6 somenum[int4]:3 zaphod1[int4]:(null) zaphod2[int4]:1
+ table "replication_example": INSERT: id[int4]:15 somedata[int4]:6 somenum[int4]:4 zaphod1[int4]:2 zaphod2[int4]:(null)
+ COMMIT
+ BEGIN
+ table "tr_unique_id2_seq": INSERT: sequence_name[name]:tr_unique_id2_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
+ COMMIT
+ BEGIN
+ table "tr_unique": INSERT: id2[int4]:1 data[int4]:10
+ COMMIT
+ BEGIN
+ table "tr_unique": DELETE: id2[int4]:1
+ COMMIT
+ BEGIN
+ COMMIT
+(20 rows)
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('now');
+ count 
+-------
+     3
+(1 row)
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+/* display results */
+SELECT data FROM start_logical_replication('now');
+                             data                             
+--------------------------------------------------------------
+ BEGIN
+ table "tr_pkey": INSERT: id2[int4]:2 data[int4]:1 id[int4]:1
+ COMMIT
+ BEGIN
+ table "tr_pkey": DELETE: id[int4]:1
+ COMMIT
+(6 rows)
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('now')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+ count |                                                                                                                                 min                                                                                                                                  |                                                                                                                                 max                                                                                                                                  
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+     1 | COMMIT                                                                                                                                                                                                                                                               | COMMIT
+     1 | BEGIN                                                                                                                                                                                                                                                                | BEGIN
+     1 | table "tr_etoomuch_id_seq": INSERT: sequence_name[name]:tr_etoomuch_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f | table "tr_etoomuch_id_seq": INSERT: sequence_name[name]:tr_etoomuch_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
+  4999 | table "tr_etoomuch": DELETE: id[int4]:1                                                                                                                                                                                                                              | table "tr_etoomuch": DELETE: id[int4]:999
+  5234 | table "tr_etoomuch": UPDATE: id[int4]:10000 data[int4]:-10000                                                                                                                                                                                                        | table "tr_etoomuch": UPDATE: id[int4]:9999 data[int4]:-9999
+ 10234 | table "tr_etoomuch": INSERT: id[int4]:10000 data[int4]:10000                                                                                                                                                                                                         | table "tr_etoomuch": INSERT: id[int4]:9 data[int4]:9
+(6 rows)
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+SELECT data FROM start_logical_replication('now');
+                                                                                                                            data                                                                                                                            
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "tr_sub_id_seq": INSERT: sequence_name[name]:tr_sub_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
+ COMMIT
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:1 path[text]:1-top-#1
+ table "tr_sub": INSERT: id[int4]:2 path[text]:1-top-1-#1
+ table "tr_sub": INSERT: id[int4]:3 path[text]:1-top-1-#2
+ table "tr_sub": INSERT: id[int4]:4 path[text]:1-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:5 path[text]:1-top-2-1-#2
+ table "tr_sub": INSERT: id[int4]:6 path[text]:1-top-2-#1
+ COMMIT
+(11 rows)
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+SELECT data FROM start_logical_replication('now');
+                             data                             
+--------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:7 path[text]:2-top-1...--#1
+ table "tr_sub": INSERT: id[int4]:8 path[text]:2-top-1...--#2
+ table "tr_sub": INSERT: id[int4]:9 path[text]:2-top-1...--#3
+ table "tr_sub": INSERT: id[int4]:10 path[text]:2-top-#1
+ COMMIT
+(6 rows)
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('now');
+ data 
+------
+(0 rows)
+
+SELECT stop_logical_replication();
+ stop_logical_replication 
+--------------------------
+                        0
+(1 row)
+
diff --git a/contrib/test_logical_replication/sql/ddl.sql b/contrib/test_logical_replication/sql/ddl.sql
new file mode 100644
index 0000000..ce18b7e
--- /dev/null
+++ b/contrib/test_logical_replication/sql/ddl.sql
@@ -0,0 +1,154 @@
+CREATE EXTENSION test_logical_replication;
+-- predictability
+SET synchronous_commit = on;
+
+-- faster startup
+CHECKPOINT;
+
+SELECT 'init' FROM init_logical_replication('test_decoding');
+
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+-- collect all changes
+SELECT data FROM start_logical_replication('now');
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('now');
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+
+-- show changes
+SELECT data FROM start_logical_replication('now');
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('now');
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+
+/* display results */
+SELECT data FROM start_logical_replication('now');
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('now')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+
+SELECT data FROM start_logical_replication('now');
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+
+SELECT data FROM start_logical_replication('now');
+
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('now');
+SELECT stop_logical_replication();
diff --git a/contrib/test_logical_replication/test_logical_replication--1.0.sql b/contrib/test_logical_replication/test_logical_replication--1.0.sql
new file mode 100644
index 0000000..724ac20
--- /dev/null
+++ b/contrib/test_logical_replication/test_logical_replication--1.0.sql
@@ -0,0 +1,14 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_logical_replication" to load this file. \quit
+
+CREATE FUNCTION init_logical_replication (plugin text, OUT slot_name text, OUT xlog_position text)
+AS 'MODULE_PATHNAME', 'init_logical_replication'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION start_logical_replication (pos text, OUT location text, OUT xid bigint, OUT data text) RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'start_logical_replication'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION stop_logical_replication () RETURNS int
+AS 'MODULE_PATHNAME', 'stop_logical_replication'
+LANGUAGE C IMMUTABLE STRICT;
diff --git a/contrib/test_logical_replication/test_logical_replication.c b/contrib/test_logical_replication/test_logical_replication.c
new file mode 100644
index 0000000..35b69b1
--- /dev/null
+++ b/contrib/test_logical_replication/test_logical_replication.c
@@ -0,0 +1,352 @@
+#include "postgres.h"
+
+#include "access/timeline.h"
+#include "access/xlog_internal.h"
+#include "catalog/pg_type.h"
+#include "libpq/pqformat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/syscache.h"
+#include "miscadmin.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+Datum init_logical_replication(PG_FUNCTION_ARGS);
+Datum start_logical_replication(PG_FUNCTION_ARGS);
+Datum stop_logical_replication(PG_FUNCTION_ARGS);
+
+static const char *slot_name = NULL;
+static Tuplestorestate *tupstore = NULL;
+static TupleDesc tupdesc;
+
+extern void XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count);
+
+static int
+test_read_page(XLogReaderState* state, XLogRecPtr targetPagePtr, int reqLen,
+                 char* cur_page, TimeLineID *pageTLI)
+{
+    XLogRecPtr flushptr, loc;
+    int count;
+
+	loc = targetPagePtr + reqLen;
+	while (1) {
+		flushptr = GetFlushRecPtr();
+		if (loc <= flushptr)
+			break;
+		pg_usleep(1000L);
+	}
+
+    /* more than one block available */
+    if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+        count = XLOG_BLCKSZ;
+    /* not enough data there */
+    else if (targetPagePtr + reqLen > flushptr)
+        return -1;
+    /* part of the page available */
+    else
+        count = flushptr - targetPagePtr;
+
+    /* FIXME: more sensible/efficient implementation */
+    XLogRead(cur_page, ThisTimeLineID, targetPagePtr, XLOG_BLCKSZ);
+
+    return count;
+}
+
+static void store_tuple(XLogRecPtr ptr, TransactionId xid, StringInfo si)
+{
+	Datum values[3];
+	bool nulls[3];
+	char buf[60];
+
+	sprintf(buf, "%X/%X", (uint32)(ptr >> 32), (uint32)ptr);
+
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = CStringGetTextDatum(buf);
+	values[1] = Int64GetDatum(xid);
+	values[2] = CStringGetTextDatum(si->data);
+
+	tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+}
+
+static void
+begin_txn_wrapper(ReorderBuffer* cache, ReorderBufferTXN* txn)
+{
+	ReaderApplyState *state = cache->private_data;
+	bool send;
+
+	resetStringInfo(state->out);
+
+	send = state->begin_cb(state->user_private, state->out, txn);
+
+	if (send)
+		store_tuple(txn->lsn, txn->xid, state->out);
+}
+
+static void
+commit_txn_wrapper(ReorderBuffer* cache, ReorderBufferTXN* txn, XLogRecPtr commit_lsn)
+{
+	ReaderApplyState *state = cache->private_data;
+	bool send;
+
+	resetStringInfo(state->out);
+
+	send = state->commit_cb(state->user_private, state->out, txn, commit_lsn);
+
+	if (send)
+		store_tuple(commit_lsn, txn->xid, state->out);
+}
+
+static void
+change_wrapper(ReorderBuffer* cache, ReorderBufferTXN* txn, ReorderBufferChange* change)
+{
+	ReaderApplyState *state = cache->private_data;
+	bool send;
+	HeapTuple table;
+	Oid reloid;
+
+	resetStringInfo(state->out);
+
+	table = LookupRelationByRelFileNode(&change->relnode);
+	Assert(table);
+	reloid = HeapTupleHeaderGetOid(table->t_data);
+	ReleaseSysCache(table);
+
+	send = state->change_cb(state->user_private, state->out, txn,
+							reloid, change);
+
+	if (send)
+		store_tuple(change->lsn, txn->xid, state->out);
+}
+
+
+PG_FUNCTION_INFO_V1(init_logical_replication);
+
+Datum
+init_logical_replication(PG_FUNCTION_ARGS)
+{
+	const char *plugin;
+	char		xpos[MAXFNAMELEN];
+	XLogReaderState *logical_reader;
+
+	TupleDesc   tupdesc;
+	HeapTuple   tuple;
+	Datum       result;
+	Datum       values[2];
+	bool        nulls[2];
+
+	if (slot_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 (errmsg("sorry, can't init logical replication twice"))));
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	/* Acquire a logical replication slot */
+	plugin = text_to_cstring(PG_GETARG_TEXT_P(0));
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingAcquireFreeSlot(plugin);
+
+	/*
+	 * Use the same initial_snapshot_reader, but with our own read_page
+	 * callback that does not depend on walsender.
+	 */
+	logical_reader = initial_snapshot_reader(MyLogicalDecodingSlot->last_required_checkpoint,
+											 MyLogicalDecodingSlot->xmin,
+											 NameStr(MyLogicalDecodingSlot->plugin));
+	logical_reader->read_page = test_read_page;
+
+	/* Wait for a consistent starting point */
+	for (;;)
+	{
+		XLogRecord *record;
+		XLogRecordBuffer buf;
+		ReaderApplyState* apply_state = logical_reader->private_data;
+		char *err = NULL;
+
+		/* the read_page callback waits for new WAL */
+		record = XLogReadRecord(logical_reader, InvalidXLogRecPtr, &err);
+		if (err)
+			elog(ERROR, "%s", err);
+
+		Assert(record);
+
+		buf.origptr = logical_reader->ReadRecPtr;
+		buf.record = *record;
+		buf.record_data = XLogRecGetData(record);
+		DecodeRecordIntoReorderBuffer(logical_reader, apply_state, &buf);
+
+		if (initial_snapshot_ready(logical_reader))
+			break;
+	}
+
+	/* Extract the values we want */
+	MyLogicalDecodingSlot->confirmed_flush = logical_reader->EndRecPtr;
+	slot_name = NameStr(MyLogicalDecodingSlot->name);
+	snprintf(xpos, sizeof(xpos), "%X/%X",
+			 (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+			 (uint32) MyLogicalDecodingSlot->confirmed_flush);
+
+	/* Release the slot and return the values */
+	LogicalDecodingReleaseSlot();
+
+	values[0] = CStringGetTextDatum(slot_name);
+	values[1] = CStringGetTextDatum(xpos);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	PG_RETURN_DATUM(result);
+}
+
+PG_FUNCTION_INFO_V1(start_logical_replication);
+
+Datum
+start_logical_replication(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	XLogRecPtr now;
+	XLogReaderState *logical_reader;
+	ReaderApplyState *apply_state;
+	ReorderBuffer *reorder;
+
+	ResourceOwner old_resowner = CurrentResourceOwner;
+
+	if (!slot_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 (errmsg("sorry, can't start logical replication outside of an init/stop pair"))));
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * XXX: It's impolite to ignore our argument and keep decoding
+	 * until the current position.
+	 */
+	now = GetFlushRecPtr();
+
+	/*
+	 * We need to create a normal_snapshot_reader, but adjust it to use
+	 * our page_read callback, and also make its reorder buffer use our
+	 * callback wrappers that don't depend on walsender.
+	 */
+
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingReAcquireSlot(slot_name);
+	logical_reader = normal_snapshot_reader(MyLogicalDecodingSlot->last_required_checkpoint,
+											MyLogicalDecodingSlot->xmin,
+											NameStr(MyLogicalDecodingSlot->plugin),
+											MyLogicalDecodingSlot->confirmed_flush);
+
+	logical_reader->read_page = test_read_page;
+	apply_state = (ReaderApplyState *)logical_reader->private_data;
+
+	reorder = apply_state->reorderbuffer;
+	reorder->begin = begin_txn_wrapper;
+	reorder->apply_change = change_wrapper;
+	reorder->commit = commit_txn_wrapper;
+
+	elog(DEBUG1, "Starting logical replication from %X/%X to %X/%x",
+		 (uint32)(MyLogicalDecodingSlot->last_required_checkpoint>>32), (uint32)MyLogicalDecodingSlot->last_required_checkpoint,
+		 (uint32)(now>>32), (uint32)now);
+
+	CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
+
+	for (;;)
+	{
+		XLogRecord *record;
+		char *errm = NULL;
+
+		record = XLogReadRecord(logical_reader, InvalidXLogRecPtr, &errm);
+		if (errm)
+			elog(ERROR, "%s", errm);
+
+		if (record != NULL)
+		{
+			XLogRecPtr rp;
+			XLogRecordBuffer buf;
+			ReaderApplyState* apply_state = logical_reader->private_data;
+
+			buf.origptr = logical_reader->ReadRecPtr;
+			buf.record = *record;
+			buf.record_data = XLogRecGetData(record);
+
+			/*
+			 * The {begin_txn,change,commit_txn}_wrapper callbacks above
+			 * will store the description into our tuplestore.
+			 */
+			DecodeRecordIntoReorderBuffer(logical_reader, apply_state, &buf);
+
+			rp = logical_reader->EndRecPtr;
+			if (rp >= now)
+			{
+				elog(DEBUG1, "Reached endpoint (wanted: %X/%X, got: %X/%X)",
+					 (uint32)(now>>32), (uint32)now,
+					 (uint32)(rp>>32), (uint32)rp);
+				break;
+			}
+		}
+	}
+
+	tuplestore_donestoring(tupstore);
+
+	CurrentResourceOwner = old_resowner;
+
+	/* Next time, start where we left off */
+	MyLogicalDecodingSlot->confirmed_flush = logical_reader->EndRecPtr;
+
+	LogicalDecodingReleaseSlot();
+
+	return (Datum) 0;
+}
+
+PG_FUNCTION_INFO_V1(stop_logical_replication);
+
+Datum
+stop_logical_replication(PG_FUNCTION_ARGS)
+{
+	if (!slot_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 (errmsg("sorry, can't stop logical replication before init"))));
+
+	CheckLogicalReplicationRequirements();
+	LogicalDecodingFreeSlot(slot_name);
+	slot_name = NULL;
+
+	PG_RETURN_INT32(0);
+}
diff --git a/contrib/test_logical_replication/test_logical_replication.control b/contrib/test_logical_replication/test_logical_replication.control
new file mode 100644
index 0000000..e73b797
--- /dev/null
+++ b/contrib/test_logical_replication/test_logical_replication.control
@@ -0,0 +1,5 @@
+# test_logical_replication extension
+comment = 'test logical replication'
+default_version = '1.0'
+module_pathname = '$libdir/test_logical_replication'
+relocatable = true
-- 
1.7.12.289.g0ce9864.dirty

0019-wal-decoding-design-document-v2.3-and-snapshot-build.patch.gzapplication/x-patch-gzipDownload

���P�[kW����_Q��^Hlk����8�`��$���U�.IZ�JW��2��}NU�C��;����������QG?��B�ga_� <:�K5���O���|���D����tz�;:o�X\���=��C�_���{����P��0UF���<����$c���U�	���������-L����������g���h�c��u>�S�P����"z��������D��$��l�OF�b&A�Pq&V��@`Oab�4�$k�I�#\+V��~��n���Aw"�[��T-#�L'q7Jf�uO�����;��L��{q�?�m?���F��������8���t����N��;����WoO�N*�����g�J�@��Fs�T����
����Is�!�TAb��J�������4}��PO�����L��WHb����c�T�m�8T�����<<��I%D7T�n�GQ����~�I��[���k����S��������PH�=��h����\������U�i��3���*����eb����Fs$�u��0
����g�u�L$���%��4��ma���y�n�'0���0����T���p�[�	�E�	�_��8K�0���/���\���^�%`|j2�>.#�s����N�l��[�����

�K��J�
��I��?�M��e����xl�b�d:U)�*�S�,u��`��Mh��UY0�^I���e�`
�XF�H�
;t��X	(A9���P2�4�_�����t�O��n����dt������0P��fO��f�~�vb3$Rzq�����3�����H�,��\��CC��h���}D���/��x�3�����2aq�8Z�-f����#9�������2BVk%�[�dX <��������� I����0������JQYnB]��{ .��i� �C\�'��
��2�Zv�����Z�$�fj��N+��R%$�dw	( ��Lg������XV
+�F|��eZ)�OY����[h�L������T�@�b�e�a��oV�t���:I:�.�1r��:������A�i�����)����Q���W'W�NU����BJ���:��A�AR��|@����,L�h�$�"�Sb�SEE�3�(2�(����{+��&�� E4����lP�7B�����a�����Th����{�zV1��(���J���+!k���\�*��cF�*'1����j�����RT�.T��g>�������]r�������Y''g��vl�@�0<( �)��`�L(Y�Z����no�81`��Z��B5�g3,	����)))/t��Q�7p.�a��#�FG�K,#>!��!��6u:���E��)���ln����u�,�$�|IhFM��&��	
�S��Y���L�o���7 �Y�@����P��.�nf�&������+1�F�s���pF:��$�z� ,
����@��Ak��:0� a�����|m�8�I��V5|w1�9� '�'�Jb�X���jN�-|-H�n���+�aq�a~��D�f����Fs�QMs�v2�(I0q'w1i�����c� �A" �["B����c��a]V��i0��(b�M���|g��D��y��s�v�(I�}����
X\2]L+���QDE�Co�5A�"���f%SM�qE�Ee:����	g�������{�8����Nj�2P�h�&���Y����?s���3�t�zZ�	�}�gyJk�� ���K��
���Xr�����d�1����`�����q�.��l�q�����Y<������Z~��F[4�����R�V���(��pz����As����!���[�)X�/,$zM��+�D]�n@��<�;��J���g��;��%�-"�T��.M7�8��ILvR�l��y�	�dy��@1��/����=�������O����a�^-�2�>���(��u�5MAm��>6KX6����%�b6Y����^}��d�.�j4gy/$�������w�s
��c"����~+�?�3P��W�rPGfWn�7L�#���Xbb[�v�
����S�4���#':��e��R�[����8LNJ������pf&��@�~t��mU��K�[a�����9���)�E`���s��ks�Q�_@EA�S������(xAFk�A�t,���.�p^Q0I�)T�#N9L�����6�lnsoy�h=5��f��Y��
:�N�DI�~���eq,VSp�q���|������F��$#��a_� ?ApB����j�wDB	��3����9�`��}�����r�U-��(�R��q�q1��?��/�e��
����I+9;vc��ev��H3l�Bb���t�D�")AD#l�M1�����i�{'��1��vUhZ������e����1��n�)�-Tecg�5�����w
��- ��8l��Q��p;	�x���Tw3�a��X���fD�N�x��XH��cF������Kz��6����{Ru-����9d����:���C~9����`��|�yiK��e������$���s��-��[T^t����E�#C�xm�f�Bq��+��:Fy�E_Nj��?#�����m*�b��\���;����#.$�-&���^c�&F@#�'yy�f��`���J�/T������4V���%��87,><"{g��g.�R�<N�BH��!	r*������Wo�w��v���G����������r�z�	�Ya�F�?-��c)��EP�su'FU]�Bz��4�;��(jh�CmJH"*l:�M�)�����GK�����h.���b��HFdg���7J�N�3��}�b�C��(;��p������Y�V�Bi���&��d}��sNL<TQ�����[+R�q��L��+��-`mn~g��Oc�Vw�W��~l�������
J��Z6T��\:���'�r����'k[��e�8t�PiKp�����g�Y�[�)�����g0vH�5[�L8��h�rD&T0b�"����e��t���.�
�������*�E
?H�
-����>0%��G`�C����A��=e����������t	�H�rn�&LElr�	�+�������(	#�66��D���\���m��vh�\3�'T���)m�'��[Ki\��� ��L��#-l0]��:nq�=���Lm|M�@u��|�S#�.g`8��B�>���{�E�B�p�9�s���h�{�gl[=zO�Y�`���R�@���O��/�nRY��Y>�}���>��{��^=�i�g�J����;5��_k5���a�C���t�G;:��������b�����ynt�������+���nn������������o>�v�F ���u�A�6}�J6W����xOE�h��Lh���1��w����w�ww�y::����.W�J1���&����v��Yl-R�ZdE �-j�?�"��}��~���O]Z�8�m�kzR=�a��*���Q��$�����AZH�z���+-8��=x�j�W�9[�d�47�9,��UP��r�������"�9�6��N9��yS�����@c��~�><(�f5�k���������F9A?�s�>7��}�(H0j1���|[���.�s��r���cN�uQ������T��7�}������0�[$;���K���r�o���-6���nYn�����_��8�,X�����"a,r�$]������4*���jaS
��/���M�8������4�c�Cp2��:�-���*v���^g�K�4�����,�R������i�:J�m/G5x��Y�,8!��[.�)����rB-(g��E-���0�I"���F������tE�%6���p�2����Hj`�kf��f=]��^�6���TE���6�#?�$�J�FUV�m���c������?���m
E�����8�`�	�F|����.T�I�qR[���S��|��/�������L��T����#���V���e-�3�K�r�u��;����M/�������0]����Y��P�(�Y�B_�S�������H����&���L&R���Rv�BQ���������kg��#U�B��;�!�{Ef�����F��bGR�R�n�vP^�R}Y�%W��zk^\u�Kd6������������@���Ywe��e���0�'�'3�,-g��~���w��X��D��\����5���]B�OcY��	7bP
x��;�����F������thZ9���S�	����!�yUD
�^�".N��5"H��&&f@�qi����+�Q��`\5h��VHK���Z>�a��%������Iy���%�3�I�CU;43�!Y��_���<=:z~����;<����5�Mm����y�g����
y���p�[J������l�;U7^{ETD���U�=+��6��
Y+���$�@\��E5����-�qJX^s5���������(5Ke���/=\	��.CM��A�Nr�qA��x�7�
Mqd�h���}���";���7�T�;{���"[�q�,L���;v�c���E"��Fs��LQ���������`�D��Y���*J���e:���A��3og�:\�y��.y���@	����&��w���u~�=��NF�B,�[E���"L�O��;l��#��e�A�qtb��[��Z6�	��T2����o��)�S^�/h�����ZS��!�7V����bK-K���k[}_�5R�:u�������6VQ��Qs6�x�4��HC�]$N�<�7������4�yJR.��W�E�o����Ol7)��[���"�(j�T���d�k\y��T:a�_�����d>R�(^��>������em����k�������5g���������"4H��qhkwd���#��U�/�{��	�qE��W��f��[���Z��E����8x���Q~�����3�U;yH�����BY:c�6;��v�[����wO�J#`p��L#�J%4�u����h;�\�r	��I��eT�m�ta
�bk�c�����V�]�N�mO�O<�n��_��?_6o4���������w>a�b���o���m4�.��^���W�����D�Lw���|5~}z���������:�j���z��Wn����_��oN/���F�����M:�Y�?�����~p���������X�>8�����`�����n���/��_//N�o6i���
�_���.�N��������]f���C��\����I���KM�L����+�lx�u��]gW������Z<��Do����i�mcb6�L�n�W�W�NN�����7�b"�n�'~
-�[��AZ�`�W�Z��+�������_��AK��=;c
�x����OS2��1�EH��J- C�2����R�wl^�I����w(��)��
��R�{T�������7c������A�/�R�������3���D��r�@�t�.�H����s+�/�}����7\��>���������f8����{����_�� ���cQ���7m�x���5������QliW�����w{��N�>�i	���x���������9�0�u�����g�d9�Y����;���a����)�;�������-��J"���M2��� �p�������?����[l��=��Ae1�}�s��"����
>�k�������?�����[x������c����I�#R��m�<}S@_�)��/��^��\������V���/7rV4�d���\��lbs�A<�2�F����m������_��_�u;T��z ������
mw���-�PvCs����K��8��V������h��(�����}���]���n��785K�Q�f�wq���������Q]����?�������&����v�q���_���CR$@I�����9�s(R�K$����dG ��.��������4m����-��o�W��[}[��L��$FQsL���y�U��x�����oS����k*��9Y99�<>Y���U�=|��F�{���W���y�z�Gg�H�����},hF���~_i�p������4B�	/���<�@v�h�����k�����>'I��-&�������nDR�i`���^���`5o
���f3OD�|���s��~����$`25')&@��2����C�R�_��O����,�_�l���t�7���/�<
yz��'Y�oUV�C��t�5V?0��V�/d�����_�u}��rm�<cRd�|������9��i0��eU^;HV&�F"���������:�3v����<��m&s������0��� ���@c����z�-�T���7e=��_N�&?����G{,9��|��u����G�����3���lBb?`|��[p�.�k>�tx�5�S�b�����
Yp)]�mA;�H=�lq3�j�j��W-���jV`s�YN�] d��4}��n{2p%'�P��g]�	� ��H7�t�4�k�vy���/��=H�����n�Y4t�%�]<Wl;���Q�s�B6�	����D�f�x=@��� 2�8R�{��A�8/����3 f�mOnG��A����>G)"�C�`pbr}+8���}�0O4M����~)�T����-W���K����}�"'������L4&�\��+H7�d
�17��"���z<V��.�&Cc����,�=��9���n�[p�=�L�H��/������h�������;�|&sja��9R1:<gR��0����j�����_.���4�R�*	`�VFZ�� +3d������C��:M���S�1���k���<Mr��V�0��^xb�������������EW��nO/�i�8������8��E��@1��������:�^�L@@c	��T�,70��&���g�^��T�cJ)�&�L�;%��y����[�IR*����>��P����)�����T_�g'�o4���_D�es��-/�y}�;�?�6��,�4I6�\��*I����`CY�!l�a���gY���$�_���-*X��2W�0d8E��fg�M��;�&�O���$
e�IT�)�����|9/1��G��i r0�@a��/aS'�C�+��
�Y_���V6��#GdE���F��Hmd�DBT��I��5N�+�JY&���(6��
�R��H� m������~5Y6�J�8�����p9����������fVN�
�!V�����Q��������_��'�sj3j�Q�$j�(M]��(�=b�r��$1����(�EB�I�9VR2Uw���P�H���P�V��������{7����?��M��)Vdi��p
!����	��S���������o����]��%}�
 U�1����g�O�\:���P����xL��������!�����,�
�&�cc}���A���8C�j)��D�:���F��js��
8E\[��C��KK9_�$#h��X]C����	E��|\����@�W��R��u&�S����1�n��(R��[���'��+Z�m��n�l�Q�C��q��[5 p�����fK7O�Xm�������|(�h��f�_�*��l=X����[j���$���i#����|>��G��S ��9�O�$�9����K�%�/Y�1��{t����7�ROj��O��]�0�����'A��59��������x�;���F����!�D5
����(�m��8���dk�
(7��(���k�A�^c�U�����7���!������
�M�D��^��uL���f��;��/��p��Bk����nV�N�
�N�#&����.�%���J���M��`^L�U��"��)�1��
��S\i���Z���hG�Z���8V��lA��y�f.������r�^t�L=T�X1qB4q6�&�U��{����d��d��e�W#jX�K���/M�����j�F;QS�[`4��������U�Va��2�%+3Z7���g9�"h�LI1��e�L�
	�_����DW1��9~�]S����9oC�h�.	b����b�%�P���:�?�+I��X����p�&�����R
@���Hbc���2��L�Z��x��RZb���+��: O��F�kTb�O��f7k����n��I�c3��^��F�~j��k��{;UFzz
m�����;���1���}�&��uQ�<��e�"I�%)����^�V��`�K�d��r�AP�SDhG�#uT~@�c9�f�i����J�j��4�7Q��Me�6W��e~��������79I��M��h��sp���y��]ny�q�'8���`�i$�tPx�Y���Z��p������a���s(�nl�A}�=i�H������J�����r3Jw�b4@eQ�!�g���?����������m��b�6 ��;�huo#��u�������������y����������[/���g���HyH��7�@]�v�s��B���/��|��c�IO5���,�P99AA����k�5]��Z�o2(�@�c��*�]�>9��#�ds�lq��I�0��D=�-CO�������!������"O=����w�[�l",�}g�?���]����a%$�*VM�Z,�����sp��������';/�G��i�A����4�J��k��3&����9��=�?����U���
{'g��?�m��~�j������B�m��Ch���[vO���O6�N�~*�y������B���
�["�d'�$}����*�S"�y��H�������/cfCR�7�60/w��� I���o�740>��?���u`@��������
_�9�'G;�/��7O6g�?���a�(�~��@W���'�������z�76 ��h���-y(�q�`�6�W��8����_��c{yH�2�������L�s-1��H4��6n]�jP�l�p\Fs�\IlNc��D#fL�`�7�Q�f��/<������������:q�9@k���I;"r�\��}:W�S5�4L��}P����p�/��p���	�B�P`[/��m����P�����w�0E���A�L��T��U}������F��/��:��af����6�wX-
�����U1)���x#�
#��������u��;/���}�e� b?���X@)<��I��[�m�&����_��#��������O����Eg��i�_�W��W�������O�6��<�����yL\��)�������b)�~_�LF�g��l�Q�-y��.e-u�?R$�����||Wtnd��G�E���'���4+;!����G41S���-����~�V�[b��K7���Rp�����o�Q���/MJo�"Jo%`���{��RXi%,��}}y��^@�E<O��/G���g��+�X�I3�}��+�l3ZM1��/�]�K�@G�c�z�U�F���V�Z�0+LW��	�'�k�^ePa�bO�K�����0���Kq0L�m��Ac|�&����G�����)~�L�������k���7$�%x��Wo����.
�#�u���!�@����Pj?�V��q~|��<� ��&�*(2#�EiS��e_�����
�w#����Nv�����~�jQ�&��H����$�C#)m���r�I���d��M�&��
�W�i���#^FM��M=
vI����Hu�/��s�������{4n��k#���dv�����
��n�@#=�^!���0�OOg�T�C`;���6��i�O>7}�6!k�"uI���>��&���d��R���%����E����7��}7�TF����a�P4[i��N��xV3��W�)Z5�m����3W T��A�xW�J��,�n����	���+}�7FG�b�-��x���h�l��L��+�����&�[�N��XbF����S��ME!�����AA�z���,������/M��6��@���?��3$/�:L����7��
�#��Yn�:���R�x�C�?m%<�Rv��\r�o"����V�G�(+�g���������9�#/�����J�A�lo>2
]K����|�|�,>0	���
��K�����M�{6P��f�=�z�E�14^Z��o�[����7�e9�9��� U��������*4){��k��I��a]�7���,b������������e y�a!�����W�sB�q��������q�w��p�
_m����$��Fq�@��fj��g7�����f��r��Me�'8�6&�@�p��"H=��h�3���L*�
<��Zy5)��+�C��'��?���XN{�r1!�G����� H���MLGci��WwUoB}�{d�_����[f���.@��\���6J���,�[������(�c���S-���8�9��8�J@�/�(�������P�����f�M��[��S����#K�	�~��I�@l�:<�"=9c8D��2o��������e��Lj!�l���\�y��Z�O�Q	�..U� ��BN����F����U�������E���4�T@U��!�@����U�/�Dg9k�=��S���yvW_GLz+&L	<�GHC�}��7�i����$L>]W�C�BOF�V�gX�<E�b�j���R���^9��.�]5j�}���;RT�K-�@$�,G\�TG�0����5�][{��|�k
[���������w�5PfK��:G�J�	�G��������7���za��d��������e���I���v�[�<(��f�����������:9.X������,�3�������<�)+���~��V���9"q��{��2�-[��C�*��&g�����������hw"��>���A��CBT�x2�������7�zd����<��	r-<x��Q6��a�9�/���{��(������uK
�-NN������zr��� ���h2Te���J�q�-���rsN��2�
��1>��,���)X���X�7��.K�'��{�����w��O�6��g%����6�j�4�&�V����UY_��!*�Y�x�V�F�y&�h�V��M�OV2������_����t����?��+:k��1h�;gY���n�����y���12+��?��28�x5$mkb�mW%����~+���m�
�@�.:� 0�W��f�����~��S��n�r1�c��T^Bs�\���XfyM� KZ�Bm���L���[�=�`c��M@��m��q�n[v�` �V�_�q�Qr�c��rK�` ��n��%vi�t��1mr~L&�u@=$%*�(�:LeU�T���C��p�%��`)�d��Q.f��A��&�W���;Cx����V���V|��l6�+���!�IIq��N��\����<�8N�682�������������,\i�%6��:��������.�������qF�������E3�-h
�47�
���oZ6ov�@����<H��s �Kks�i��}�'��|iqp?}�#q���_Z�X��4G5����y�O2>��u��/-�������{��M�-L������S�����h+HI	J���
������_���]����/�����Z�;>���v��u/k�S�����X�*�7��n��b�5{�p�[_��I|?������cc\��\#s/��S�e9,/���k.����R$Z�����C�O�v�����
M-U
����T���Y�!1��YX#����m���j�0h�D�����a�c[56#Q��F��=5KR�\��?a�8��h�Wr�[�"�+�ah�9��@���Z�%�DZ���e��R7d�w�B���@�U}��P9Kk+�m��g�pu50B���Ix:��E�]�LTt:���PM����pdD�%]Z>}��M�j��c�>[�r~�1.7i�\S7.b��t ���n�k\��M�N��*�.u�c��=�L��gx��i9�^F_����������(����dWfl��5s�Sh�'�n�b�$X�^SB}`f�o���a���N�_E����F��a�,���N�-����WU��zP�-���.h���9�nZX�j�)�,���E�E����SU�qb���O��Q�����*��LV�G�PY��=t�_V"�R����n�"��x���L���GRjC�p�r|+$=���;�����@���������>~���sv��=�����|Z�"�������m�G���1}�,�w�k'����2��u�M�Z|�������?��:��lr���oZ#�X����on����"�j�@�('�
1i���q�h���.#�f~��$�.}N?2�`���5�A�B+�����
�	G;��{�;��;�[\�P��������#OwU�����_Q�L"�.aT��w�
�����O�H����aY��5>:i�!S�4�J�a5�qo�fpI5^u�C���)K�L)�����,1���N91<F{�����'��&�'P�+�_��0"�,���j�!��]��"�gbe��:���i����� ��P63�����M��������
i��:%@�����'	��^(@�5�8������z��G\4�B�hx����e�U�XQ�>Cq��w���fck���������/�6e�"�R!�����/�3&��R<f�V�3����Cia�<��h�:Y�BO����)����,r���R�����?����V����f4��h���,�54�I���������q*�4��T�	��]���]i��"�����6�|������Q�������U�|$t�U&/Xu>B�p|����ikM��U�R�%����+d#NM���N�G_H�Z�Y����G!�la� �Up�b��F��r�W�6��z��EGu���1�T����u����i\W>��"��G�^��E^Y�����*��t���������k����V�i�qn��Dn�T9?�n�#��5.�[�P�
s{��'�2O��7��b=��+��Cq�r��������N<���C�}�s�dNV
�
Ep�CS4�_�l�z|v��ZL�����������P��1p���'�����������[�m����[���N�r��[
�x�������{%b��P���qQW�����?*��4��J$�
1�r\�vA�5CyA-@���h�FKf�T�����.iS!�P��#���������6)==�������h(���E^�w�55���.�C%AL���.3*�.(~���]$pQNZ�=J���u��"=��c{xtHzr�wv���@�q��1-�E=����d���1����Q���.�D������V#J�������M"������T��6��$�B��0Z>��b����J��rF�~��F�[M;1���iZ�Q�8k<nx��p5p������B��5.�Z���(S��/j��,6DN���mT��� ���]�jl;#�����l��rT�G�1�3d��y#��Ii
H������-�:c6,�,�9.*JC4mo`=�#������M�Wr��J��Bbc���`��H�#������SrB��5�9��Yv����b'_&F9%��p�TV�����yY���|�j�P����#D����A<3v�!gBI��^P��
@���a`�O��<�����AU�r��Kk1�$�V����z���Q�6�;tC�k�#��!U
����wc�>�l�dU.+�QXZ�3I>cV��y;��^���X��uCR{��)��ykl��c�q�J�_�����ct���O+���'�$�Ku������E%pa��eJq��B���s12 1�Zi[;9uv7��������8(��k;=���q�[�$N�����4-+��o��*����m�Kt��K�� ���B�u�7�L5-&�����[8U�U	_m�_q@6)�n7qV��9h� ��Q�`8%�z��������������Cl���K1�Z��f�����N�Xz�����q���?t/�������~=��-�?�^Et��

0016-wal-decoding-Add-a-simple-decoding-module-in-contrib.patchtext/x-patch; charset=us-asciiDownload

>From b82ff96f3d3f791fcf06f9daaf2b8b97f82d0779 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 11 Nov 2012 13:01:52 +0100
Subject: [PATCH 16/19] wal decoding: Add a simple decoding module in contrib
 named 'test_decoding'

This is mostly useful for testing, demonstration and documentation purposes.
---
 contrib/Makefile                      |   1 +
 contrib/test_decoding/Makefile        |  16 +++
 contrib/test_decoding/test_decoding.c | 231 ++++++++++++++++++++++++++++++++++
 3 files changed, 248 insertions(+)
 create mode 100644 contrib/test_decoding/Makefile
 create mode 100644 contrib/test_decoding/test_decoding.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 5d290b8..432e915 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -49,6 +49,7 @@ SUBDIRS = \
 		tablefunc	\
 		tcn		\
 		test_parser	\
+		test_decoding	\
 		tsearch2	\
 		unaccent	\
 		vacuumlo	\
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
new file mode 100644
index 0000000..2ac9653
--- /dev/null
+++ b/contrib/test_decoding/Makefile
@@ -0,0 +1,16 @@
+# contrib/test_decoding/Makefile
+
+MODULE_big = test_decoding
+OBJS = test_decoding.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
new file mode 100644
index 0000000..1d5df59
--- /dev/null
+++ b/contrib/test_decoding/test_decoding.c
@@ -0,0 +1,231 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_deocding.c
+ *		  example output plugin for the logical replication functionality
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  contrib/test_decoding/test_decoding.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/index.h"
+
+#include "replication/output_plugin.h"
+#include "replication/snapbuild.h"
+
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+
+extern void pg_decode_init(void **private_data);
+
+extern bool pg_decode_begin_txn(void *private_data, StringInfo out, ReorderBufferTXN* txn);
+extern bool pg_decode_commit_txn(void *private_data, StringInfo out, ReorderBufferTXN* txn, XLogRecPtr commit_lsn);
+extern bool pg_decode_change(void *private_data, StringInfo out, ReorderBufferTXN* txn, Oid tableoid, ReorderBufferChange *change);
+
+void
+_PG_init(void)
+{
+}
+
+/* initialize this plugin */
+void
+pg_decode_init(void **private_data)
+{
+	AssertVariableIsOfType(&pg_decode_init, LogicalDecodeInitCB);
+	*private_data = AllocSetContextCreate(TopMemoryContext,
+									 "text conversion context",
+									 ALLOCSET_DEFAULT_MINSIZE,
+									 ALLOCSET_DEFAULT_INITSIZE,
+									 ALLOCSET_DEFAULT_MAXSIZE);
+}
+
+/* BEGIN callback */
+bool
+pg_decode_begin_txn(void *private_data, StringInfo out, ReorderBufferTXN* txn)
+{
+	AssertVariableIsOfType(&pg_decode_begin_txn, LogicalDecodeBeginCB);
+
+	appendStringInfo(out, "BEGIN");
+	return true;
+}
+
+/* COMMIT callback */
+bool
+pg_decode_commit_txn(void *private_data, StringInfo out, ReorderBufferTXN* txn, XLogRecPtr commit_lsn)
+{
+	AssertVariableIsOfType(&pg_decode_commit_txn, LogicalDecodeCommitCB);
+
+	appendStringInfo(out, "COMMIT");
+	return true;
+}
+
+/* print the tuple 'tuple' into the StringInfo s */
+static void
+tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple)
+{
+	int	natt;
+	Oid oid;
+
+	/* print oid of tuple, it's not included in the TupleDesc */
+	if ((oid = HeapTupleHeaderGetOid(tuple->t_data)) != InvalidOid)
+	{
+		appendStringInfo(s, " oid[oid]:%u", oid);
+	}
+
+	/* print all columns individually */
+	for (natt = 0; natt < tupdesc->natts; natt++)
+	{
+		Form_pg_attribute attr; /* the attribute itself */
+		Oid			typid; /* type of current attribute */
+		HeapTuple	type_tuple; /* information about a type */
+		Form_pg_type type_form;
+		Oid			typoutput; /* output function */
+		bool		typisvarlena;
+		Datum		origval; /* possibly toasted Datum */
+		Datum		val; /* definitely detoasted Datum */
+		char        *outputstr;
+		bool        isnull; /* column is null? */
+
+		attr = tupdesc->attrs[natt];
+		/*
+		 * don't print dropped columns, we can't be sure everything is
+		 * available for them
+		 */
+		if (attr->attisdropped)
+			continue;
+
+		/*
+		 * Don't print system columns
+		 */
+		if (attr->attnum < 0)
+			continue;
+
+		typid = attr->atttypid;
+
+		/* gather type name */
+		type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+		if (!HeapTupleIsValid(type_tuple))
+			elog(ERROR, "cache lookup failed for type %u", typid);
+		type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+		/* print attribute name */
+		appendStringInfoChar(s, ' ');
+		appendStringInfoString(s, NameStr(attr->attname));
+
+		/* print attribute type */
+		appendStringInfoChar(s, '[');
+		appendStringInfoString(s, NameStr(type_form->typname));
+		appendStringInfoChar(s, ']');
+
+		/* query output function */
+		getTypeOutputInfo(typid,
+						  &typoutput, &typisvarlena);
+
+		ReleaseSysCache(type_tuple);
+
+		/* get Datum from tuple */
+		origval = fastgetattr(tuple, natt + 1, tupdesc, &isnull);
+
+		if (typisvarlena && !isnull)
+			val = PointerGetDatum(PG_DETOAST_DATUM(origval));
+		else
+			val = origval;
+
+		/* print data */
+		if (isnull)
+			outputstr = "(null)";
+		else
+			outputstr = OidOutputFunctionCall(typoutput, val);
+
+		appendStringInfoChar(s, ':');
+		appendStringInfoString(s, outputstr);
+	}
+}
+
+/*
+ * callback for individual changed tuples
+ */
+bool
+pg_decode_change(void *private_data, StringInfo out, ReorderBufferTXN* txn,
+				 Oid tableoid, ReorderBufferChange *change)
+{
+	Relation relation = RelationIdGetRelation(tableoid);
+	Form_pg_class class_form = RelationGetForm(relation);
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	MemoryContext context = (MemoryContext)private_data;
+	/*
+	 * switch to our own context we can reset after the tuple is printed,
+	 * otherwise we will leak memory in via many of the output routines.
+	 */
+	MemoryContext old = MemoryContextSwitchTo(context);
+
+	AssertVariableIsOfType(&pg_decode_change, LogicalDecodeChangeCB);
+
+	appendStringInfoString(out, "table \"");
+	appendStringInfoString(out, NameStr(class_form->relname));
+	appendStringInfoString(out, "\":");
+
+	switch (change->action)
+	{
+	case REORDER_BUFFER_CHANGE_INSERT:
+		appendStringInfoString(out, " INSERT:");
+		tuple_to_stringinfo(out, tupdesc, &change->newtuple->tuple);
+		break;
+	case REORDER_BUFFER_CHANGE_UPDATE:
+		appendStringInfoString(out, " UPDATE:");
+		tuple_to_stringinfo(out, tupdesc, &change->newtuple->tuple);
+		break;
+	case REORDER_BUFFER_CHANGE_DELETE:
+		{
+			Relation indexrel;
+			TupleDesc	indexdesc;
+
+			/*
+			 * deletions only store the primary key part of the tuple, display
+			 * that index.
+			 */
+
+			/* make sure rd_primary is set */
+			RelationGetIndexList(relation);
+
+			if (!OidIsValid(relation->rd_primary))
+			{
+				elog(LOG, "tuple in table with oid: %u without primary key", tableoid);
+				break;
+			}
+
+			appendStringInfoString(out, " DELETE:");
+
+			indexrel = RelationIdGetRelation(relation->rd_primary);
+
+			indexdesc = RelationGetDescr(indexrel);
+
+			tuple_to_stringinfo(out, indexdesc, &change->oldtuple->tuple);
+
+			RelationClose(indexrel);
+			break;
+		}
+	}
+	RelationClose(relation);
+
+	MemoryContextSwitchTo(old);
+	MemoryContextReset(context);
+	return true;
+}
-- 
1.7.12.289.g0ce9864.dirty

Josh Berkus

josh@agliodbs.com

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4

Andreas,

Is there a git fork for logical replication somewhere?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

anarazel@anarazel.de

andres@anarazel.de

almost 13 years ago

In reply to: Josh Berkus (#2)

Re: logical changeset generation v4

Josh Berkus <josh@agliodbs.com> schrieb:

Andreas,

Is there a git fork for logical replication somewhere?

Check the bottom of the email ;)

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Abhijit Menon-Sen

ams@2ndQuadrant.com

almost 13 years ago

In reply to: Josh Berkus (#2)

Re: logical changeset generation v4

At 2013-01-14 18:15:39 -0800, josh@agliodbs.com wrote:

Is there a git fork for logical replication somewhere?

git://git.postgresql.org/git/users/andresfreund/postgres.git, branch
xlog-decoding-rebasing-cf4 (and xlogreader_v4).

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Abhijit Menon-Sen

ams@2ndQuadrant.com

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4

At 2013-01-15 02:38:45 +0100, andres@2ndquadrant.com wrote:

2) Currently the logical replication infrastructure assigns a
'slot-id' when a new replica is setup. That slot id isn't really
nice (e.g. "id-321578-3"). It also requires that [18] keeps state
in a global variable to make writing regression tests easy.

I think it would be better to make the user specify those replication
slot ids, but I am not really sure about it.

I agree, it would be better to let the user name the slot (and report an
error if the given name is already in use).

3) Currently no options can be passed to an output plugin. I am
thinking about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the
now widely used ('option' ['value'], ...) syntax and pass that to the
output plugin's initialization function.

Sounds good.

4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);

-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';

-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

That looks fine, but I think it should be:

INIT_LOGICAL_REPLICATION 'slotname' 'plugin' (options);

i.e., swap 'plugin' and 'slotname' in your proposal to make the slotname
come first for all three commands. Not important, but a wee bit nicer.

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

[08] wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

This seems reasonable. Mainly it has the effect that a transaction can
have exactly one less command than before. I don't think this is a
problem for anyone in practice.

[09] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

Seemed okay when I looked at it.

Second, I don't think the test_logical_replication functions should live
in core as they shouldn't be used for a production replication scenario
(causes longrunning transactions, requires polling) , but I have failed
to find a neat way to include a contrib extension in the plain
regression tests.

I think this would work if you make a "stamp" file in the contrib
module, similar to how doc/src/sgml uses those.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4

On 15/01/13 14:38, Andres Freund wrote:

Hi everyone,

Here is the newest version of logical changeset generation.

I'm quite interested in this feature - so tried applying the 19 patches
to the latest 9.3 checkout. Patch and compile are good.

However portals seem busted:

bench=# BEGIN;
BEGIN
bench=# DECLARE c1 CURSOR FOR SELECT * FROM pgbench_accounts;
DECLARE CURSOR
bench=# FETCH 2 FROM c1;
aid | bid | abalance | filler

-----+-----+----------+---------------------------------------------------------
-----------------------------
1 | 1 | 0 |

2 | 1 | 0 |

(2 rows)

bench=# DELETE FROM pgbench_accounts WHERE CURRENT OF c1;
The connection to the server was lost. Attempting reset: Failed.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

almost 13 years ago

In reply to: Mark Kirkwood (#7)

Re: logical changeset generation v4

On 15/01/13 17:37, Mark Kirkwood wrote:

On 15/01/13 14:38, Andres Freund wrote:

Hi everyone,

Here is the newest version of logical changeset generation.

I'm quite interested in this feature - so tried applying the 19
patches to the latest 9.3 checkout. Patch and compile are good.

However portals seem busted:

bench=# BEGIN;
BEGIN
bench=# DECLARE c1 CURSOR FOR SELECT * FROM pgbench_accounts;
DECLARE CURSOR
bench=# FETCH 2 FROM c1;
aid | bid | abalance | filler

-----+-----+----------+---------------------------------------------------------

-----------------------------
1 | 1 | 0 |

2 | 1 | 0 |

(2 rows)

bench=# DELETE FROM pgbench_accounts WHERE CURRENT OF c1;
The connection to the server was lost. Attempting reset: Failed.

Sorry - forgot to add: assert and debug build, and it is an assertion
failure that is being picked up:

TRAP: FailedAssertion("!(htup->t_tableOid != ((Oid) 0))", File:
"tqual.c", Line: 940)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Mark Kirkwood (#8)

1 attachment(s)

Re: logical changeset generation v4

On 2013-01-15 17:41:50 +1300, Mark Kirkwood wrote:

On 15/01/13 17:37, Mark Kirkwood wrote:

On 15/01/13 14:38, Andres Freund wrote:

Hi everyone,

Here is the newest version of logical changeset generation.

I'm quite interested in this feature - so tried applying the 19 patches to
the latest 9.3 checkout. Patch and compile are good.

Thanks! Any input welcome.

The git tree might make it easier to follow development ;)

However portals seem busted:

bench=# BEGIN;
BEGIN
bench=# DECLARE c1 CURSOR FOR SELECT * FROM pgbench_accounts;
DECLARE CURSOR
bench=# FETCH 2 FROM c1;
aid | bid | abalance | filler

-----+-----+----------+---------------------------------------------------------

-----------------------------
1 | 1 | 0 |

2 | 1 | 0 |

(2 rows)

bench=# DELETE FROM pgbench_accounts WHERE CURRENT OF c1;
The connection to the server was lost. Attempting reset: Failed.

Sorry - forgot to add: assert and debug build, and it is an assertion
failure that is being picked up:

TRAP: FailedAssertion("!(htup->t_tableOid != ((Oid) 0))", File: "tqual.c",
Line: 940)

I unfortunately don't see the error here, I guess its related to how
stack is reused. But I think I found the error, check the attached patch
which I also pushed to the git repository.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-wal_decoding-mergeme-Satisfies-Setup-a-correct-tup-t.patchtext/x-patch; charset=us-asciiDownload

>From 25bd9aeefb03ec39ff1d1cbbac4d2507d533f6d1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 15 Jan 2013 11:50:33 +0100
Subject: [PATCH] wal_decoding: mergeme *Satisfies: Setup a correct
 tup->t_tableOid in heap_get_latest_tid

Code review found one other case where tableOid potentially didn'T get set, in
nodeBitmapHeapscan. Thats fixed as well.

Found independently by Mark Kirkwood and Abhijit Menon-Sen
---
 src/backend/access/heap/heapam.c          | 1 +
 src/backend/executor/nodeBitmapHeapscan.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1ff58a4..3346c8a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1789,6 +1789,7 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
+		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c83f972..eda1394 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -258,6 +258,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 		scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 		scan->rs_ctup.t_len = ItemIdGetLength(lp);
+		scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
 		ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
 
 		pgstat_count_heap_fetch(scan->rs_rd);
-- 
1.7.12.289.g0ce9864.dirty

#10

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Alvaro Herrera (#6)

Re: logical changeset generation v4

On 2013-01-15 01:00:00 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

I have some smaller bugfixes in my current version that you probably
don't have yet (on grounds of being fixed this weekend)... So we need to
be a bit careful not too loose those.

Second, I don't think the test_logical_replication functions should live
in core as they shouldn't be used for a production replication scenario
(causes longrunning transactions, requires polling) , but I have failed
to find a neat way to include a contrib extension in the plain
regression tests.

I think this would work if you make a "stamp" file in the contrib
module, similar to how doc/src/sgml uses those.

I tried that, the problem is not the building itself but getting it
installed into the temporary installation...
But anyway, testing decoding requires a different wal_level so I was
hesitant to continue on grounds of that alone.
Should we just up that? Its probably problematic for tests arround some
WAL optimizations an such?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#10)

Re: logical changeset generation v4

Andres Freund wrote:

On 2013-01-15 01:00:00 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

I have some smaller bugfixes in my current version that you probably
don't have yet (on grounds of being fixed this weekend)... So we need to
be a bit careful not too loose those.

Sure. Do you have them as individual commits? I'm assuming you rebased
the tree. Maybe in your reflog? IIRC I also have at least one minor
bug fix.

Second, I don't think the test_logical_replication functions should live
in core as they shouldn't be used for a production replication scenario
(causes longrunning transactions, requires polling) , but I have failed
to find a neat way to include a contrib extension in the plain
regression tests.

I think this would work if you make a "stamp" file in the contrib
module, similar to how doc/src/sgml uses those.

I tried that, the problem is not the building itself but getting it
installed into the temporary installation...

Oh, hm. Maybe the contrib module's make installcheck, then?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Alvaro Herrera (#11)

Re: logical changeset generation v4

On 2013-01-15 09:56:41 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2013-01-15 01:00:00 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

I have some smaller bugfixes in my current version that you probably
don't have yet (on grounds of being fixed this weekend)... So we need to
be a bit careful not too loose those.

Sure. Do you have them as individual commits? I'm assuming you rebased
the tree. Maybe in your reflog? IIRC I also have at least one minor
bug fix.

I can check, which commit did you base your modifications on?

Second, I don't think the test_logical_replication functions should live
in core as they shouldn't be used for a production replication scenario
(causes longrunning transactions, requires polling) , but I have failed
to find a neat way to include a contrib extension in the plain
regression tests.

I think this would work if you make a "stamp" file in the contrib
module, similar to how doc/src/sgml uses those.

I tried that, the problem is not the building itself but getting it
installed into the temporary installation...

Oh, hm. Maybe the contrib module's make installcheck, then?

Thats what I do right now, but I really would prefer to have it checked
during normal make checks, installchecks aren't run all that commonly :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Andres Freund (#12)

Re: logical changeset generation v4

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-01-15 09:56:41 -0300, Alvaro Herrera wrote:

Oh, hm. Maybe the contrib module's make installcheck, then?

Thats what I do right now, but I really would prefer to have it checked
during normal make checks, installchecks aren't run all that commonly :(

Sure they are, in every buildfarm cycle. I don't see the problem.

(In the case of contrib, make installcheck is a whole lot faster than
make check, as well as being older. So I don't really see why you
think it's less used.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Tom Lane (#13)

Re: logical changeset generation v4

On 2013-01-15 10:28:28 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-01-15 09:56:41 -0300, Alvaro Herrera wrote:

Oh, hm. Maybe the contrib module's make installcheck, then?

Thats what I do right now, but I really would prefer to have it checked
during normal make checks, installchecks aren't run all that commonly :(

Sure they are, in every buildfarm cycle. I don't see the problem.

(In the case of contrib, make installcheck is a whole lot faster than
make check, as well as being older. So I don't really see why you
think it's less used.)

Oh. Because I was being dumb ;). And I admittedly never ran a buildfarm
animal so far.

But the other part of the problem is hiding in the unfortunately removed
part of the problem description - the tests require the non-default
options wal_level=logical and max_logical_slots=3+.
Is there a problem of making those the default in the buildfarm created
config?

I guess there would need to be an alternative output file for wal_level
< logical? Afaics there is no way to make a test conditional?

I shortly thought something like
DO $$
BEGIN
IF current_setting('wal_level') != 'df' THEN
RAISE FATAL 'wal_level needs to be logical';
END IF;
END
$$;
could be used to avoid creating a huge second output file, but we can't
raise FATAL errors from plpgsql.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Andres Freund (#14)

Re: logical changeset generation v4

Andres Freund <andres@2ndquadrant.com> writes:

But the other part of the problem is hiding in the unfortunately removed
part of the problem description - the tests require the non-default
options wal_level=logical and max_logical_slots=3+.

Oh. Well, that's not going to work.

Is there a problem of making those the default in the buildfarm created
config?

Even if we hacked the buildfarm script to do so, it'd be a nonstarter
because it would cause ordinary manual "make installcheck" to fail.

I think the only reasonable way to handle this would be to (1) make
"make installcheck" a no-op in this contrib module, and (2) make
"make check" work, being careful to start the test postmaster with
the necessary options.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Tom Lane (#15)

Re: logical changeset generation v4

On 2013-01-15 11:10:22 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

But the other part of the problem is hiding in the unfortunately removed
part of the problem description - the tests require the non-default
options wal_level=logical and max_logical_slots=3+.

Oh. Well, that's not going to work.

An alternative would be to have max_logical_slots default to a low value
and make the amount of logged information a wal_level independent
GUC that can be changed on the fly.
ISTM that that would result in too complicated code to deal with other
backends not having the same notion of that value and such, but its
possible.

Is there a problem of making those the default in the buildfarm created
config?

Even if we hacked the buildfarm script to do so, it'd be a nonstarter
because it would cause ordinary manual "make installcheck" to fail.

I thought we could have a second expected file for that case. Not nice
:(

I think the only reasonable way to handle this would be to (1) make
"make installcheck" a no-op in this contrib module, and (2) make
"make check" work, being careful to start the test postmaster with
the necessary options.

Youre talking about adding a contrib-module specific make check or
changing the normal make check's wal_level?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Andres Freund (#16)

Re: logical changeset generation v4

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-01-15 11:10:22 -0500, Tom Lane wrote:

I think the only reasonable way to handle this would be to (1) make
"make installcheck" a no-op in this contrib module, and (2) make
"make check" work, being careful to start the test postmaster with
the necessary options.

Youre talking about adding a contrib-module specific make check or
changing the normal make check's wal_level?

This contrib module's "make check" would change the wal_level. Global
change no good for any number of reasons, the most obvious being that
we want to be able to test other wal_levels too.

I'm not sure whether the "make check" infrastructure currently supports
passing arguments through to the test postmaster's command line, but it
shouldn't be terribly hard to add if not.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#12)

Re: logical changeset generation v4

Andres Freund wrote:

On 2013-01-15 09:56:41 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2013-01-15 01:00:00 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

I have some smaller bugfixes in my current version that you probably
don't have yet (on grounds of being fixed this weekend)... So we need to
be a bit careful not too loose those.

Sure. Do you have them as individual commits? I'm assuming you rebased
the tree. Maybe in your reflog? IIRC I also have at least one minor
bug fix.

I can check, which commit did you base your modifications on?

Dunno. It's probably easier to reverse-apply the version you submitted
to see what changed, and then forward-apply again.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Alvaro Herrera (#18)

2 attachment(s)

Re: logical changeset generation v4

On 2013-01-15 15:16:44 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2013-01-15 09:56:41 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2013-01-15 01:00:00 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I've been giving a couple of these parts a look. In particular

[03] Split out xlog reading into its own module called xlogreader

Cleaned this one up a bit last week. I will polish it some more,
publish for some final comments, and commit.

I have some smaller bugfixes in my current version that you probably
don't have yet (on grounds of being fixed this weekend)... So we need to
be a bit careful not too loose those.

Sure. Do you have them as individual commits? I'm assuming you rebased
the tree. Maybe in your reflog? IIRC I also have at least one minor
bug fix.

I can check, which commit did you base your modifications on?

Dunno. It's probably easier to reverse-apply the version you submitted
to see what changed, and then forward-apply again.

There's at least the two attached patches...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-xlogreader-fix.patchtext/x-patch; charset=us-asciiDownload

>From 5ca4b81f03bd7a4bf5101bd68811548023ac12fe Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 14 Jan 2013 21:43:13 +0100
Subject: [PATCH] xlogreader: fix

---
 src/backend/access/transam/xlogreader.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 6a420e6..9439c05 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -350,7 +350,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 			/* Wait for the next page to become available */
 			readOff = ReadPageInternal(state, targetPagePtr,
-									   Min(len, XLOG_BLCKSZ));
+									   Min(total_len - gotlen + SizeOfXLogShortPHD, XLOG_BLCKSZ));
 
 			if (readOff < 0)
 				goto err;
@@ -383,6 +383,11 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 			/* Append the continuation from this page to the buffer */
 			pageHeaderSize = XLogPageHeaderSize(pageHeader);
+
+			if (readOff < pageHeaderSize)
+				readOff = ReadPageInternal(state, targetPagePtr,
+										   pageHeaderSize);
+
 			Assert(pageHeaderSize <= readOff);
 
 			contdata = (char *) state->readBuf + pageHeaderSize;
@@ -390,6 +395,10 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 			if (pageHeader->xlp_rem_len < len)
 				len = pageHeader->xlp_rem_len;
 
+			if (readOff < (pageHeaderSize + len))
+				readOff = ReadPageInternal(state, targetPagePtr,
+										   pageHeaderSize + len);
+
 			memcpy(buffer, (char *) contdata, len);
 			buffer += len;
 			gotlen += len;
-- 
1.7.12.289.g0ce9864.dirty

0001-xlogreader-use-correct-type.patchtext/x-patch; charset=us-asciiDownload

>From 995d723239df325b48412878fa818c94cb33f724 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 15 Jan 2013 00:58:49 +0100
Subject: [PATCH] xlogreader: use correct type

---
 src/backend/access/transam/xlogreader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 9439c05..f2b9355 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -927,7 +927,7 @@ XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr)
    uint32      pageHeaderSize;
    XLogPageHeader header;
    XLogRecord *record;
-   uint32 readLen;
+   int readLen;
    char       *errormsg;
 
    if (RecPtr == InvalidXLogRecPtr)
-- 
1.7.12.289.g0ce9864.dirty

#20

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#1)

1 attachment(s)

Re: logical changeset generation v4

Andres Freund wrote:

[09] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

For timetravel access to the catalog we need to be able to lookup (cmin,
cmax) pairs of catalog rows when were 'inside' that TX. This patch just
adapts the signature of the *Satisfies routines to expect a HeapTuple
instead of a HeapTupleHeader. The amount of changes for that is fairly
low as the HeapTupleSatisfiesVisibility macro already expected the
former.

It also makes sure the HeapTuple fields are setup in the few places that
didn't already do so.

I had a look at this part. Running the regression tests unveiled a case
where the tableOid wasn't being set (and thus caused an assertion to
fail), so I added that. I also noticed that the additions to
pruneheap.c are sometimes filling a tuple before it's strictly
necessary, leading to wasted work. Moved those too.

Looks good to me as attached.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

heaptuple-satisfies.patchtext/x-diff; charset=us-asciiDownload

*** a/contrib/pgrowlocks/pgrowlocks.c
--- b/contrib/pgrowlocks/pgrowlocks.c
***************
*** 120,126 **** pgrowlocks(PG_FUNCTION_ARGS)
  		/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
  		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
  
! 		if (HeapTupleSatisfiesUpdate(tuple->t_data,
  									 GetCurrentCommandId(false),
  									 scan->rs_cbuf) == HeapTupleBeingUpdated)
  		{
--- 120,126 ----
  		/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
  		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
  
! 		if (HeapTupleSatisfiesUpdate(tuple,
  									 GetCurrentCommandId(false),
  									 scan->rs_cbuf) == HeapTupleBeingUpdated)
  		{
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 291,296 **** heapgetpage(HeapScanDesc scan, BlockNumber page)
--- 291,297 ----
  
  			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
  			loctup.t_len = ItemIdGetLength(lpp);
+ 			loctup.t_tableOid = RelationGetRelid(scan->rs_rd);
  			ItemPointerSet(&(loctup.t_self), page, lineoff);
  
  			if (all_visible)
***************
*** 1603,1609 **** heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
  
  		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
  		heapTuple->t_len = ItemIdGetLength(lp);
! 		heapTuple->t_tableOid = relation->rd_id;
  		heapTuple->t_self = *tid;
  
  		/*
--- 1604,1610 ----
  
  		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
  		heapTuple->t_len = ItemIdGetLength(lp);
! 		heapTuple->t_tableOid = RelationGetRelid(relation);
  		heapTuple->t_self = *tid;
  
  		/*
***************
*** 1651,1657 **** heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
  		 * transactions.
  		 */
  		if (all_dead && *all_dead &&
! 			!HeapTupleIsSurelyDead(heapTuple->t_data, RecentGlobalXmin))
  			*all_dead = false;
  
  		/*
--- 1652,1658 ----
  		 * transactions.
  		 */
  		if (all_dead && *all_dead &&
! 			!HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
  			*all_dead = false;
  
  		/*
***************
*** 1781,1786 **** heap_get_latest_tid(Relation relation,
--- 1782,1788 ----
  		tp.t_self = ctid;
  		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
  		tp.t_len = ItemIdGetLength(lp);
+ 		tp.t_tableOid = RelationGetRelid(relation);
  
  		/*
  		 * After following a t_ctid link, we might arrive at an unrelated
***************
*** 2447,2458 **** heap_delete(Relation relation, ItemPointer tid,
  	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
  	Assert(ItemIdIsNormal(lp));
  
  	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
  	tp.t_len = ItemIdGetLength(lp);
  	tp.t_self = *tid;
  
  l1:
! 	result = HeapTupleSatisfiesUpdate(tp.t_data, cid, buffer);
  
  	if (result == HeapTupleInvisible)
  	{
--- 2449,2461 ----
  	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
  	Assert(ItemIdIsNormal(lp));
  
+ 	tp.t_tableOid = RelationGetRelid(relation);
  	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
  	tp.t_len = ItemIdGetLength(lp);
  	tp.t_self = *tid;
  
  l1:
! 	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
  
  	if (result == HeapTupleInvisible)
  	{
***************
*** 2817,2822 **** heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
--- 2820,2826 ----
  	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
  	Assert(ItemIdIsNormal(lp));
  
+ 	oldtup.t_tableOid = RelationGetRelid(relation);
  	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
  	oldtup.t_len = ItemIdGetLength(lp);
  	oldtup.t_self = *otid;
***************
*** 2829,2835 **** heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
  	 */
  
  l2:
! 	result = HeapTupleSatisfiesUpdate(oldtup.t_data, cid, buffer);
  
  	if (result == HeapTupleInvisible)
  	{
--- 2833,2839 ----
  	 */
  
  l2:
! 	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
  
  	if (result == HeapTupleInvisible)
  	{
***************
*** 3531,3537 **** heap_lock_tuple(Relation relation, HeapTuple tuple,
  	tuple->t_tableOid = RelationGetRelid(relation);
  
  l3:
! 	result = HeapTupleSatisfiesUpdate(tuple->t_data, cid, *buffer);
  
  	if (result == HeapTupleInvisible)
  	{
--- 3535,3541 ----
  	tuple->t_tableOid = RelationGetRelid(relation);
  
  l3:
! 	result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
  
  	if (result == HeapTupleInvisible)
  	{
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***************
*** 340,345 **** heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
--- 340,348 ----
  	OffsetNumber chainitems[MaxHeapTuplesPerPage];
  	int			nchain = 0,
  				i;
+ 	HeapTupleData tup;
+ 
+ 	tup.t_tableOid = RelationGetRelid(relation);
  
  	rootlp = PageGetItemId(dp, rootoffnum);
  
***************
*** 349,356 **** heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
--- 352,365 ----
  	if (ItemIdIsNormal(rootlp))
  	{
  		htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
+ 
  		if (HeapTupleHeaderIsHeapOnly(htup))
  		{
+ 			/* fill in the rest of the tuple */
+ 			tup.t_data = htup;
+ 			tup.t_len = ItemIdGetLength(rootlp);
+ 			ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), rootoffnum);
+ 
  			/*
  			 * If the tuple is DEAD and doesn't chain to anything else, mark
  			 * it unused immediately.  (If it does chain, we can only remove
***************
*** 369,375 **** heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
  			 * either here or while following a chain below.  Whichever path
  			 * gets there first will mark the tuple unused.
  			 */
! 			if (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer)
  				== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
  			{
  				heap_prune_record_unused(prstate, rootoffnum);
--- 378,384 ----
  			 * either here or while following a chain below.  Whichever path
  			 * gets there first will mark the tuple unused.
  			 */
! 			if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
  				== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
  			{
  				heap_prune_record_unused(prstate, rootoffnum);
***************
*** 448,455 **** heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
  		 * Check tuple's visibility status.
  		 */
  		tupdead = recent_dead = false;
  
! 		switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer))
  		{
  			case HEAPTUPLE_DEAD:
  				tupdead = true;
--- 457,467 ----
  		 * Check tuple's visibility status.
  		 */
  		tupdead = recent_dead = false;
+ 		tup.t_data = htup;
+ 		tup.t_len = ItemIdGetLength(lp);
+ 		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
  
! 		switch (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer))
  		{
  			case HEAPTUPLE_DEAD:
  				tupdead = true;
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 2269,2275 **** IndexBuildHeapScan(Relation heapRelation,
  			 */
  			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
  
! 			switch (HeapTupleSatisfiesVacuum(heapTuple->t_data, OldestXmin,
  											 scan->rs_cbuf))
  			{
  				case HEAPTUPLE_DEAD:
--- 2269,2275 ----
  			 */
  			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
  
! 			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
  											 scan->rs_cbuf))
  			{
  				case HEAPTUPLE_DEAD:
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 1134,1143 **** acquire_sample_rows(Relation onerel, int elevel,
  
  			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
  
  			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
  			targtuple.t_len = ItemIdGetLength(itemid);
  
! 			switch (HeapTupleSatisfiesVacuum(targtuple.t_data,
  											 OldestXmin,
  											 targbuffer))
  			{
--- 1134,1144 ----
  
  			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
  
+ 			targtuple.t_tableOid = RelationGetRelid(onerel);
  			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
  			targtuple.t_len = ItemIdGetLength(itemid);
  
! 			switch (HeapTupleSatisfiesVacuum(&targtuple,
  											 OldestXmin,
  											 targbuffer))
  			{
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 931,937 **** copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
  
  		LockBuffer(buf, BUFFER_LOCK_SHARE);
  
! 		switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin, buf))
  		{
  			case HEAPTUPLE_DEAD:
  				/* Definitely dead */
--- 931,937 ----
  
  		LockBuffer(buf, BUFFER_LOCK_SHARE);
  
! 		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
  		{
  			case HEAPTUPLE_DEAD:
  				/* Definitely dead */
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 727,738 **** lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  
  			Assert(ItemIdIsNormal(itemid));
  
  			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
  			tuple.t_len = ItemIdGetLength(itemid);
  
  			tupgone = false;
  
! 			switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
  			{
  				case HEAPTUPLE_DEAD:
  
--- 727,739 ----
  
  			Assert(ItemIdIsNormal(itemid));
  
+ 			tuple.t_tableOid = RelationGetRelid(onerel);
  			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
  			tuple.t_len = ItemIdGetLength(itemid);
  
  			tupgone = false;
  
! 			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
  			{
  				case HEAPTUPLE_DEAD:
  
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 3894,3900 **** CheckForSerializableConflictOut(bool visible, Relation relation,
  	 * tuple is visible to us, while HeapTupleSatisfiesVacuum checks what else
  	 * is going on with it.
  	 */
! 	htsvResult = HeapTupleSatisfiesVacuum(tuple->t_data, TransactionXmin, buffer);
  	switch (htsvResult)
  	{
  		case HEAPTUPLE_LIVE:
--- 3894,3900 ----
  	 * tuple is visible to us, while HeapTupleSatisfiesVacuum checks what else
  	 * is going on with it.
  	 */
! 	htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
  	switch (htsvResult)
  	{
  		case HEAPTUPLE_LIVE:
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 163,170 **** HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
   *			 Xmax is not committed)))			that has not been committed
   */
  bool
! HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 163,174 ----
   *			 Xmax is not committed)))			that has not been committed
   */
  bool
! HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
***************
*** 326,333 **** HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
   *
   */
  bool
! HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 330,341 ----
   *
   */
  bool
! HeapTupleSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
***************
*** 471,477 **** HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
   *		Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
   */
  bool
! HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
  	return true;
  }
--- 479,485 ----
   *		Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
   */
  bool
! HeapTupleSatisfiesAny(HeapTuple htup, Snapshot snapshot, Buffer buffer)
  {
  	return true;
  }
***************
*** 491,499 **** HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
   * table.
   */
  bool
! HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 499,511 ----
   * table.
   */
  bool
! HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
  						Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
***************
*** 572,580 **** HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
   *	distinguish that case must test for it themselves.)
   */
  HTSU_Result
! HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  						 Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 584,596 ----
   *	distinguish that case must test for it themselves.)
   */
  HTSU_Result
! HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
  						 Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
***************
*** 739,747 **** HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
   * for snapshot->xmax and the tuple's xmax.
   */
  bool
! HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
  						Buffer buffer)
  {
  	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
  
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
--- 755,767 ----
   * for snapshot->xmax and the tuple's xmax.
   */
  bool
! HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
  						Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
  
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
***************
*** 902,910 **** HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
   * can't see it.)
   */
  bool
! HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  					   Buffer buffer)
  {
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 922,935 ----
   * can't see it.)
   */
  bool
! HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
  					   Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
***************
*** 1058,1066 **** HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
   * even if we see that the deleting transaction has committed.
   */
  HTSV_Result
! HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
  						 Buffer buffer)
  {
  	/*
  	 * Has inserting transaction committed?
  	 *
--- 1083,1095 ----
   * even if we see that the deleting transaction has committed.
   */
  HTSV_Result
! HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
  						 Buffer buffer)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	/*
  	 * Has inserting transaction committed?
  	 *
***************
*** 1233,1240 **** HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
   *	just whether or not the tuple is surely dead).
   */
  bool
! HeapTupleIsSurelyDead(HeapTupleHeader tuple, TransactionId OldestXmin)
  {
  	/*
  	 * If the inserting transaction is marked invalid, then it aborted, and
  	 * the tuple is definitely dead.  If it's marked neither committed nor
--- 1262,1273 ----
   *	just whether or not the tuple is surely dead).
   */
  bool
! HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
  {
+ 	HeapTupleHeader tuple = htup->t_data;
+ 	Assert(ItemPointerIsValid(&htup->t_self));
+ 	Assert(htup->t_tableOid != InvalidOid);
+ 
  	/*
  	 * If the inserting transaction is marked invalid, then it aborted, and
  	 * the tuple is definitely dead.  If it's marked neither committed nor
*** a/src/include/utils/snapshot.h
--- b/src/include/utils/snapshot.h
***************
*** 27,34 **** typedef struct SnapshotData *Snapshot;
   * The specific semantics of a snapshot are encoded by the "satisfies"
   * function.
   */
! typedef bool (*SnapshotSatisfiesFunc) (HeapTupleHeader tuple,
! 										   Snapshot snapshot, Buffer buffer);
  
  typedef struct SnapshotData
  {
--- 27,34 ----
   * The specific semantics of a snapshot are encoded by the "satisfies"
   * function.
   */
! typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
! 									   Snapshot snapshot, Buffer buffer);
  
  typedef struct SnapshotData
  {
*** a/src/include/utils/tqual.h
--- b/src/include/utils/tqual.h
***************
*** 52,58 **** extern PGDLLIMPORT SnapshotData SnapshotToastData;
   *	if so, the indicated buffer is marked dirty.
   */
  #define HeapTupleSatisfiesVisibility(tuple, snapshot, buffer) \
! 	((*(snapshot)->satisfies) ((tuple)->t_data, snapshot, buffer))
  
  /* Result codes for HeapTupleSatisfiesVacuum */
  typedef enum
--- 52,58 ----
   *	if so, the indicated buffer is marked dirty.
   */
  #define HeapTupleSatisfiesVisibility(tuple, snapshot, buffer) \
! 	((*(snapshot)->satisfies) (tuple, snapshot, buffer))
  
  /* Result codes for HeapTupleSatisfiesVacuum */
  typedef enum
***************
*** 65,89 **** typedef enum
  } HTSV_Result;
  
  /* These are the "satisfies" test routines for the various snapshot types */
! extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
  					   Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesNow(HeapTupleHeader tuple,
  					  Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesSelf(HeapTupleHeader tuple,
  					   Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesAny(HeapTupleHeader tuple,
  					  Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple,
  						Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesDirty(HeapTupleHeader tuple,
  						Snapshot snapshot, Buffer buffer);
  
  /* Special "satisfies" routines with different APIs */
! extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,
  						 CommandId curcid, Buffer buffer);
! extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
  						 TransactionId OldestXmin, Buffer buffer);
! extern bool HeapTupleIsSurelyDead(HeapTupleHeader tuple,
  					  TransactionId OldestXmin);
  
  extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--- 65,89 ----
  } HTSV_Result;
  
  /* These are the "satisfies" test routines for the various snapshot types */
! extern bool HeapTupleSatisfiesMVCC(HeapTuple htup,
  					   Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesNow(HeapTuple htup,
  					  Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesSelf(HeapTuple htup,
  					   Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesAny(HeapTuple htup,
  					  Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesToast(HeapTuple htup,
  						Snapshot snapshot, Buffer buffer);
! extern bool HeapTupleSatisfiesDirty(HeapTuple htup,
  						Snapshot snapshot, Buffer buffer);
  
  /* Special "satisfies" routines with different APIs */
! extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple htup,
  						 CommandId curcid, Buffer buffer);
! extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup,
  						 TransactionId OldestXmin, Buffer buffer);
! extern bool HeapTupleIsSurelyDead(HeapTuple htup,
  					  TransactionId OldestXmin);
  
  extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,

#21

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Alvaro Herrera (#20)

Re: logical changeset generation v4

Alvaro Herrera wrote:

I had a look at this part. Running the regression tests unveiled a case
where the tableOid wasn't being set (and thus caused an assertion to
fail), so I added that. I also noticed that the additions to
pruneheap.c are sometimes filling a tuple before it's strictly
necessary, leading to wasted work. Moved those too.

Actually I missed that downthread there are some fixes to this part; I
had fixed one of these independently, but there's one I missed. Added
that one too now (not attaching a new version).

(Also, it seems pointless to commit this unless we know for sure that
the downstream change that requires it is good; so I'm holding commit
until we've discussed the other stuff more thoroughly. Per discussion
with Andres.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Alvaro Herrera (#20)

Re: logical changeset generation v4

On Fri, Jan 18, 2013 at 11:33 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund wrote:

[09] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

For timetravel access to the catalog we need to be able to lookup (cmin,
cmax) pairs of catalog rows when were 'inside' that TX. This patch just
adapts the signature of the *Satisfies routines to expect a HeapTuple
instead of a HeapTupleHeader. The amount of changes for that is fairly
low as the HeapTupleSatisfiesVisibility macro already expected the
former.

It also makes sure the HeapTuple fields are setup in the few places that
didn't already do so.

I had a look at this part. Running the regression tests unveiled a case
where the tableOid wasn't being set (and thus caused an assertion to
fail), so I added that. I also noticed that the additions to
pruneheap.c are sometimes filling a tuple before it's strictly
necessary, leading to wasted work. Moved those too.

Looks good to me as attached.

I took a quick look at this and am just curious why we're adding the
requirement that t_tableOid has to be initialized?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Robert Haas (#22)

Re: logical changeset generation v4

On 2013-01-18 11:48:43 -0500, Robert Haas wrote:

On Fri, Jan 18, 2013 at 11:33 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund wrote:

[09] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

For timetravel access to the catalog we need to be able to lookup (cmin,
cmax) pairs of catalog rows when were 'inside' that TX. This patch just
adapts the signature of the *Satisfies routines to expect a HeapTuple
instead of a HeapTupleHeader. The amount of changes for that is fairly
low as the HeapTupleSatisfiesVisibility macro already expected the
former.

It also makes sure the HeapTuple fields are setup in the few places that
didn't already do so.

I had a look at this part. Running the regression tests unveiled a case
where the tableOid wasn't being set (and thus caused an assertion to
fail), so I added that. I also noticed that the additions to
pruneheap.c are sometimes filling a tuple before it's strictly
necessary, leading to wasted work. Moved those too.

Looks good to me as attached.

I took a quick look at this and am just curious why we're adding the
requirement that t_tableOid has to be initialized?

Its a stepping stone for catalog timetravel. I separated it into a different
patch because it seems to make the real patch easier to review without having
to deal with all those unrelated hunks.

The reason why we need t_tableOid and a valid ItemPointer is that during
catalog timetravel (so we can decode the heaptuples in WAL) we need to
see tuples in the catalog that have been changed in the transaction we
travelled to. That means we need to lookup cmin/cmax values which aren't
stored separately anymore.

My first approach was to build support for logging allocated combocids
(only for catalog tables) and use the existing combocid infrastructure
to look them up.
Turns out thats not a correct solution, consider this:
* T100: INSERT (xmin: 100, xmax: Invalid, (cmin|cmax): 3)
* T101: UPDATE (xmin: 100, xmax: 101, (cmin|cmax): 10)

If you know travel to T100 and you want to decide whether that tuple is
visible when in CommandId = 5 you have the problem that the original
cmin value has been overwritten by the cmax from T101. Note that in this
scenario no ComboCids have been generated!
The problematic part is that the information about what happened is
only available in T101.

I took resolve to doing something similar to what the heap rewrite code
uses to track update chains. Everytime a catalog tuple
inserted/updated/deleted (filenode, ctid, cmin, cmax) is wal logged (if
wal_level=logical) and while traveling to a transaction all those are
put up in a hash table so they can get looked up if we need the
respective cmin/cmax values. As we do that for all modifications of
catalog tuples in that transaction we only ever need that mapping when
inspecting that specific transaction.

Seems to work very nicely, I have made quite some tests with it and I
know of no failure cases.

To be able to make that lookup we need to get the relfilenode & item
pointer of the tuple were just looking up. Thats why I changed the
signature to pass a HeapTuple instead of a HeapTupleHeader. We get the
relfilenode from the buffer that has been passed *not* from the passed
table oid.
So requiring a valid table oid isn't strictly required as long as the
item pointer is valid, but it has made debugging noticeably easier.

Makes sense?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Robert Haas (#22)

Re: logical changeset generation v4

Robert Haas <robertmhaas@gmail.com> writes:

I took a quick look at this and am just curious why we're adding the
requirement that t_tableOid has to be initialized?

I assume he meant it had been left at a random value, which is surely
bad practice even if a specific usage doesn't fall over today.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4

On 13-01-14 08:38 PM, Andres Freund wrote:

Hi everyone,

Here is the newest version of logical changeset generation.

2) Currently the logical replication infrastructure assigns a 'slot-id'
when a new replica is setup. That slot id isn't really nice
(e.g. "id-321578-3"). It also requires that [18] keeps state in a global
variable to make writing regression tests easy.

I think it would be better to make the user specify those replication
slot ids, but I am not really sure about it.

Shortly after trying out the latest version I hit the following scenario
1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name
but it won't do anything because I have no free slots. I can't free the
slots with -F because I have no clue what the names of the slots are.
I can figure the names out by looking in pg_llog but if my replication
program can't do that so it won't be able to clean up from a failed attempt.

I agree with you that we should make the user program specify a slot, we
eventually might want to provide a view that shows the currently
allocated slots. For a logical based slony I would just generate the
slot name based on the remote node id. If walsender generates the slot
name then I would need to store a mapping between slot names and slons
so when a slon restarted it would know which slot to resume using. I'd
have to use a table in the slony schema on the remote database for
this. There would always be a risk of losing track of a slot id if the
slon crashed after getting the slot number but before committing the
mapping on the remote database.

3) Currently no options can be passed to an output plugin. I am thinking
about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the now widely
used ('option' ['value'], ...) syntax and pass that to the output
plugin's initialization function.

I think we discussed this last CF, I like this idea.

4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);

-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';

-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

5) Currently its only allowed to access catalog tables, its fairly
trivial to extend this to additional tables if you can accept some
(noticeable but not too big) overhead for modifications on those tables.

I was thinking of making that an option for tables, that would be useful
for replication solutions configuration tables.

I think this will make the life of anyone developing a new replication
system easier. Slony has a lot of infrastructure for allowing slonik
scripts to wait for configuration changes to popogate everywhere before
making other configuration changes because you can get race conditions.
If I were designing a new replication system and I had this feature then
I would try to use it to come up with a simpler model of propagating
configuration changes.

Andres Freund

Steve

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Andres Freund (#23)

Re: logical changeset generation v4

On Fri, Jan 18, 2013 at 12:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Makes sense?

Yes. The catalog timetravel stuff still gives me heartburn. The idea
of treating system catalogs in a special way has never sat well with
me and still doesn't - not that I am sure what I'd like better. The
complexity of the whole system is also somewhat daunting.

But my question with relation to this specific patch was mostly
whether setting the table OID everywhere was worth worrying about from
a performance standpoint, or whether any of the other adjustments this
patch makes could have negative consequences there, since the
Satisfies functions can get very hot on some workloads. It seems like
the consensus is "no, that's not worth worrying about", at least as
far as the table OIDs are concerned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Robert Haas (#26)

Re: logical changeset generation v4

On 2013-01-20 21:45:11 -0500, Robert Haas wrote:

On Fri, Jan 18, 2013 at 12:32 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Makes sense?

Yes. The catalog timetravel stuff still gives me heartburn. The idea
of treating system catalogs in a special way has never sat well with
me and still doesn't - not that I am sure what I'd like better. The
complexity of the whole system is also somewhat daunting.

Understandable :(

Althoutg it seems to me most parts of it have already been someplace
else in the pg code, and the actual timetravel code is relatively small.

But my question with relation to this specific patch was mostly
whether setting the table OID everywhere was worth worrying about from
a performance standpoint, or whether any of the other adjustments this
patch makes could have negative consequences there, since the
Satisfies functions can get very hot on some workloads. It seems like
the consensus is "no, that's not worth worrying about", at least as
far as the table OIDs are concerned.

I agree, I don't really see any such potential of that here, the
effectively changed amount of code is very minor since the interface
mostly stayed the same due to the HeapTupleSatisfiesVisibility wrapper.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Steve Singer (#25)

Re: logical changeset generation v4

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication

On 2013-01-19 23:42:02 -0500, Steve Singer wrote:

On 13-01-14 08:38 PM, Andres Freund wrote:

2) Currently the logical replication infrastructure assigns a 'slot-id'
when a new replica is setup. That slot id isn't really nice
(e.g. "id-321578-3"). It also requires that [18] keeps state in a global
variable to make writing regression tests easy.

I think it would be better to make the user specify those replication
slot ids, but I am not really sure about it.

Shortly after trying out the latest version I hit the following scenario
1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name but it
won't do anything because I have no free slots. I can't free the slots with
-F because I have no clue what the names of the slots are. I can figure
the names out by looking in pg_llog but if my replication program can't do
that so it won't be able to clean up from a failed attempt.

I agree with you that we should make the user program specify a slot, we
eventually might want to provide a view that shows the currently allocated
slots. For a logical based slony I would just generate the slot name based
on the remote node id. If walsender generates the slot name then I would
need to store a mapping between slot names and slons so when a slon
restarted it would know which slot to resume using. I'd have to use a
table in the slony schema on the remote database for this. There would
always be a risk of losing track of a slot id if the slon crashed after
getting the slot number but before committing the mapping on the remote
database.

This is changed now, slotnames need to be provided and there also is a
pg_stat_logical_replication view (thanks Abhijit!).

3) Currently no options can be passed to an output plugin. I am thinking
about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the now widely
used ('option' ['value'], ...) syntax and pass that to the output
plugin's initialization function.

I think we discussed this last CF, I like this idea.

Added to the extension and walsender interface. Its used in the few
tests we have to specify that xids should not be included in the tests
for reproducability, so its even tested ;)

I haven't added code for setting up options via pg_receivellog yet.

5) Currently its only allowed to access catalog tables, its fairly
trivial to extend this to additional tables if you can accept some
(noticeable but not too big) overhead for modifications on those tables.

I was thinking of making that an option for tables, that would be useful
for replication solutions configuration tables.

I think this will make the life of anyone developing a new replication
system easier. Slony has a lot of infrastructure for allowing slonik
scripts to wait for configuration changes to popogate everywhere before
making other configuration changes because you can get race conditions. If
I were designing a new replication system and I had this feature then I
would try to use it to come up with a simpler model of propagating
configuration changes.

Working on it now.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Steve Singer (#25)

1 attachment(s)

Re: logical changeset generation v4

On 2013-01-19 23:42:02 -0500, Steve Singer wrote:

5) Currently its only allowed to access catalog tables, its fairly
trivial to extend this to additional tables if you can accept some
(noticeable but not too big) overhead for modifications on those tables.

I was thinking of making that an option for tables, that would be useful
for replication solutions configuration tables.

I think this will make the life of anyone developing a new replication
system easier. Slony has a lot of infrastructure for allowing slonik
scripts to wait for configuration changes to popogate everywhere before
making other configuration changes because you can get race conditions. If
I were designing a new replication system and I had this feature then I
would try to use it to come up with a simpler model of propagating
configuration changes.

I pushed support for this, turned out to be a rather moderate patch (after a
cleanup patch that was required anyway):

src/backend/access/common/reloptions.c | 10 ++++++++++
src/backend/utils/cache/relcache.c | 9 ++++++++-
src/include/utils/rel.h | 9 +++++++++
3 files changed, 27 insertions(+), 1 deletion(-)

With the (attached for convenience) patch applied you can do
# ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);

to enable this.
What I wonder about is:
* does anybody have a better name for the reloption?
* Currently this can be set mid-transaction but it will only provide access for
in the next transaction but doesn't error out when setting it
mid-transaction. I personally find that acceptable, other opinions?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-wal_decoding-mergme-Support-declaring-normal-tables-.patchtext/x-patch; charset=us-asciiDownload

>From b535ba12fad667725247281c43be2ef81f7e40d7 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 23 Jan 2013 13:02:34 +0100
Subject: [PATCH] wal_decoding: mergme: Support declaring normal tables as
 timetraveleable

This is useful to be able to access tables used for replication metadata inside
an output plugin.

The storage option 'treat_as_catalog_table' is used for that purpose, so it can
be enabled for a table with
ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);

It is currently possible to change that option mid-transaction although
timetravel access will only be possible in the next transaction.
---
 src/backend/access/common/reloptions.c |   10 ++++++++++
 src/backend/utils/cache/relcache.c     |    9 ++++++++-
 src/include/utils/rel.h                |    9 +++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 456d746..f2d3c8b 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -62,6 +62,14 @@ static relopt_bool boolRelOpts[] =
 	},
 	{
 		{
+			"treat_as_catalog_table",
+			"Treat table as a catalog table for the purpose of logical replication",
+			RELOPT_KIND_HEAP
+		},
+		false
+	},
+	{
+		{
 			"fastupdate",
 			"Enables \"fast update\" feature for this GIN index",
 			RELOPT_KIND_GIN
@@ -1151,6 +1159,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"security_barrier", RELOPT_TYPE_BOOL,
 		offsetof(StdRdOptions, security_barrier)},
+		{"treat_as_catalog_table", RELOPT_TYPE_BOOL,
+		 offsetof(StdRdOptions, treat_as_catalog_table)},
 	};
 
 	options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 369a4d1..cc42ff4 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4718,12 +4718,19 @@ RelationIsDoingTimetravelInternal(Relation relation)
 	Assert(wal_level >= WAL_LEVEL_LOGICAL);
 
 	/*
-	 * XXX: Doing this test instead of using IsSystemNamespace has the
+	 * XXX: Doing this test instead of using IsSystemNamespace has the frak
 	 * advantage of classifying toast tables correctly.
 	 */
 	if (RelationGetRelid(relation) < FirstNormalObjectId)
 		return true;
 
+	/*
+	 * also log relevant data if we want the table to behave as a catalog
+	 * table, although its not a system provided one.
+	 */
+	if (RelationIsTreatedAsCatalogTable(relation))
+	    return true;
+
 	return false;
 }
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index e07ef3f..a026612 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -219,6 +219,7 @@ typedef struct StdRdOptions
 	int			fillfactor;		/* page fill factor in percent (0..100) */
 	AutoVacOpts autovacuum;		/* autovacuum-related options */
 	bool		security_barrier;		/* for views */
+	bool        treat_as_catalog_table; /* treat as timetraveleable table */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
@@ -255,6 +256,14 @@ typedef struct StdRdOptions
 	 ((StdRdOptions *) (relation)->rd_options)->security_barrier : false)
 
 /*
+ * RelationIsTreatedAsCatalogTable
+ *		Returns whether the relation is security view, or not
+ */
+#define RelationIsTreatedAsCatalogTable(relation)	\
+	((relation)->rd_options ?				\
+	 ((StdRdOptions *) (relation)->rd_options)->treat_as_catalog_table : false)
+
+/*
  * RelationIsValid
  *		True iff relation descriptor is valid.
  */
-- 
1.7.10.4

#30

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Andres Freund (#29)

Re: logical changeset generation v4

On Wed, Jan 23, 2013 at 7:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:

With the (attached for convenience) patch applied you can do
# ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);

to enable this.
What I wonder about is:
* does anybody have a better name for the reloption?

IMHO, it should somehow involve the words "logical" and "replication".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Robert Haas (#30)

Re: logical changeset generation v4

On 2013-01-23 10:18:50 -0500, Robert Haas wrote:

On Wed, Jan 23, 2013 at 7:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:

With the (attached for convenience) patch applied you can do
# ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);

to enable this.
What I wonder about is:
* does anybody have a better name for the reloption?

IMHO, it should somehow involve the words "logical" and "replication".

Not a bad point. In the back of my mind I was thinking of reusing it to
do error checking when accessing the heap via index methods as a way of
making sure index support writers are aware of the complexities of doing
so (c.f. ALTER TYPE .. ADD VALUE only being usable outside
transactions).
But thats probably over the top.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#1)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

Hi,

I decided to reply on the patches thread to be able to find this later.

On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote:

"logical changeset generation v4"
This is a boatload of infrastructure for supporting logical replication, yet
we have no code actually implementing logical replication that would go with
this. The premise of logical replication over trigger-based was that it'd be
faster, yet we cannot asses that without a working implementation. I don't
think this can be committed in this state.

Its a fair point that this is a huge amount of code without a user in
itself in-core.
But the reason it got no user included is because several people
explicitly didn't want a user in-core for now but said the first part of
this would be to implement the changeset generation as a separate
piece. Didn't you actually prefer not to have any users of this in-core
yourself?

Also, while the apply side surely isn't benchmarkable without any being
submitted, the changeset generation can very well be benchmarked.

A very, very adhoc benchmark:
-c max_wal_senders=10
-c max_logical_slots=10 --disabled for anything but logical
-c wal_level=logical --hot_standby for anything but logical
-c checkpoint_segments=100
-c log_checkpoints=on
-c shared_buffers=512MB
-c autovacuum=on
-c log_min_messages=notice
-c log_line_prefix='[%p %t] '
-c wal_keep_segments=100
-c fsync=off
-c synchronous_commit=off

pgbench -p 5440 -h /tmp -n -M prepared -c 16 -j 16 -T 30

pgbench upstream:
tps: 22275.941409
space overhead: 0%
pgbench logical-submitted
tps: 16274.603046
space overhead: 2.1%
pgbench logical-HEAD (will submit updated version tomorrow or so):
tps: 20853.341551
space overhead: 2.3%
pgbench single plpgsql trigger (INSERT INTO log(data) VALUES(NEW::text))
tps: 14101.349535
space overhead: 369%

Note that in the single trigger case nobody consumed the queue while the
logical version streamed the changes out and stored them to disk.

Adding a default NOW() or similar to the tables immediately makes
logical decoding faster by a factor of about 3 in comparison to the
above trivial trigger.

The only reason the submitted version of logical decoding is
comparatively slow is that its xmin update policy is braindamaged,
working on that right now.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Andres Freund (#32)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On Wed, Jan 23, 2013 at 5:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

pgbench upstream:
tps: 22275.941409
space overhead: 0%
pgbench logical-submitted
tps: 16274.603046
space overhead: 2.1%
pgbench logical-HEAD (will submit updated version tomorrow or so):
tps: 20853.341551
space overhead: 2.3%
pgbench single plpgsql trigger (INSERT INTO log(data) VALUES(NEW::text))
tps: 14101.349535
space overhead: 369%

Note that in the single trigger case nobody consumed the queue while the
logical version streamed the changes out and stored them to disk.

Adding a default NOW() or similar to the tables immediately makes
logical decoding faster by a factor of about 3 in comparison to the
above trivial trigger.

The only reason the submitted version of logical decoding is
comparatively slow is that its xmin update policy is braindamaged,
working on that right now.

I agree. The thing that scares me about the logical replication stuff
is not that it might be slow (and if your numbers are to be believed,
it isn't), but that I suspect it's riddled with bugs and possibly some
questionable design decisions. If we commit it and release it, then
we're going to be stuck maintaining it for a very, very long time. If
it turns out to have serious bugs that can't be fixed without a new
major release, it's going to be a serious black eye for the project.

Of course, I have no evidence that that will happen. But it is a
really big piece of code, and therefore unless you are superman, it's
probably got a really large number of bugs. The scary thing is that
it is not as if we can say, well, this is a big hunk of code, but it
doesn't really touch the core of the system, so if it's broken, it'll
be broken itself, but it won't break anything else. Rather, this code
is deeply in bed with WAL, with MVCC, and with the on-disk format of
tuples, and makes fundamental changes to the first two of those. You
agreed with Tom that 9.2 is the buggiest release in recent memory, but
I think logical replication could easily be an order of magnitude
worse.

I also have serious concerns about checksums and foreign key locks.
Any single one of those three patches could really inflict
unprecedented damage on our community's reputation for stability and
reliability if they turn out to be seriously buggy, and unfortunately
I don't consider that an unlikely outcome. I don't know what to do
about it, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Joshua D. Drake

jd@commandprompt.com

almost 13 years ago

In reply to: Robert Haas (#33)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 01/23/2013 05:17 PM, Robert Haas wrote:

Of course, I have no evidence that that will happen. But it is a
really big piece of code, and therefore unless you are superman, it's
probably got a really large number of bugs. The scary thing is that
it is not as if we can say, well, this is a big hunk of code, but it
doesn't really touch the core of the system, so if it's broken, it'll
be broken itself, but it won't break anything else. Rather, this code
is deeply in bed with WAL, with MVCC, and with the on-disk format of
tuples, and makes fundamental changes to the first two of those. You
agreed with Tom that 9.2 is the buggiest release in recent memory, but
I think logical replication could easily be an order of magnitude
worse.

Command Prompt worked for YEARS to get logical replication right and we
never got it to the point where I would have been happy submitting it to
-core.

It behooves .Org to be extremely conservative about this feature.
Granted, it is a feature we should have had years ago but still. It is
not a simple thing, it is not an easy thing. It is complicated and
complex to get correcft.

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Simon Riggs

simon@2ndQuadrant.com

almost 13 years ago

In reply to: Robert Haas (#33)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 24 January 2013 01:17, Robert Haas <robertmhaas@gmail.com> wrote:

I agree. The thing that scares me about the logical replication stuff
is not that it might be slow (and if your numbers are to be believed,
it isn't), but that I suspect it's riddled with bugs and possibly some
questionable design decisions. If we commit it and release it, then
we're going to be stuck maintaining it for a very, very long time. If
it turns out to have serious bugs that can't be fixed without a new
major release, it's going to be a serious black eye for the project.

Of course, I have no evidence that that will happen.

This is a generic argument against applying any invasive patch. I
agree 9.2 had major bugs on release, though that was because of the
invasive nature of some of the changes, even in seemingly minor
patches.

The most invasive and therefore risky changes in this release are
already committed - changes to the way WAL reading and timelines work.
If we don't apply a single additional patch in this CF, we will still
in my opinion have a major requirement for beta testing prior to
release.

The code executed here is isolated to users of the new feature and is
therefore low risk to non-users. Of course there will be bugs.
Everybody understands what new feature means and we as a project
aren't exposed to risks from this. New feature also means
groundbreaking new capabilities, so the balance of high reward, low
risk means this gets my vote to apply. I'm just about to spend some
days giving a final review on it to confirm/refute that opinion in
technical detail.

Code using these features is available and marked them clearly as full
copyright transfer to PGDG, TPL licenced. That code is external not by
author's choice, but at the specific request of the project to make it
thay way. I personally will be looking to add code to core over time.
It was useful for everybody that replication solutions started out of
core, but replication is now a core requirement for databases and we
must fully deliver on that thought.

I agree with your concern re: checksums and foreign key locks. FK
locks has had considerable review and support, so I expect that to be
a manageable issue.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Andres Freund (#32)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 24.01.2013 00:30, Andres Freund wrote:

Hi,

I decided to reply on the patches thread to be able to find this later.

On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote:

"logical changeset generation v4"
This is a boatload of infrastructure for supporting logical replication, yet
we have no code actually implementing logical replication that would go with
this. The premise of logical replication over trigger-based was that it'd be
faster, yet we cannot asses that without a working implementation. I don't
think this can be committed in this state.

Its a fair point that this is a huge amount of code without a user in
itself in-core.
But the reason it got no user included is because several people
explicitly didn't want a user in-core for now but said the first part of
this would be to implement the changeset generation as a separate
piece. Didn't you actually prefer not to have any users of this in-core
yourself?

Yes, I certainly did. But we still need to see the other piece of the
puzzle to see how this fits with it.

BTW, why does all the transaction reordering stuff has to be in core?

How much of this infrastructure is to support replicating DDL changes?
IOW, if we drop that requirement, how much code can we slash? Any other
features or requirements that could be dropped? I think it's clear at
this stage that this patch is not going to be committed as it is. If you
can reduce it to a fraction of what it is now, that fraction might have
a chance. Otherwise, it's just going to be pushed to the next commitfest
as whole, and we're going to be having the same doubts and discussions then.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Robert Haas (#33)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

Hi Robert, Hi all,

On 2013-01-23 20:17:04 -0500, Robert Haas wrote:

On Wed, Jan 23, 2013 at 5:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

The only reason the submitted version of logical decoding is
comparatively slow is that its xmin update policy is braindamaged,
working on that right now.

I agree. The thing that scares me about the logical replication stuff
is not that it might be slow (and if your numbers are to be believed,
it isn't), but that I suspect it's riddled with bugs and possibly some
questionable design decisions. If we commit it and release it, then
we're going to be stuck maintaining it for a very, very long time. If
it turns out to have serious bugs that can't be fixed without a new
major release, it's going to be a serious black eye for the project.

Thats way much more along the lines of what I am afraid of than the
performance stuff - but Heikki cited those, so I replied to that.

Note that I didn't say this must, must go in - I just don't think
Heikki's reasoning about why not hit the nail on the head.

Of course, I have no evidence that that will happen. But it is a
really big piece of code, and therefore unless you are superman, it's
probably got a really large number of bugs. The scary thing is that
it is not as if we can say, well, this is a big hunk of code, but it
doesn't really touch the core of the system, so if it's broken, it'll
be broken itself, but it won't break anything else. Rather, this code
is deeply in bed with WAL, with MVCC, and with the on-disk format of
tuples, and makes fundamental changes to the first two of those. You
agreed with Tom that 9.2 is the buggiest release in recent memory, but
I think logical replication could easily be an order of magnitude
worse.

I tried very, very hard to get the basics of the design & interface
solid. Which obviously doesn't man I am succeeding - luckily not being
superhuman after all ;). And I think thats very much where input is
desparetely needed and where I failed to raise enough attention. The
"output plugin" interface follewed by the walsender interface is what
needs to be most closely vetted.
Those are the permanent, user/developer exposed UI and the one we should
try to keep as stable as possible.

The output plugin callbacks are defined here:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
To make it more agnostic of the technology to implement changeset
extraction we possibly should replace the ReorderBuffer(TXN|Change)
structs being passed by something more implementation agnostic.

walsender interface:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME

1 & 3 allocate (respectively free) the permanent state associated with
one changeset consumer whereas START_LOGICAL_REPLICATION streams out
changes starting at RECPTR.

Btw, there are currently *no* changes to the wal format at all if
wal_format < logical except that xl_running_xacts are logged more
frequently which obviously could easily be made conditional. Baring bugs
of course.
The changes with wal_level>=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their
normal record by using a separate XLogRecData block for the buffer and
the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged
xl_heap_delete struct
* On changes to catalog tables (relfilenode, tid, cmin, cmax) are logged.

No changes to mvcc for normal backends at all, unless you count the very
slightly changed *Satisfies interface (getting passed a HeapTuple
instead of HeapTupleHeader).

I am not sure what you're concerned about WRT the on-disk format of the
tuples? We are pretty much nailed down on that due to pg_upgrade'ability
anyway and it could be changed from this patches POV without a problem,
the output plugin just sees normal HeapTuples? Or are you concerned
about the code extracting them from the xlog records?

So I think the "won't break anything else" argument can be made rather
fairly if the heapam.c changes, which aren't that complex, are vetted
closely.

Now, the disucssion about all the code thats active *during* decoding is
something else entirely :/

You
agreed with Tom that 9.2 is the buggiest release in recent memory, but
I think logical replication could easily be an order of magnitude
worse.

I unfortunately think that not providing more builtin capabilities in
this area also has significant dangers. Imo this is one of the weakest,
or even the weakest, area of postgres.

I personally have the impression that just about nobody did actual beta
testing of the lastest releases, especially 9.2, and that is the reason
why its the buggiest recent release.

I also have serious concerns about checksums and foreign key locks.
Any single one of those three patches could really inflict
unprecedented damage on our community's reputation for stability and
reliability if they turn out to be seriously buggy, and unfortunately
I don't consider that an unlikely outcome. I don't know what to do
about it, either.

I see your point although I would attest both having far more danger of
collateral damage than logical decoding itself. Especially fklocks
pretty much has to be active by default and there's not much that can
be reasonably done about that.
I tried to give fklocks a very thorough look but unfortunately I didn't
know anything beforehand about most areas of the code it touches which
obviously limits the amount of dangers one is seeing (FWIW I am still
mostly concerned with multixact logging and the locking changes in
heapam.c itself).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Andres Freund (#37)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

One random thing that caught my eye in the patch, I though I'd mention
it while I still remember: In heap_delete, you call heap_form_tuple() in
a critical section. That's a bad idea, because if it runs out of memory
-> PANIC.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Heikki Linnakangas (#36)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-24 12:38:25 +0200, Heikki Linnakangas wrote:

On 24.01.2013 00:30, Andres Freund wrote:

Hi,

I decided to reply on the patches thread to be able to find this later.

On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote:

"logical changeset generation v4"
This is a boatload of infrastructure for supporting logical replication, yet
we have no code actually implementing logical replication that would go with
this. The premise of logical replication over trigger-based was that it'd be
faster, yet we cannot asses that without a working implementation. I don't
think this can be committed in this state.

Its a fair point that this is a huge amount of code without a user in
itself in-core.
But the reason it got no user included is because several people
explicitly didn't want a user in-core for now but said the first part of
this would be to implement the changeset generation as a separate
piece. Didn't you actually prefer not to have any users of this in-core
yourself?

Yes, I certainly did. But we still need to see the other piece of the puzzle
to see how this fits with it.

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

BTW, why does all the transaction reordering stuff has to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* much easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4
If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.

How much of this infrastructure is to support replicating DDL changes? IOW,
if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.

Any other features or requirements that could be dropped? I think it's clear at this stage that
this patch is not going to be committed as it is. If you can reduce it to a
fraction of what it is now, that fraction might have a chance. Otherwise,
it's just going to be pushed to the next commitfest as whole, and we're
going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop of required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an interface change
but otherwise it should be fine to go in independently. It also has
other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
independently and makes sense independently as it allows a standby to
enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies change could be applied and they are imo ready but
there's no use-case for it without the rest, so I am not sure whether
theres a point
- currently not separately available, but we could add wal_level=logical
independently. There would be no user of it, but it would be partial
work. That includes the relcache support for keeping track of the
primary key which already is available separately.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Heikki Linnakangas (#38)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-24 13:28:18 +0200, Heikki Linnakangas wrote:

One random thing that caught my eye in the patch, I though I'd mention it
while I still remember: In heap_delete, you call heap_form_tuple() in a
critical section. That's a bad idea, because if it runs out of memory ->
PANIC.

Good point, will fix.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Andres Freund (#39)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 13-01-24 06:40 AM, Andres Freund wrote:

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements
in a format similar to what a modified slony apply trigger would want.
This was with the list of tables to replicate hard-coded in the plugin.
This was with the patchset from the last commitfest.I had gotten a bit
hung up on the UPDATE and DELETE support because slony allows you to use
an arbitrary user specified unique index as your key. It looks like
better support for tables with a unique non-primary key is in the most
recent patch set. I am hoping to have time this weekend to update my
plugin to use parameters passed in on the init and other updates in the
most recent version. If I make some progress I will post a link to my
progress at the end of the weekend. My big issue is that I have limited
time to spend on this.

BTW, why does all the transaction reordering stuff has to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* much easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4
If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.

I think we will find that the replication systems won't be the only
users of this feature. I have often seen systems that have a logging
requirement for auditing purposes or to log then reconstruct the
sequence of changes made to a set of tables in order to feed a
downstream application. Triggers and a journaling table are the
traditional way of doing this but it should be pretty easy to write a
plugin to accomplish the same thing that should give better
performance. If the reordering stuff wasn't in core this would be much
harder.

How much of this infrastructure is to support replicating DDL changes? IOW,
if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.

Any other features or requirements that could be dropped? I think it's clear at this stage that
this patch is not going to be committed as it is. If you can reduce it to a
fraction of what it is now, that fraction might have a chance. Otherwise,
it's just going to be pushed to the next commitfest as whole, and we're
going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop of required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an interface change
but otherwise it should be fine to go in independently. It also has
other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
independently and makes sense independently as it allows a standby to
enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies change could be applied and they are imo ready but
there's no use-case for it without the rest, so I am not sure whether
theres a point
- currently not separately available, but we could add wal_level=logical
independently. There would be no user of it, but it would be partial
work. That includes the relcache support for keeping track of the
primary key which already is available separately.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Andres Freund (#37)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Thats way much more along the lines of what I am afraid of than the
performance stuff - but Heikki cited those, so I replied to that.

Note that I didn't say this must, must go in - I just don't think
Heikki's reasoning about why not hit the nail on the head.

Fair enough, no argument.

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal. Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask. Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things. There is no question in my mind
that this work is going to be the beginning of a process that
revolutionizes the way people think about replication and PostgreSQL,
and you deserve our sincere thanks for that.

Now, the bad news is, I don't think it's very reasonable to try to
commit this to 9.3. I think it is just too much stuff too late in the
cycle. I've reviewed some of the patches from time to time but there
is a lot more stuff and it's big and complicated and it's not really
clear that we have the interface quite right yet, even though I think
it's also clear that we are a lot of closer than we were. I don't
want to be fixing that during beta, much less after release.

I tried very, very hard to get the basics of the design & interface
solid. Which obviously doesn't man I am succeeding - luckily not being
superhuman after all ;). And I think thats very much where input is
desparetely needed and where I failed to raise enough attention. The
"output plugin" interface follewed by the walsender interface is what
needs to be most closely vetted.
Those are the permanent, user/developer exposed UI and the one we should
try to keep as stable as possible.

The output plugin callbacks are defined here:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
To make it more agnostic of the technology to implement changeset
extraction we possibly should replace the ReorderBuffer(TXN|Change)
structs being passed by something more implementation agnostic.

walsender interface:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME

1 & 3 allocate (respectively free) the permanent state associated with
one changeset consumer whereas START_LOGICAL_REPLICATION streams out
changes starting at RECPTR.

Forgive me for not having looked at the patch, but to what extent is
all this, ah, documented?

Btw, there are currently *no* changes to the wal format at all if
wal_format < logical except that xl_running_xacts are logged more
frequently which obviously could easily be made conditional. Baring bugs
of course.
The changes with wal_level>=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their
normal record by using a separate XLogRecData block for the buffer and
the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged
xl_heap_delete struct
* On changes to catalog tables (relfilenode, tid, cmin, cmax) are logged.

No changes to mvcc for normal backends at all, unless you count the very
slightly changed *Satisfies interface (getting passed a HeapTuple
instead of HeapTupleHeader).

I am not sure what you're concerned about WRT the on-disk format of the
tuples? We are pretty much nailed down on that due to pg_upgrade'ability
anyway and it could be changed from this patches POV without a problem,
the output plugin just sees normal HeapTuples? Or are you concerned
about the code extracting them from the xlog records?

Mostly, my concern is that you've accidentally broken something, or
that your code will turn out to be flaky in ways we can't now predict.
My only really specific concern at this point is about the special
treatment of catalog tables. We've never done anything like that
before, and it feels like a bad idea. In particular, the fact that
you have to WAL-log new information about cmin/cmax really suggests
that we're committing ourselves to the MVCC infrastructure in a way
that we weren't previously. There's some category of stuff that our
MVCC implementation didn't previously require us to persist on disk
which, after this, it will. I don't understand exactly where the
boundaries of that are in terms of future changes we might want to
make - but I don't like moving the goalposts in that area.

So I think the "won't break anything else" argument can be made rather
fairly if the heapam.c changes, which aren't that complex, are vetted
closely.

Now, the disucssion about all the code thats active *during* decoding is
something else entirely :/

You
agreed with Tom that 9.2 is the buggiest release in recent memory, but
I think logical replication could easily be an order of magnitude
worse.

I unfortunately think that not providing more builtin capabilities in
this area also has significant dangers. Imo this is one of the weakest,
or even the weakest, area of postgres.

I personally have the impression that just about nobody did actual beta
testing of the lastest releases, especially 9.2, and that is the reason
why its the buggiest recent release.

I also have serious concerns about checksums and foreign key locks.
Any single one of those three patches could really inflict
unprecedented damage on our community's reputation for stability and
reliability if they turn out to be seriously buggy, and unfortunately
I don't consider that an unlikely outcome. I don't know what to do
about it, either.

I see your point although I would attest both having far more danger of
collateral damage than logical decoding itself. Especially fklocks
pretty much has to be active by default and there's not much that can
be reasonably done about that.
I tried to give fklocks a very thorough look but unfortunately I didn't
know anything beforehand about most areas of the code it touches which
obviously limits the amount of dangers one is seeing (FWIW I am still
mostly concerned with multixact logging and the locking changes in
heapam.c itself).

Yeah, I don't disagree with any of that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Stephen Frost

sfrost@snowman.net

almost 13 years ago

In reply to: Robert Haas (#42)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

* Robert Haas (robertmhaas@gmail.com) wrote:

Now, the bad news is, I don't think it's very reasonable to try to
commit this to 9.3. I think it is just too much stuff too late in the
cycle. I've reviewed some of the patches from time to time but there
is a lot more stuff and it's big and complicated and it's not really
clear that we have the interface quite right yet, even though I think
it's also clear that we are a lot of closer than we were. I don't
want to be fixing that during beta, much less after release.

The only way to avoid this happening again and again, imv, is to get it
committed early in whatever cycle it's slated to release for. We've got
some serious challenges there though because we want to encourage
everyone to focus on beta testing and going through the release process,
plus we don't want to tag/branch too early or we create more work for
ourselves.

It would have been nice to get this into 9.3, but I can certainly
understand needing to move it back, but can we get a slightly more
specific plan around getting it in then?

Thanks,

Stephen

#44

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Robert Haas (#42)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 24.01.2013 20:27, Robert Haas wrote:

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal. Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask. Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things.

+1. I really appreciate all the work you Andres have put into this. I've
argued in the past myself that there should be a little tool that
scrapes the WAL to do logical replication. Essentially, just what you've
implemented.

That said (hah, you knew there would be a "but" ;-)), now that I see
what that looks like, I'm feeling that maybe it wasn't such a good idea
after all. It sounded like a fairly small patch that greatly reduces the
overhead in the master with existing replication systems like slony, but
it turned out to be a huge patch with a lot of new concepts and interfaces.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Robert Haas (#42)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

Hi!

On 2013-01-24 13:27:00 -0500, Robert Haas wrote:

On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal. Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask. Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things.

Very much appreciated. Especially as I can echo your feeling of not only
having positive feelings about the process ;)

Now, the bad news is, I don't think it's very reasonable to try to
commit this to 9.3. I think it is just too much stuff too late in the
cycle. I've reviewed some of the patches from time to time but there
is a lot more stuff and it's big and complicated and it's not really
clear that we have the interface quite right yet, even though I think
it's also clear that we are a lot of closer than we were. I don't
want to be fixing that during beta, much less after release.

It pains me to admit that you have a point there.

What I am afraid though is that it basically goes on like this in the
next commitfests:
* 9.4-CF1: no "serious" reviewer comments because they are busy doing release work
* 9.4-CF2: all are relieved that the release is over and a bit tired
* 9.4-CF3: first deeper review, some more complex restructuring required
* 9.4-CF4: too many changes to commit.

If you look at the development of the feature, after the first prototype
and the resulting design changes nobody with decision power had a more
than cursory look at the proposed interfaces. Thats very, very, very
understandable, you all are busy people and the patch & the interfaces
are complex so it takes noticeable amounts of time, but it unfortunately
doesn't help in getting an acceptable interface nailed down.

The problem with that is not only that it sucks huge amounts of energy
out of me and others but also that its very hard to really build the
layers/users above changeset extraction without being able to rely on
the interface and semantics. So we never get to the actually benefits
:(, and we don't get the users people require for the feature to be
committed.

So far, the only really effective way of getting people to comment on
patches in this state & complexity is the threat of an upcoming commit
because of the last commitfest :(

I honestly don't know how to go on about this...

I tried very, very hard to get the basics of the design & interface
solid. Which obviously doesn't man I am succeeding - luckily not being
superhuman after all ;). And I think thats very much where input is
desparetely needed and where I failed to raise enough attention. The
"output plugin" interface follewed by the walsender interface is what
needs to be most closely vetted.
Those are the permanent, user/developer exposed UI and the one we should
try to keep as stable as possible.

The output plugin callbacks are defined here:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
To make it more agnostic of the technology to implement changeset
extraction we possibly should replace the ReorderBuffer(TXN|Change)
structs being passed by something more implementation agnostic.

walsender interface:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME

1 & 3 allocate (respectively free) the permanent state associated with
one changeset consumer whereas START_LOGICAL_REPLICATION streams out
changes starting at RECPTR.

Forgive me for not having looked at the patch, but to what extent is
all this, ah, documented?

There are several mails on -hackers where I ask for input on whether
that interface is what people want but all the comments have been from
non-core pg people, although mildly favorable.

I couldn't convince myself of writing real low-level documentation
instead of just the example code I needed for testing anyway and some
more higher-level docs before I had input from that side. Perhaps that
was a mistake.

So, here's a slightly less quick overview of the walsender interface:

Whenever a new replication consumer wants to stream data we need to make
sure on the primary that the data can be provided gapless, even across
disconnects, crashes et al.
The permanent state associated with this is currently called a
"replication slot".

$ psql "port=5440 dbname=postgres replication=1"
postgres=# INIT_LOGICAL_REPLICATION 'bdr-whatever-1' 'test_decoding';
replication_id | consistent_point | snapshot_name | plugin
----------------+------------------+---------------+---------------
bdr-whatever-1 | 0/3E8DFA08 | 000F54F1-1 | test_decoding
(1 row)

So now we have allocated a permanent slot identified by the name
'bdr-whatever-1'. It also automatically exported the snapshot
'000F54F1-1' that can be imported into another transaction, e.g. to
consistently dump an initial snapshot of the data.
The information returned in the 'consistent_point' column tells us that
we will be able to return all data from that LSN onwards.

That replication slot can *only* be used for replicating changes out of
the database postgres and with the plugin 'test_decoding' (a contrib
module).

That slot will persist across restarts and everything until somebody
issues a
FREE_LOGICAL_REPLICATION 'bdr-whatever-1'.

To start streaming out changes the command
postgres=# START_LOGICAL_REPLICATION 'bdr-whatever-1' 0/3E8DFA08;
WARNING: Starting logical replication
unexpected PQresultStatus: 8
Time: 76.346 ms

is used. Unfortunately psql isn't a suitable consumer as it cannot deal
with the unrequested copy, but thats what we have pg_receivellog for:

$ ~/.../pg_receivellog -p 5440 -d postgres --slot bdr-whatever-1 -f - --start -v
pg_receivellog: starting log streaming at 0/0 (slot bdr-whatever-1)
pg_receivellog: initiated streaming

Which will start streaming out changes when we do:
$ psql -h /tmp -p 5440 -U andres postgres
postgres=# CREATE TABLE frak(id serial primary key, data int);
CREATE TABLE
postgres=# INSERT INTO frak (data) SELECT * FROM generate_series(1, 1);
INSERT 0 1

back to receivellog:

BEGIN 1004786
table "frak_id_seq": INSERT: sequence_name[name]:frak_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
COMMIT 1004786
pg_receivellog: confirming flush up to 0/3E8F0F30 (slot bdr-whatever-1)
BEGIN 1004787
table "frak": INSERT: id[int4]:1 data[int4]:1
COMMIT 1004787
pg_receivellog: confirming flush up to 0/3E8FCDC0 (slot bdr-whatever-1)

Makes sense so far?

The actual output you see there, the
BEGIN 1004787
table "frak": INSERT: id[int4]:1 data[int4]:1
COMMIT 1004787
bit, is generated by the test_decoding plugin referenced previously
which has functions like
extern void pg_decode_init(struct LogicalDecodingContext *ctx, bool is_init);
extern bool pg_decode_begin_txn(struct LogicalDecodingContext *ctx, ReorderBufferTXN* txn);
extern bool pg_decode_commit_txn(struct LogicalDecodingContext *ctx, ReorderBufferTXN* txn, XLogRecPtr commit_lsn);
extern bool pg_decode_change(struct LogicalDecodingContext *ctx, ReorderBufferTXN* txn, Oid tableoid, ReorderBufferChange *change);

And e.g. begin_txn looks like:

bool
pg_decode_begin_txn(struct LogicalDecodingContext *ctx, ReorderBufferTXN* txn)
{
TestDecodingData *data = ctx->output_plugin_private;

ctx->prepare_write(ctx, txn->lsn, txn->xid);
if (data->include_xids)
appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
else
appendStringInfoString(ctx->out, "BEGIN");
ctx->write(ctx, txn->lsn, txn->xid);
return true;
}

As you see, it seems to have somehow gathered options from
somewhere. Those can be specified as optional argumetns to
START_LOGICAL_REPLICATION.

Btw, there are currently *no* changes to the wal format at all if
wal_format < logical except that xl_running_xacts are logged more
frequently which obviously could easily be made conditional. Baring bugs
of course.
The changes with wal_level>=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their
normal record by using a separate XLogRecData block for the buffer and
the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged
xl_heap_delete struct
* On changes to catalog tables (relfilenode, tid, cmin, cmax) are logged.

No changes to mvcc for normal backends at all, unless you count the very
slightly changed *Satisfies interface (getting passed a HeapTuple
instead of HeapTupleHeader).

I am not sure what you're concerned about WRT the on-disk format of the
tuples? We are pretty much nailed down on that due to pg_upgrade'ability
anyway and it could be changed from this patches POV without a problem,
the output plugin just sees normal HeapTuples? Or are you concerned
about the code extracting them from the xlog records?

Mostly, my concern is that you've accidentally broken something, or
that your code will turn out to be flaky in ways we can't now predict.

I really think a look or two from experienced enough people should make
the heapam parts safe enough.

The changes basically are like:

heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- XLogRecData rdata[3];
+ XLogRecData rdata[4];
Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;

+       /*
+        * For the logical replication case we need the tuple even if were
+        * doing a full page write. We could alternatively store a pointer into
+        * the fpw though.
+        * For that to work we add another rdata entry for the buffer in that
+        * case.
+        */
+       bool        need_tuple_data = wal_level >= WAL_LEVEL_LOGICAL
+           && RelationGetRelid(relation)  >= FirstNormalObjectId;
+
+       /* For logical decode we need combocids to properly decode the catalog */
+       if (wal_level >= WAL_LEVEL_LOGICAL && RelationGetRelid(relation)  < FirstNormalObjectId)
+           log_heap_new_cid(relation, heaptup);
...
        rdata[1].data = (char *) &xlhdr;
        rdata[1].len = SizeOfHeapHeader;
-       rdata[1].buffer = buffer;
+       rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
        rdata[1].buffer_std = true;
        rdata[1].next = &(rdata[2]);

        /* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
        rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
        rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
-       rdata[2].buffer = buffer;
+       rdata[2].buffer = need_tuple_data ? InvalidBuffer : buffer;
        rdata[2].buffer_std = true;
        rdata[2].next = NULL;

        /*
+        * add record for the buffer without actual content thats removed if
+        * fpw is done for that buffer
+        */
+       if (need_tuple_data)
+       {
+           rdata[2].next = &(rdata[3]);
+
+           rdata[3].data = NULL;
+           rdata[3].len = 0;
+           rdata[3].buffer = buffer;
+           rdata[3].buffer_std = true;
+           rdata[3].next = NULL;
+       }

Both the wal_level >= logical && XXX checks are now nicely encapsulated
but this shows the complexity of whats being done better...

Thats about all the changes that are done to heapam.c. Well, the same is
done for update, multi_insert, and delete as well, but...

My only really specific concern at this point is about the special
treatment of catalog tables. We've never done anything like that
before, and it feels like a bad idea. In particular, the fact that
you have to WAL-log new information about cmin/cmax really suggests
that we're committing ourselves to the MVCC infrastructure in a way
that we weren't previously.

It basically restores the pre 8.3 (?) state again where cmin/max were
really stored - only that it only does so temporarily instead of
permanently bloating the tables again. It imo pretty closely resembles
what the normal code is doing with combocids, just that the combocid in
this case is slightly more complex because it needs to be looked up over
a longer timeframe.
I thought about simply re-adding cmin/max storage for catalog tables,
with some trickery thats not that hard to do (store it similar to oids),
but the impact of that would have been far, far greater.

And the decision of treating only some tables that way? Well, thats a
question of overhead. There simply is no need to do something like that
for tables that aren't required for converting a HeapTuple to the format
the output wants.
From my pov its somewhat similar to the way we log differently for
temporary, persistent and unlogged tables.

There's some category of stuff that our
MVCC implementation didn't previously require us to persist on disk
which, after this, it will. I don't understand exactly where the
boundaries of that are in terms of future changes we might want to
make - but I don't like moving the goalposts in that area.

I don't really see a problem there. If we decide to get rid of MVCC in a
fundamental manner, this will be the absolutely, smallest problem of it
all. IMNSHO ;)

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

Bruce Momjian

bruce@momjian.us

almost 13 years ago

In reply to: Andres Freund (#45)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On Fri, Jan 25, 2013 at 02:16:09AM +0100, Andres Freund wrote:

What I am afraid though is that it basically goes on like this in the
next commitfests:
* 9.4-CF1: no "serious" reviewer comments because they are busy doing release work
* 9.4-CF2: all are relieved that the release is over and a bit tired
* 9.4-CF3: first deeper review, some more complex restructuring required
* 9.4-CF4: too many changes to commit.

If you look at the development of the feature, after the first prototype
and the resulting design changes nobody with decision power had a more
than cursory look at the proposed interfaces. Thats very, very, very
understandable, you all are busy people and the patch & the interfaces
are complex so it takes noticeable amounts of time, but it unfortunately
doesn't help in getting an acceptable interface nailed down.

The problem with that is not only that it sucks huge amounts of energy
out of me and others but also that its very hard to really build the
layers/users above changeset extraction without being able to rely on
the interface and semantics. So we never get to the actually benefits
:(, and we don't get the users people require for the feature to be
committed.

So far, the only really effective way of getting people to comment on
patches in this state & complexity is the threat of an upcoming commit
because of the last commitfest :(

I honestly don't know how to go on about this...

This is very accurate and the big challenge of large, invasive patches.
You almost need to hit it perfect the first time to get it committed in
less than a year.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Heikki Linnakangas (#44)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-24 20:53:18 +0200, Heikki Linnakangas wrote:

That said (hah, you knew there would be a "but" ;-)), now that I see what
that looks like, I'm feeling that maybe it wasn't such a good idea after
all. It sounded like a fairly small patch that greatly reduces the overhead
in the master with existing replication systems like slony, but it turned
out to be a huge patch with a lot of new concepts and interfaces.

Heh, I know the feeling that there must be a simpler way. But after
trying several approaches (more than I dare to admit) I don't really
think there's any that provides the asked for flexibility.
I really think the flexibility is whats required to satisfy the very
diverse aims people have for a feature like this.

And if you look at the overall diffstat, without minor changes, example
code and documentation:

src/backend/access/heap/heapam.c | 286 ++-
src/backend/replication/logical/decode.c | 514 ++++++
src/backend/replication/logical/logical.c | 943 ++++++++++
src/backend/replication/logical/logicalfuncs.c | 115 ++
src/backend/replication/logical/reorderbuffer.c | 1947 ++++++++++++++++++++
src/backend/replication/logical/snapbuild.c | 1596 ++++++++++++++++
src/backend/replication/walsender.c | 620 ++++++-
src/bin/pg_basebackup/pg_receivellog.c | 869 +++++++++
src/include/replication/decode.h | 20 +
src/include/replication/logical.h | 205 +++
src/include/replication/logicalfuncs.h | 14 +
src/include/replication/output_plugin.h | 73 +
src/include/replication/reorderbuffer.h | 296 +++
src/include/replication/snapbuild.h | 176 ++
src/include/replication/walsender_private.h | 6 +-
src/backend/storage/ipc/procarray.c | 63 +-
src/backend/utils/time/tqual.c | 272 ++-

Its not *that* big compared to other patches that have been committed.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Bruce Momjian (#46)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-24 20:28:41 -0500, Bruce Momjian wrote:

On Fri, Jan 25, 2013 at 02:16:09AM +0100, Andres Freund wrote:

What I am afraid though is that it basically goes on like this in the
next commitfests:
* 9.4-CF1: no "serious" reviewer comments because they are busy doing release work
* 9.4-CF2: all are relieved that the release is over and a bit tired
* 9.4-CF3: first deeper review, some more complex restructuring required
* 9.4-CF4: too many changes to commit.

If you look at the development of the feature, after the first prototype
and the resulting design changes nobody with decision power had a more
than cursory look at the proposed interfaces. Thats very, very, very
understandable, you all are busy people and the patch & the interfaces
are complex so it takes noticeable amounts of time, but it unfortunately
doesn't help in getting an acceptable interface nailed down.

The problem with that is not only that it sucks huge amounts of energy
out of me and others but also that its very hard to really build the
layers/users above changeset extraction without being able to rely on
the interface and semantics. So we never get to the actually benefits
:(, and we don't get the users people require for the feature to be
committed.

So far, the only really effective way of getting people to comment on
patches in this state & complexity is the threat of an upcoming commit
because of the last commitfest :(

I honestly don't know how to go on about this...

This is very accurate and the big challenge of large, invasive patches.
You almost need to hit it perfect the first time to get it committed in
less than a year.

My primary concern really isn't to get it committed inside a year, but
to be sure to get input in-time to be able to actually continue to
work. And to commit it then. And I am absolutely, absolutely not sure
thats going to work.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Bruce Momjian

bruce@momjian.us

almost 13 years ago

In reply to: Andres Freund (#48)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On Fri, Jan 25, 2013 at 02:40:03AM +0100, Andres Freund wrote:

The problem with that is not only that it sucks huge amounts of energy
out of me and others but also that its very hard to really build the
layers/users above changeset extraction without being able to rely on
the interface and semantics. So we never get to the actually benefits
:(, and we don't get the users people require for the feature to be
committed.

So far, the only really effective way of getting people to comment on
patches in this state & complexity is the threat of an upcoming commit
because of the last commitfest :(

I honestly don't know how to go on about this...

This is very accurate and the big challenge of large, invasive patches.
You almost need to hit it perfect the first time to get it committed in
less than a year.

My primary concern really isn't to get it committed inside a year, but
to be sure to get input in-time to be able to actually continue to
work. And to commit it then. And I am absolutely, absolutely not sure
thats going to work.

I have found that if I push out improvements right after they are
requested, I can sometimes get momentum for people to get excited about
the patch. That is very hard to do with any other time constraints. I
am not saying you didn't push out stuff quickly, only that this is hard
to do.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Steve Singer (#41)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT
statements in a format similar to what a modified slony apply trigger
would want. This was with the list of tables to replicate hard-coded
in the plugin. This was with the patchset from the last commitfest.I
had gotten a bit hung up on the UPDATE and DELETE support because
slony allows you to use an arbitrary user specified unique index as
your key. It looks like better support for tables with a unique
non-primary key is in the most recent patch set. I am hoping to have
time this weekend to update my plugin to use parameters passed in on
the init and other updates in the most recent version. If I make some
progress I will post a link to my progress at the end of the weekend.
My big issue is that I have limited time to spend on this.

This isn't a complete review just a few questions I've hit so far that I
thought I'd ask to see if I'm not seeing something related to updates.

*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*************** extern bool ReindexIsProcessingHeap(Oid
*** 114,117 ****
--- 114,121 ----
   extern bool ReindexIsProcessingIndex(Oid indexOid);
   extern Oid    IndexGetRelation(Oid indexId, bool missing_ok);

+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+                                    int16 *nratts, int16 *attnums, Oid 
*atttypids,
+                                    Oid *opclasses);
+
   #endif   /* INDEX_H */

I don't see this defined anywhere could it be left over from a previous
version of the patch?

In decode.c
DecodeUpdate:
+
+   /*
+    * FIXME: need to get/save the old tuple as well if we want primary key
+    * changes to work.
+    */
+   change->newtuple = ReorderBufferGetTupleBuf(reorder);

I also don't see any code in heap_update to find + save the old primary
key values like you added to heap_delete. You didn't list "Add ability
to change the primary key on an UPDATE" in the TODO so I'm wondering if
I'm missing something. Is there another way I can bet the primary key
values for the old_tuple?

Also,

I think the name of the test contrib module was changed but you didn't
update the make file. This fixes it

diff --git a/contrib/Makefile b/contrib/Makefile
index 1cc30fe..36e6bfe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,7 +50,7 @@ SUBDIRS = \
         tcn     \
         test_parser \
         test_decoding   \
-       test_logical_replication \
+       test_logical_decoding \
         tsearch2    \
         unaccent    \
         vacuumlo    \

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Andres Freund (#28)

1 attachment(s)

Re: logical changeset generation v4

On 13-01-22 11:30 AM, Andres Freund wrote:

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication

Exactly what is the syntax for using that. My reading your changes to
repl_gram.y make me think that any of the following should work (but
they don't).

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
ERROR: syntax error: unexpected character "("

"START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
ERROR: syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
ERROR: syntax error: unexpected character "("

I'm also attaching a patch to pg_receivellog that allows you to specify
these options on the command line. I'm not saying I think that it is
appropriate to be adding more bells and whistles to the utilities two
weeks into the CF but I found this useful for testing so I'm sharing it.

Show quoted text

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-allow-pg_receivellog-to-pass-plugin-options-from-the.patchtext/x-patch; name=0001-allow-pg_receivellog-to-pass-plugin-options-from-the.patchDownload

>From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001
From: Steve Singer <ssinger@ca.afilias.info>
Date: Sun, 27 Jan 2013 12:24:33 -0500
Subject: [PATCH] allow pg_receivellog to pass plugin options from the command line to the plugin

---
 src/bin/pg_basebackup/pg_receivellog.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
index 04bedbe..30b3cea 100644
--- a/src/bin/pg_basebackup/pg_receivellog.c
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -54,7 +54,7 @@ static XLogRecPtr	startpos;
 static bool do_init_slot = false;
 static bool do_start_slot = false;
 static bool do_stop_slot = false;
-
+static const char * plugin_opts="";
 
 static void usage(void);
 static void StreamLog();
@@ -84,6 +84,7 @@ usage(void)
 	printf(_("  -s, --status-interval=INTERVAL\n"
 			 "                         time between status packets sent to server (in seconds)\n"));
 	printf(_("  -S, --slot=SLOT        use existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("  -o --options=OPTIONS   A comma separated list of options to the plugin\n"));
 	printf(_("\nAction to be performed:\n"));
 	printf(_("      --init             initiate a new replication slot (for the slotname see --slot)\n"));
 	printf(_("      --start            start streaming in a replication slot (for the slotname see --slot)\n"));
@@ -264,8 +265,8 @@ StreamLog(void)
 				slot);
 
 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos,plugin_opts);
 	res = PQexec(conn, query);
 	if (PQresultStatus(res) != PGRES_COPY_BOTH)
 	{
@@ -560,6 +561,7 @@ main(int argc, char **argv)
 		{"init", no_argument, NULL, 1},
 		{"start", no_argument, NULL, 2},
 		{"stop", no_argument, NULL, 3},
+		{"options",required_argument,NULL,'o'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -584,7 +586,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:",
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:o:",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -659,6 +661,10 @@ main(int argc, char **argv)
 			case 3:
 				do_stop_slot = true;
 				break;
+			case 'o':
+				if(optarg != NULL)
+					plugin_opts = pg_strdup(optarg);
+				break;
 /* action */
 
 			default:
-- 
1.7.0.4

#52

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Steve Singer (#41)

1 attachment(s)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT
statements in a format similar to what a modified slony apply trigger
would want. This was with the list of tables to replicate hard-coded
in the plugin. This was with the patchset from the last commitfest.I
had gotten a bit hung up on the UPDATE and DELETE support because
slony allows you to use an arbitrary user specified unique index as
your key. It looks like better support for tables with a unique
non-primary key is in the most recent patch set. I am hoping to have
time this weekend to update my plugin to use parameters passed in on
the init and other updates in the most recent version. If I make some
progress I will post a link to my progress at the end of the weekend.
My big issue is that I have limited time to spend on this.

A few more comments;

In decode.c DecodeDelete

+   if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+   {
+       elog(DEBUG2, "huh, no primary key for a delete on wal_level = 
logical?");
+       return;
+   }
+

I think we should be passing delete's with candidate key data logged to
the plugin. If the table isn't a replicated table then ignoring the
delete is fine. If the table is a replicated table but someone has
deleted the unique index from the table then the plugin will receive
INSERT changes on the table but not DELETE changes. If this happens the
plugin would have any way of knowing that it is missing delete changes.
If my plugin gets passed a DELETE change record but with no key data
then my plugin could do any of
1. Start screaming for help (ie log errors)
2. Drop the table from replication
3. Pass the delete (with no key values) onto the replication client and
let it deal with it (see 1 and 2)

Also, 'huh' isn't one of our standard log message phrases :)

How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason
why this can't be easily added?

Also what do we want to do about TRUNCATE support. I could always leave
a TRUNCATE trigger in place that logged the truncate to a sl_truncates
and have my replication daemon respond to the insert on a sl_truncates
table by actually truncating the data on the replica.

I've spent some time this weekend updating my prototype plugin that
generates slony 2.2 style COPY output. I have attached my progress here
(also https://github.com/ssinger/slony1-engine/tree/logical_repl). I
have not gotten as far as modifying slon to act as a logical log
receiver, or made a version of the slony apply trigger that would
process these changes. I haven't looked into the details of what is
involved in setting up a subscription with the snapshot exporting.

I couldn't get the options on the START REPLICATION command to parse so
I just hard coded some list building code in the init method. I do plan
on pasing the list of tables to replicate from the replica to the plugin
(because this list comes from the replica). Passing what could be a
few thousand table names as a list of arguments is a bit ugly and I
admit my list processing code is rough. Does this make us want to
reconsider the format of the option_list ?

I guess should provide an opinion on if I think that the patch in this
CF, if committed could be used to act as a source for slony instead of
the log trigger.

The biggest missing piece I mentioned in my email yesterday, that we
aren't logging the old primary key on row UPDATEs. I don't see building
a credible replication system where you don't allow users to update any
column of a row.

The other issues I've raised (DecodeDelete hiding bad deletes,
replication options not parsing for me) look like easy fixes

no wal decoding support for sequences or truncate are things that I
could work around by doing things much like slony does today. The SYNC
can still capture the sequence changes in a table (where the INSERT's
would be logged) and I can have a trigger capture truncates.

I mostly did this review from the point of view of someone trying to use
the feature, I haven't done a line-by-line review of the code.

I suspect Andres can address these issues and get an updated patch out
during this CF. I think a more detailed code review by someone more
familiar with postgres internals will reveal a handful of other issues
that hopefully can be fixed without a lot of effort. If this were the
only patch in the commitfest I would encourage Andres to push to get
these changes done. If the standard for CF4 is that a patch needs to be
basically in a commitable state at the start of the CF, other than minor
issues, then I don't think this patch meets that bar. In a few more
weeks from now, with a handful of more updates and re-reviews it might.
If we give everyone in the CF that much time to get their patches into a
committable state then I think the CF will drag on until April or even
May and we might not see 9.3 released until close to Christmas (4
patches so far have been rejected or returned with feedback, 51 need
reviewer or committer attention) . I'm not sure I have a huge problem
with that but I don't think it is what was agreed to in the developer
meeting last May.

If this patch is going to get bumped to 9.4 I really hope that someone
with good knowledge of the internals (ie a committer) can give this
patch a good review sooner rather than later. If there are issues
Andres has overlooked that are more serious or complicated to fix I
would like to see them raised before the next CF in June.

Steve

Show quoted text

BTW, why does all the transaction reordering stuff has to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* much easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4

If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.

I think we will find that the replication systems won't be the only
users of this feature. I have often seen systems that have a logging
requirement for auditing purposes or to log then reconstruct the
sequence of changes made to a set of tables in order to feed a
downstream application. Triggers and a journaling table are the
traditional way of doing this but it should be pretty easy to write a
plugin to accomplish the same thing that should give better
performance. If the reordering stuff wasn't in core this would be
much harder.

How much of this infrastructure is to support replicating DDL
changes? IOW,
if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.

Any other features or requirements that could be dropped? I think
it's clear at this stage that
this patch is not going to be committed as it is. If you can reduce
it to a
fraction of what it is now, that fraction might have a chance.
Otherwise,
it's just going to be pushed to the next commitfest as whole, and we're
going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data)
VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop of required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an
interface change
but otherwise it should be fine to go in independently. It also has
other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
independently and makes sense independently as it allows a standby to
enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies change could be applied and they are imo ready but
there's no use-case for it without the rest, so I am not sure whether
theres a point
- currently not separately available, but we could add wal_level=logical
independently. There would be no user of it, but it would be partial
work. That includes the relcache support for keeping track of the
primary key which already is available separately.

Greetings,

Andres Freund

#53

Heikki Linnakangas

hlinnakangas@vmware.com

almost 13 years ago

In reply to: Andres Freund (#32)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 24.01.2013 00:30, Andres Freund wrote:

Also, while the apply side surely isn't benchmarkable without any being
submitted, the changeset generation can very well be benchmarked.

A very, very adhoc benchmark:
-c max_wal_senders=10
-c max_logical_slots=10 --disabled for anything but logical
-c wal_level=logical --hot_standby for anything but logical
-c checkpoint_segments=100
-c log_checkpoints=on
-c shared_buffers=512MB
-c autovacuum=on
-c log_min_messages=notice
-c log_line_prefix='[%p %t] '
-c wal_keep_segments=100
-c fsync=off
-c synchronous_commit=off

pgbench -p 5440 -h /tmp -n -M prepared -c 16 -j 16 -T 30

pgbench upstream:
tps: 22275.941409
space overhead: 0%
pgbench logical-submitted
tps: 16274.603046
space overhead: 2.1%
pgbench logical-HEAD (will submit updated version tomorrow or so):
tps: 20853.341551
space overhead: 2.3%
pgbench single plpgsql trigger (INSERT INTO log(data) VALUES(NEW::text))
tps: 14101.349535
space overhead: 369%

Note that in the single trigger case nobody consumed the queue while the
logical version streamed the changes out and stored them to disk.

That makes the space overhead comparison completely worthless, no? I
would expect the trigger-based approach to generate roughly 100% more
WAL, not close to 400%. As long as the queue is drained constantly,
there should be no big difference in the disk space used, except for the
WAL.

Adding a default NOW() or similar to the tables immediately makes
logical decoding faster by a factor of about 3 in comparison to the
above trivial trigger.

Hmm, is that because of the conversion to text? I believe slony also
converts all the values to text in the trigger, because that's simple
and flexible, but if we're trying to compare the performance of logical
changeset generation vs. trigger-based replication in general, we should
choose the most efficient trigger-based scheme to compare with. That
means, don't convert to text. And write the trigger in C.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#54

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Steve Singer (#50)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-26 16:20:33 -0500, Steve Singer wrote:

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in
a format similar to what a modified slony apply trigger would want. This
was with the list of tables to replicate hard-coded in the plugin. This
was with the patchset from the last commitfest.I had gotten a bit hung up
on the UPDATE and DELETE support because slony allows you to use an
arbitrary user specified unique index as your key. It looks like better
support for tables with a unique non-primary key is in the most recent
patch set. I am hoping to have time this weekend to update my plugin to
use parameters passed in on the init and other updates in the most recent
version. If I make some progress I will post a link to my progress at the
end of the weekend. My big issue is that I have limited time to spend on
this.

This isn't a complete review just a few questions I've hit so far that I
thought I'd ask to see if I'm not seeing something related to updates.

+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+                                    int16 *nratts, int16 *attnums, Oid
*atttypids,
+                                    Oid *opclasses);
+
I don't see this defined anywhere could it be left over from a previous
version of the patch?

Yes, its dead and now gone.

In decode.c
DecodeUpdate:
+
+   /*
+    * FIXME: need to get/save the old tuple as well if we want primary key
+    * changes to work.
+    */
+   change->newtuple = ReorderBufferGetTupleBuf(reorder);
I also don't see any code in heap_update to find + save the old primary key
values like you added to heap_delete. You didn't list "Add ability to
change the primary key on an UPDATE" in the TODO so I'm wondering if I'm
missing something. Is there another way I can bet the primary key values
for the old_tuple?

Nope, there isn't any right now. I have considered as something not all
that interesting for real-world usecases based on my experience, but
adding support shouldn't be that hard anymore, so I can just bite the
bullet...

I think the name of the test contrib module was changed but you didn't
update the make file. This fixes it

Yea, I had forgotten to add that hunk when committing. Fixed.

Thanks,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Steve Singer (#52)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

Hi,

On 2013-01-27 23:07:51 -0500, Steve Singer wrote:

A few more comments;

In decode.c DecodeDelete

+   if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+   {
+       elog(DEBUG2, "huh, no primary key for a delete on wal_level =
logical?");
+       return;
+   }
+

I think we should be passing delete's with candidate key data logged to the
plugin. If the table isn't a replicated table then ignoring the delete is
fine. If the table is a replicated table but someone has deleted the unique
index from the table then the plugin will receive INSERT changes on the
table but not DELETE changes. If this happens the plugin would have any way
of knowing that it is missing delete changes. If my plugin gets passed a
DELETE change record but with no key data then my plugin could do any of

I basically didn't do that because I thought people would forget to
check whether oldtuple is empty I have no problem with addind support
for that though.

1. Start screaming for help (ie log errors)

Yes.

2. Drop the table from replication

No, you can't write from an output plugin, and I don't immediately see
support for that comming. There's no fundamental blockers, just makes
things more complicated.

3. Pass the delete (with no key values) onto the replication client and let
it deal with it (see 1 and 2)

Hm.

While I agree that nicer behaviour would be good I think the real
enforcement should happen on a higher level, e.g. with event triggers or
somesuch. It seems way too late to do anything about it when we're
already decoding. The transaction will already have committed...

Also, 'huh' isn't one of our standard log message phrases :)

You're right there ;). I bascially wanted to remove the log message
almost instantly but it was occasionally useful so I kept it arround...

How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why
this can't be easily added?

I basically was hoping for Simon's sequence-am to get in before doing
anything real here. That didn't really happen yet.
I am not sure whether there's a real usecase in decoding normal
XLOG_SEQ_LOG records, their content isn't all that easy to interpet
unless youre rather familiar with pg's innards.

So, adding support wouldn't hard from a technical pov but it seems the
semantics are a bit hard to nail down.

Also what do we want to do about TRUNCATE support. I could always leave a
TRUNCATE trigger in place that logged the truncate to a sl_truncates and
have my replication daemon respond to the insert on a sl_truncates table
by actually truncating the data on the replica.

I have planned to add some generic "table_rewrite" handling, but I have
to admit I haven't thought too much about it yet. Currently if somebody
rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or
ALTER COLUMN ... USING ..., you will see INSERTs into a temporary
table. That basically seems to be a good thing, but the user needs to be
told about that ;)

I've spent some time this weekend updating my prototype plugin that
generates slony 2.2 style COPY output. I have attached my progress here
(also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have
not gotten as far as modifying slon to act as a logical log receiver, or
made a version of the slony apply trigger that would process these
changes.

I only gave it a quick look and have a couple of questions and
remarks. The way you used the options it looks like youre thinking of
specifying all the tables as options? I would have thought those would
get stored & queried locally and only something like the 'replication
set' name or such would be set as an option.

Iterating over a list with
for(i = 0; i < options->length; i= i + 2 )
{
DefElem * def_schema = (DefElem*) list_nth(options,i);
is not a good idea btw, thats quadratic in complexity ;)

In the REORDER_BUFFER_CHANGE_UPDATE I suggest using
relation->rd_primary, just as in the DELETE case, that should always
give you a consistent candidate key in an efficient manner.

I haven't looked into the details of what is involved in setting up a
subscription with the snapshot exporting.

That hopefully shouldn't be too hard... At least thats the idea :P

I couldn't get the options on the START REPLICATION command to parse so I
just hard coded some list building code in the init method. I do plan on
pasing the list of tables to replicate from the replica to the plugin
(because this list comes from the replica). Passing what could be a few
thousand table names as a list of arguments is a bit ugly and I admit my
list processing code is rough. Does this make us want to reconsider the
format of the option_list ?

Yea, something's screwed up there, sorry. Will push a fix later today.

I guess should provide an opinion on if I think that the patch in this CF,
if committed could be used to act as a source for slony instead of the log
trigger.

The biggest missing piece I mentioned in my email yesterday, that we aren't
logging the old primary key on row UPDATEs. I don't see building a credible
replication system where you don't allow users to update any column of a
row.

Ok, I really thought this wouldn't be that much of an issue in a first
version, but if you think its important, I'll add support for
it. Shouldn't be too hard.

The other issues I've raised (DecodeDelete hiding bad deletes, replication
options not parsing for me) look like easy fixes

no wal decoding support for sequences or truncate are things that I could
work around by doing things much like slony does today. The SYNC can still
capture the sequence changes in a table (where the INSERT's would be
logged) and I can have a trigger capture truncates.

Could you explan a bit what's being done there in slony?

If this patch is going to get bumped to 9.4 I really hope that someone with
good knowledge of the internals (ie a committer) can give this patch a good
review sooner rather than later. If there are issues Andres has overlooked
that are more serious or complicated to fix I would like to see them raised
before the next CF in June.

Absolutely seconded. I *really* would love to see a more technical
review, its hard to see issues after spending that much time in a
certain worldview...

Thanks!

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Steve Singer (#51)

Re: logical changeset generation v4

On 2013-01-27 12:28:21 -0500, Steve Singer wrote:

On 13-01-22 11:30 AM, Andres Freund wrote:

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication

Exactly what is the syntax for using that. My reading your changes to
repl_gram.y make me think that any of the following should work (but they
don't).

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
ERROR: syntax error: unexpected character "("

"START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
ERROR: syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
ERROR: syntax error: unexpected character "("

The syntax is right, the grammar (or rather scanner) support is a bit
botched, will push a new version soon.

I'm also attaching a patch to pg_receivellog that allows you to specify
these options on the command line. I'm not saying I think that it is
appropriate to be adding more bells and whistles to the utilities two weeks
into the CF but I found this useful for testing so I'm sharing it.

The CF is also there to find UI warts and such, so something like this
seems perfectly fine. Even moreso as it doesn't look this will get into
9.3 anyway.

I wanted to add such an option, but I was too lazy^Wbusy to think about
the sematics. Your current syntax doesn't really allow arguments to be
specified in a nice way.
I was thinking of -o name=value and allowing multiple specifications of
-o to build the option string.

Any arguments against that?

/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos,plugin_opts);

ISTM that (%s) shouldn't be specified when there are no options, but as
the options need to be pre-escaped anyway, that looks like a non-problem
in a bit more complete implementation.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Heikki Linnakangas (#53)

Re: Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-28 11:59:52 +0200, Heikki Linnakangas wrote:

On 24.01.2013 00:30, Andres Freund wrote:

Also, while the apply side surely isn't benchmarkable without any being
submitted, the changeset generation can very well be benchmarked.

A very, very adhoc benchmark:
-c max_wal_senders=10
-c max_logical_slots=10 --disabled for anything but logical
-c wal_level=logical --hot_standby for anything but logical
-c checkpoint_segments=100
-c log_checkpoints=on
-c shared_buffers=512MB
-c autovacuum=on
-c log_min_messages=notice
-c log_line_prefix='[%p %t] '
-c wal_keep_segments=100
-c fsync=off
-c synchronous_commit=off

pgbench -p 5440 -h /tmp -n -M prepared -c 16 -j 16 -T 30

pgbench upstream:
tps: 22275.941409
space overhead: 0%
pgbench logical-submitted
tps: 16274.603046
space overhead: 2.1%
pgbench logical-HEAD (will submit updated version tomorrow or so):
tps: 20853.341551
space overhead: 2.3%
pgbench single plpgsql trigger (INSERT INTO log(data) VALUES(NEW::text))
tps: 14101.349535
space overhead: 369%

Note that in the single trigger case nobody consumed the queue while the
logical version streamed the changes out and stored them to disk.

That makes the space overhead comparison completely worthless, no? I would
expect the trigger-based approach to generate roughly 100% more WAL, not
close to 400%. As long as the queue is drained constantly, there should be
no big difference in the disk space used, except for the WAL.

Imo its a valid comparison as all such queues can only be drained in a
rather imperfect manner. I think these days all solutions use multiple
(two) queue tables and switch between those and truncate the non-active
one as vacuuming them works far too unreliable.
And those tables have to be plain logged once, so they matter in
checkpoints et al.

Adding a default NOW() or similar to the tables immediately makes
logical decoding faster by a factor of about 3 in comparison to the
above trivial trigger.

Hmm, is that because of the conversion to text? I believe slony also
converts all the values to text in the trigger, because that's simple and
flexible, but if we're trying to compare the performance of logical
changeset generation vs. trigger-based replication in general, we should
choose the most efficient trigger-based scheme to compare with. That means,
don't convert to text. And write the trigger in C.

Imo its basically impossible for the current queue-based solutions not
to convert to text because they otherwise would need to queue all the
conversion information as well. And the the test_decoding plugin also
converts everything to text, so thats a fair comparison from that
POV. In fact the test_decoding plugin does noticeably more as it also
outputs table, column and type name.

I aggree on the C argument. I really doubt its going to make that much
of a difference but we should try it.
In my experience a plpgsql trigger that just does a straight conversion
via cast is still noticeably faster than any of the "real" replication
triggers out there though, so I wouldn't expect much there.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Andres Freund (#55)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 13-01-28 06:17 AM, Andres Freund wrote:

Hi,

3. Pass the delete (with no key values) onto the replication client and let
it deal with it (see 1 and 2)
Hm.

While I agree that nicer behaviour would be good I think the real
enforcement should happen on a higher level, e.g. with event triggers or
somesuch. It seems way too late to do anything about it when we're
already decoding. The transaction will already have committed...

Ideally the first line of enforcement would be with event triggers. The
thing with user-level mechanisms for enforcing things is that they
sometimes can be disabled or by-passed. I don't have a lot of sympathy
for people who do this but I like the idea of at least having the option
coding defensively to detect the situation and whine to the user.

How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why
this can't be easily added?

I basically was hoping for Simon's sequence-am to get in before doing
anything real here. That didn't really happen yet.
I am not sure whether there's a real usecase in decoding normal
XLOG_SEQ_LOG records, their content isn't all that easy to interpet
unless youre rather familiar with pg's innards.

So, adding support wouldn't hard from a technical pov but it seems the
semantics are a bit hard to nail down.

Also what do we want to do about TRUNCATE support. I could always leave a
TRUNCATE trigger in place that logged the truncate to a sl_truncates and
have my replication daemon respond to the insert on a sl_truncates table
by actually truncating the data on the replica.

I have planned to add some generic "table_rewrite" handling, but I have
to admit I haven't thought too much about it yet. Currently if somebody
rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or
ALTER COLUMN ... USING ..., you will see INSERTs into a temporary
table. That basically seems to be a good thing, but the user needs to be
told about that ;)

I've spent some time this weekend updating my prototype plugin that
generates slony 2.2 style COPY output. I have attached my progress here
(also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have
not gotten as far as modifying slon to act as a logical log receiver, or
made a version of the slony apply trigger that would process these
changes.

I only gave it a quick look and have a couple of questions and
remarks. The way you used the options it looks like youre thinking of
specifying all the tables as options? I would have thought those would
get stored & queried locally and only something like the 'replication
set' name or such would be set as an option.

The way slony works today is that the list of tables to pull for a SYNC
comes from the subscriber because the subscriber might be behind the
provider, where a table has been removed from the set in the meantime.
The subscriber still needs to receive data from that table until it is
caught up to the point where that removal happens.

Having a time-travelled version of a user table (sl_table) might fix
that problem but I haven't yet figured out how that needs to work with
cascading (since that is a feature of slony today I can't ignore the
problem). I'm also not sure how that will work with table renames.
Today if the user renames a table inside of an EXECUTE SCRIPT slony will
update the name of the table in sl_table. This type of change wouldn't
be visible (yet) in the time-travelled catalog. There might be a
solution to this yet but I haven't figured out it. Sticking with what
slony does today seemed easier as a first step.

Iterating over a list with
for(i = 0; i < options->length; i= i + 2 )
{
DefElem * def_schema = (DefElem*) list_nth(options,i);
is not a good idea btw, thats quadratic in complexity ;)

Thanks I'll rewrite this to walk a list of ListCell objects with next.

In the REORDER_BUFFER_CHANGE_UPDATE I suggest using
relation->rd_primary, just as in the DELETE case, that should always
give you a consistent candidate key in an efficient manner.

I haven't looked into the details of what is involved in setting up a
subscription with the snapshot exporting.

That hopefully shouldn't be too hard... At least thats the idea :P

I couldn't get the options on the START REPLICATION command to parse so I
just hard coded some list building code in the init method. I do plan on
pasing the list of tables to replicate from the replica to the plugin
(because this list comes from the replica). Passing what could be a few
thousand table names as a list of arguments is a bit ugly and I admit my
list processing code is rough. Does this make us want to reconsider the
format of the option_list ?

Yea, something's screwed up there, sorry. Will push a fix later today.

I guess should provide an opinion on if I think that the patch in this CF,
if committed could be used to act as a source for slony instead of the log
trigger.
The biggest missing piece I mentioned in my email yesterday, that we aren't
logging the old primary key on row UPDATEs. I don't see building a credible
replication system where you don't allow users to update any column of a
row.

Ok, I really thought this wouldn't be that much of an issue in a first
version, but if you think its important, I'll add support for
it. Shouldn't be too hard.

If your using non-surragate /natural primary keys this tends to come up
occasionally due to data-entry errors or renames. I'm looking at this
from the point of view of what do I need to use this as a source for a
production replication system with fewer sharp-edges compared to trigger
source slony. My standard is a bit higher than 'first' version because
I intent to use it in the version 3.0 of slony not 1.0. If others feel
I'm asking for too much they should speak up, maybe I am. Also the way
things will fail if someone were to try and update a primary key value
is pretty nasty (it will leave them with inconsistent databases). We
could install UPDATE triggers to try and detect this type of thing but
I'd rather see us just log the old values so we can use them during replay.

The other issues I've raised (DecodeDelete hiding bad deletes, replication
options not parsing for me) look like easy fixes

no wal decoding support for sequences or truncate are things that I could
work around by doing things much like slony does today. The SYNC can still
capture the sequence changes in a table (where the INSERT's would be
logged) and I can have a trigger capture truncates.

Could you explan a bit what's being done there in slony?

Each time the slon connects to the local database to create a SYNC
event, which is when slony captures snapshot visiblity information, it
also gets also looks at all of the replicated sequences and finds any
that have changed since the last sync The values sequence values as of
the last SYNC are stored in memory. Any sequences that have changed get
there new values written to the table sl_seqlog. When slon applies row
updates for a SYNC it also updates (setval) on any sequences that have
changed.

For truncates the truncate trigger just logs a single row into sl_log
indicating that the table has been truncated. When slon encounters a
row of operation 'TRUNCATE' it executes a TRUNCATE ONLY on the table.

If this patch is going to get bumped to 9.4 I really hope that someone with
good knowledge of the internals (ie a committer) can give this patch a good
review sooner rather than later. If there are issues Andres has overlooked
that are more serious or complicated to fix I would like to see them raised
before the next CF in June.

Absolutely seconded. I *really* would love to see a more technical
review, its hard to see issues after spending that much time in a
certain worldview...

Thanks!

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59

Steve Singer

steve@ssinger.info

almost 13 years ago

In reply to: Andres Freund (#56)

Re: logical changeset generation v4

On 13-01-28 06:23 AM, Andres Freund wrote:

The CF is also there to find UI warts and such, so something like this
seems perfectly fine. Even moreso as it doesn't look this will get
into 9.3 anyway. I wanted to add such an option, but I was too
lazy^Wbusy to think about the sematics. Your current syntax doesn't
really allow arguments to be specified in a nice way. I was thinking
of -o name=value and allowing multiple specifications of -o to build
the option string. Any arguments against that?

Multiple -o options sound fine to me.

/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos,plugin_opts);
ISTM that (%s) shouldn't be specified when there are no options, but as
the options need to be pre-escaped anyway, that looks like a non-problem
in a bit more complete implementation.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#56)

Re: logical changeset generation v4

On 2013-01-28 12:23:02 +0100, Andres Freund wrote:

On 2013-01-27 12:28:21 -0500, Steve Singer wrote:

On 13-01-22 11:30 AM, Andres Freund wrote:

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication

Exactly what is the syntax for using that. My reading your changes to
repl_gram.y make me think that any of the following should work (but they
don't).

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
ERROR: syntax error: unexpected character "("

"START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
ERROR: syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
ERROR: syntax error: unexpected character "("

The syntax is right, the grammar (or rather scanner) support is a bit
botched, will push a new version soon.

Pushed and rebased some minutes ago. I changed the syntax so that slot
names, plugins, and option names are identifiers and behave just as in
normal sql identifier. That means ' need to be changed to ".

The new version is rebased ontop of fklocks, walsender et al, which was
a bit of work but actually makes more comprehensive logging in
heap_update easier. That will come tomorrow.

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#61

Andres Freund

andres@2ndquadrant.com

almost 13 years ago

In reply to: Steve Singer (#58)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On 2013-01-28 16:55:52 -0500, Steve Singer wrote:

If your using non-surragate /natural primary keys this tends to come up
occasionally due to data-entry errors or renames. I'm looking at this from
the point of view of what do I need to use this as a source for a production
replication system with fewer sharp-edges compared to trigger source slony.
My standard is a bit higher than 'first' version because I intent to use it
in the version 3.0 of slony not 1.0. If others feel I'm asking for too much
they should speak up, maybe I am. Also the way things will fail if someone
were to try and update a primary key value is pretty nasty (it will leave
them with inconsistent databases). We could install UPDATE triggers to
try and detect this type of thing but I'd rather see us just log the old
values so we can use them during replay.

I pushed support for this. I am not yet 100% happy with this due to two
issues:

* it increases the xlog size logged by heap_update by 2 bytes even with
wal_level < logical as it uses a variant of xl_heap_header that
includes its lenght. Conditionally using xl_heap_header would make the
code even harder to read. Is that acceptable?
* multi_insert should be converted to use xl_heap_header_len as well,
instead of using xl_multi_insert_tuple, that would also reduce the
amount of multi-insert specific code in decode.c
* both for update and delete we should denote more explicitly that
->oldtuple points to an index tuple, not to an full tuple

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62

Robert Haas

robertmhaas@gmail.com

almost 13 years ago

In reply to: Andres Freund (#61)

Re: logical changeset generation v4 - Heikki's thoughts about the patch state

On Sat, Feb 2, 2013 at 4:38 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-01-28 16:55:52 -0500, Steve Singer wrote:

If your using non-surragate /natural primary keys this tends to come up
occasionally due to data-entry errors or renames. I'm looking at this from
the point of view of what do I need to use this as a source for a production
replication system with fewer sharp-edges compared to trigger source slony.
My standard is a bit higher than 'first' version because I intent to use it
in the version 3.0 of slony not 1.0. If others feel I'm asking for too much
they should speak up, maybe I am. Also the way things will fail if someone
were to try and update a primary key value is pretty nasty (it will leave
them with inconsistent databases). We could install UPDATE triggers to
try and detect this type of thing but I'd rather see us just log the old
values so we can use them during replay.

I pushed support for this. I am not yet 100% happy with this due to two
issues:

* it increases the xlog size logged by heap_update by 2 bytes even with
wal_level < logical as it uses a variant of xl_heap_header that
includes its lenght. Conditionally using xl_heap_header would make the
code even harder to read. Is that acceptable?

I think it's important to avoid adding to DML WAL volume when
wal_level < logical. I am not positive that 2 bytes is noticeable,
but I'm not positive that it isn't either: heap insert/update must be
our most commonly-used WAL records. On the other hand, we also need
to keep in mind that branches in hot code paths aren't free either. I
would be concerned more about the increased run-time cost of
constructing the correct WAL record than with the related code
complexity. None of that code is simple anyway.

* multi_insert should be converted to use xl_heap_header_len as well,
instead of using xl_multi_insert_tuple, that would also reduce the
amount of multi-insert specific code in decode.c
* both for update and delete we should denote more explicitly that
->oldtuple points to an index tuple, not to an full tuple

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers