pg_rewind in contrib

Started by Heikki Linnakangasabout 11 years ago64 messages
#1Heikki Linnakangas
hlinnakangas@vmware.com
1 attachment(s)

Hi,

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the
existing versions, and because I didn't feel it was quite ready for
production use yet. Now, with the WAL format changes in master, it is a
lot more maintainable than before. Many bugs have been fixed since the
first prototypes, and I think it's fairly robust now.

I propose that we include pg_rewind in contrib/ now. Attached is a patch
for that. It just includes the latest sources from the current pg_rewind
repository at https://github.com/vmware/pg_rewind. It is released under
the PostgreSQL license.

For those who are not familiar with pg_rewind, it's a tool that allows
repurposing an old master server as a new standby server, after
promotion, even if the old master was not shut down cleanly. That's a
very often requested feature.

- Heikki

Attachments:

pg_rewind-contrib.patchtext/x-diff; name=pg_rewind-contrib.patchDownload
commit 2300e28b0d07328c7b37a92f7150e75edf24b10c
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Fri Dec 12 16:08:14 2014 +0200

    Add pg_rewind to contrib.

diff --git a/contrib/Makefile b/contrib/Makefile
index 195d447..2fe861f 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_rewind	\
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
diff --git a/contrib/pg_rewind/.gitignore b/contrib/pg_rewind/.gitignore
new file mode 100644
index 0000000..cb50df2
--- /dev/null
+++ b/contrib/pg_rewind/.gitignore
@@ -0,0 +1,32 @@
+# Object files
+*.o
+
+# Libraries
+*.lib
+*.a
+
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+*.so.*
+*.dylib
+
+# Executables
+*.exe
+*.app
+
+# Dependencies
+.deps
+
+# Files generated during build
+/xlogreader.c
+
+# Binaries
+/pg_rewind
+
+# Generated by test suite
+/tmp_check/
+/regression.diffs
+/regression.out
+/results/
+/regress_log/
diff --git a/contrib/pg_rewind/Makefile b/contrib/pg_rewind/Makefile
new file mode 100644
index 0000000..d50a8cf
--- /dev/null
+++ b/contrib/pg_rewind/Makefile
@@ -0,0 +1,47 @@
+# Makefile for pg_rewind
+#
+# Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+#
+
+PGFILEDESC = "pg_rewind - repurpose an old master server as standby"
+PGAPPICON = win32
+
+PROGRAM = pg_rewind
+OBJS	= pg_rewind.o parsexlog.o xlogreader.o util.o datapagemap.o timeline.o \
+	fetch.o copy_fetch.o libpq_fetch.o filemap.o
+
+REGRESS = basictest extrafiles databases
+REGRESS_OPTS=--use-existing --launcher=./launcher
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS = $(libpq_pgport)
+
+override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+
+EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
+
+all: pg_rewind
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_rewind
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
+	rm -f $@ && $(LN_S) $< .
+
+check-local:
+	echo "Running tests against local data directory, in copy-mode"
+	bindir=$(bindir) TEST_SUITE="local" $(MAKE) installcheck
+
+check-remote:
+	echo "Running tests against a running standby, via libpq"
+	bindir=$(bindir) TEST_SUITE="remote" $(MAKE) installcheck
+
+check-both: check-local check-remote
diff --git a/contrib/pg_rewind/README b/contrib/pg_rewind/README
new file mode 100644
index 0000000..cac6095
--- /dev/null
+++ b/contrib/pg_rewind/README
@@ -0,0 +1,100 @@
+pg_rewind
+=========
+
+pg_rewind is a tool for synchronizing a PostgreSQL data directory with another
+PostgreSQL data directory that was forked from the first one. The result is
+equivalent to rsyncing the first data directory (referred to as the old cluster
+from now on) with the second one (the new cluster). The advantage of pg_rewind
+over rsync is that pg_rewind uses the WAL to determine changed data blocks,
+and does not require reading through all files in the cluster. That makes it
+a lot faster when the database is large and only a small portion of it differs
+between the clusters.
+
+Download
+--------
+
+The latest version of this software can be found on the project website at
+https://github.com/vmware/pg_rewind.
+
+Installation
+------------
+
+Compiling pg_rewind requires the PostgreSQL source tree to be available.
+There are two ways to do that:
+
+1. Put pg_rewind project directory inside PostgreSQL source tree as
+contrib/pg_rewind, and use "make" to compile
+
+or
+
+2. Pass the path to the PostgreSQL source tree to make, in the top_srcdir
+variable: "make USE_PGXS=1 top_srcdir=<path to PostgreSQL source tree>"
+
+In addition, you must have pg_config in $PATH.
+
+The current version of pg_rewind is compatible with PostgreSQL version 9.4.
+
+Usage
+-----
+
+    pg_rewind --target-pgdata=<path> \
+        --source-server=<new server's conn string>
+
+The contents of the old data directory will be overwritten with the new data
+so that after pg_rewind finishes, the old data directory is equal to the new
+one.
+
+pg_rewind expects to find all the necessary WAL files in the pg_xlog
+directories of the clusters. This includes all the WAL on both clusters
+starting from the last common checkpoint preceding the fork. Fetching missing
+files from a WAL archive is currently not supported. However, you can copy any
+missing files manually from the WAL archive to the pg_xlog directory.
+
+Regression tests
+----------------
+
+The regression tests can be run separately against, using the libpq or local
+method to copy the files. For local mode, the makefile target is "check-local",
+and for libpq mode, "check-remote". The target check-both runs the tests in
+both modes. For example:
+
+1) Copy code inside PostgreSQL code tree as contrib/pg_rewind, and run:
+    make check-both
+
+2) As an independent module, outside the PostgreSQL source tree:
+    make check-both USE_PGXS=1
+
+Theory of operation
+-------------------
+
+The basic idea is to copy everything from the new cluster to the old cluster,
+except for the blocks that we know to be the same.
+
+1. Scan the WAL log of the old cluster, starting from the last checkpoint before
+the point where the new cluster's timeline history forked off from the old cluster.
+For each WAL record, make a note of the data blocks that were touched. This yields
+a list of all the data blocks that were changed in the old cluster, after the new
+cluster forked off.
+
+2. Copy all those changed blocks from the new cluster to the old cluster.
+
+3. Copy all other files like clog, conf files etc. from the new cluster to old cluster.
+Everything except the relation files.
+
+4. Apply the WAL from the new cluster, starting from the checkpoint created at
+failover. (pg_rewind doesn't actually apply the WAL, it just creates a backup
+label file indicating that when PostgreSQL is started, it will start replay
+from that checkpoint and apply all the required WAL)
+
+Restrictions
+------------
+
+pg_rewind needs that cluster uses either data checksums that can be enabled
+at server initialization with initdb or WAL logging of hint bits that can
+be enabled by settings "wal_log_hints = on" in postgresql.conf.
+
+License
+-------
+
+pg_rewind can be distributed under the BSD-style PostgreSQL license. See
+COPYRIGHT file for more information.
diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
new file mode 100644
index 0000000..bea1b09
--- /dev/null
+++ b/contrib/pg_rewind/copy_fetch.c
@@ -0,0 +1,584 @@
+/*-------------------------------------------------------------------------
+ *
+ * copy_fetch.c
+ *	  Functions for copying a PostgreSQL data directory
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+#include "util.h"
+
+static void recurse_dir(const char *datadir, const char *path,
+			process_file_callback_t callback);
+
+static void execute_pagemap(datapagemap_t *pagemap, const char *path);
+
+static void remove_target_file(const char *path);
+static void create_target_dir(const char *path);
+static void remove_target_dir(const char *path);
+static void create_target_symlink(const char *path, const char *link);
+static void remove_target_symlink(const char *path);
+
+/*
+ * Traverse through all files in a data directory, calling 'callback'
+ * for each file.
+ */
+void
+traverse_datadir(const char *datadir, process_file_callback_t callback)
+{
+	/* should this copy config files or not? */
+	recurse_dir(datadir, NULL, callback);
+}
+
+/*
+ * recursive part of traverse_datadir
+ */
+static void
+recurse_dir(const char *datadir, const char *parentpath,
+			process_file_callback_t callback)
+{
+	DIR		   *xldir;
+	struct dirent *xlde;
+	char		fullparentpath[MAXPGPATH];
+
+	if (parentpath)
+		snprintf(fullparentpath, MAXPGPATH, "%s/%s", datadir, parentpath);
+	else
+		snprintf(fullparentpath, MAXPGPATH, "%s", datadir);
+
+	xldir = opendir(fullparentpath);
+	if (xldir == NULL)
+	{
+		fprintf(stderr, "could not open directory \"%s\": %s\n",
+				fullparentpath, strerror(errno));
+		exit(1);
+	}
+
+	while ((xlde = readdir(xldir)) != NULL)
+	{
+		struct stat fst;
+		char		fullpath[MAXPGPATH];
+		char		path[MAXPGPATH];
+
+		if (strcmp(xlde->d_name, ".") == 0 ||
+			strcmp(xlde->d_name, "..") == 0)
+			continue;
+
+		snprintf(fullpath, MAXPGPATH, "%s/%s", fullparentpath, xlde->d_name);
+
+		if (lstat(fullpath, &fst) < 0)
+		{
+			fprintf(stderr, "warning: could not stat file \"%s\": %s",
+					fullpath, strerror(errno));
+			/*
+			 * This is ok, if the new master is running and the file was
+			 * just removed. If it was a data file, there should be a WAL
+			 * record of the removal. If it was something else, it couldn't
+			 * have been critical anyway.
+			 *
+			 * TODO: But complain if we're processing the target dir!
+			 */
+		}
+
+		if (parentpath)
+			snprintf(path, MAXPGPATH, "%s/%s", parentpath, xlde->d_name);
+		else
+			snprintf(path, MAXPGPATH, "%s", xlde->d_name);
+
+		if (S_ISREG(fst.st_mode))
+			callback(path, FILE_TYPE_REGULAR, fst.st_size, NULL);
+		else if (S_ISDIR(fst.st_mode))
+		{
+			callback(path, FILE_TYPE_DIRECTORY, 0, NULL);
+			/* recurse to handle subdirectories */
+			recurse_dir(datadir, path, callback);
+		}
+		else if (S_ISLNK(fst.st_mode))
+		{
+			char		link_target[MAXPGPATH];
+			ssize_t		len;
+
+			len = readlink(fullpath, link_target, sizeof(link_target) - 1);
+			if (len == -1)
+			{
+				fprintf(stderr, "readlink() failed on \"%s\": %s\n",
+						fullpath, strerror(errno));
+				exit(1);
+			}
+			if (len == sizeof(link_target) - 1)
+			{
+				/* path was truncated */
+				fprintf(stderr, "symbolic link \"%s\" target path too long\n",
+						fullpath);
+				exit(1);
+			}
+
+			callback(path, FILE_TYPE_SYMLINK, 0, link_target);
+
+			/*
+			 * If it's a symlink within pg_tblspc, we need to recurse into it,
+			 * to process all the tablespaces.
+			 */
+			if (strcmp(parentpath, "pg_tblspc") == 0)
+				recurse_dir(datadir, path, callback);
+		}
+	}
+	closedir(xldir);
+}
+
+static int dstfd = -1;
+static char dstpath[MAXPGPATH] = "";
+
+void
+open_target_file(const char *path, bool trunc)
+{
+	int			mode;
+
+	if (dry_run)
+		return;
+
+	if (dstfd != -1 && !trunc &&
+		strcmp(path, &dstpath[strlen(datadir_target) + 1]) == 0)
+		return; /* already open */
+
+	if (dstfd != -1)
+		close_target_file();
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+
+	mode = O_WRONLY | O_CREAT | PG_BINARY;
+	if (trunc)
+		mode |= O_TRUNC;
+	dstfd = open(dstpath, mode, 0600);
+	if (dstfd < 0)
+	{
+		fprintf(stderr, "could not open destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+void
+close_target_file(void)
+{
+	if (close(dstfd) != 0)
+	{
+		fprintf(stderr, "error closing destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	dstfd = -1;
+	/* fsync? */
+}
+
+void
+write_file_range(char *buf, off_t begin, size_t size)
+{
+	int			writeleft;
+	char	   *p;
+
+	if (dry_run)
+		return;
+
+	if (lseek(dstfd, begin, SEEK_SET) == -1)
+	{
+		fprintf(stderr, "could not seek in destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	writeleft = size;
+	p = buf;
+	while (writeleft > 0)
+	{
+		int		writelen;
+
+		writelen = write(dstfd, p, writeleft);
+		if (writelen < 0)
+		{
+			fprintf(stderr, "could not write file \"%s\": %s\n",
+					dstpath, strerror(errno));
+			exit(1);
+		}
+
+		p += writelen;
+		writeleft -= writelen;
+	}
+
+	/* keep the file open, in case we need to copy more blocks in it */
+}
+
+
+/*
+ * Copy a file from source to target, between 'begin' and 'end' offsets.
+ */
+static void
+copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
+{
+	char		buf[BLCKSZ];
+	char		srcpath[MAXPGPATH];
+	int			srcfd;
+
+	snprintf(srcpath, sizeof(srcpath), "%s/%s", datadir_source, path);
+
+	srcfd = open(srcpath, O_RDONLY | PG_BINARY, 0);
+	if (srcfd < 0)
+	{
+		fprintf(stderr, "could not open source file \"%s\": %s\n", srcpath, strerror(errno));
+		exit(1);
+	}
+
+	if (lseek(srcfd, begin, SEEK_SET) == -1)
+	{
+		fprintf(stderr, "could not seek in source file: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	open_target_file(path, trunc);
+
+	while (end - begin > 0)
+	{
+		int		readlen;
+		int		len;
+
+		if (end - begin > sizeof(buf))
+			len = sizeof(buf);
+		else
+			len = end - begin;
+
+		readlen = read(srcfd, buf, len);
+
+		if (readlen < 0)
+		{
+			fprintf(stderr, "could not read file \"%s\": %s\n", srcpath, strerror(errno));
+			exit(1);
+		}
+		else if (readlen == 0)
+		{
+			fprintf(stderr, "unexpected EOF while reading file \"%s\"\n", srcpath);
+			exit(1);
+		}
+
+		write_file_range(buf, begin, readlen);
+		begin += readlen;
+	}
+}
+
+/*
+ * Checks if two file descriptors point to the same file. This is used as
+ * a sanity check, to make sure the user doesn't try to copy a data directory
+ * over itself.
+ */
+void
+check_samefile(int fd1, int fd2)
+{
+	struct stat statbuf1,
+				statbuf2;
+
+	if (fstat(fd1, &statbuf1) < 0)
+	{
+		fprintf(stderr, "fstat failed: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	if (fstat(fd2, &statbuf2) < 0)
+	{
+		fprintf(stderr, "fstat failed: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	if (statbuf1.st_dev == statbuf2.st_dev &&
+		statbuf1.st_ino == statbuf2.st_ino)
+	{
+		fprintf(stderr, "old and new data directory are the same\n");
+		exit(1);
+	}
+}
+
+/*
+ * Copy all relation data files from datadir_source to datadir_target, which
+ * are marked in the given data page map.
+ */
+void
+copy_executeFileMap(filemap_t *map)
+{
+	file_entry_t *entry;
+	int			i;
+
+	for (i = 0; i < map->narray; i++)
+	{
+		entry = map->array[i];
+		execute_pagemap(&entry->pagemap, entry->path);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+				/* ok, do nothing.. */
+				break;
+
+			case FILE_ACTION_COPY:
+				copy_file_range(entry->path, 0, entry->newsize, true);
+				break;
+
+			case FILE_ACTION_TRUNCATE:
+				truncate_target_file(entry->path, entry->newsize);
+				break;
+
+			case FILE_ACTION_COPY_TAIL:
+				copy_file_range(entry->path, entry->oldsize, entry->newsize, false);
+				break;
+
+			case FILE_ACTION_CREATE:
+				create_target(entry);
+				break;
+
+			case FILE_ACTION_REMOVE:
+				remove_target(entry);
+				break;
+		}
+	}
+
+	if (dstfd != -1)
+		close_target_file();
+}
+
+
+void
+remove_target(file_entry_t *entry)
+{
+	Assert(entry->action == FILE_ACTION_REMOVE);
+
+	switch (entry->type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			remove_target_dir(entry->path);
+			break;
+
+		case FILE_TYPE_REGULAR:
+			remove_target_symlink(entry->path);
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			remove_target_file(entry->path);
+			break;
+	}
+}
+
+void
+create_target(file_entry_t *entry)
+{
+	Assert(entry->action == FILE_ACTION_CREATE);
+
+	switch (entry->type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			create_target_dir(entry->path);
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			create_target_symlink(entry->path, entry->link_target);
+			break;
+
+		case FILE_TYPE_REGULAR:
+			/* can't happen */
+			fprintf (stderr, "invalid action (CREATE) for regular file\n");
+			exit(1);
+			break;
+	}
+}
+
+static void
+remove_target_file(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (unlink(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+void
+truncate_target_file(const char *path, off_t newsize)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (truncate(dstpath, newsize) != 0)
+	{
+		fprintf(stderr, "could not truncate file \"%s\" to %u bytes: %s\n",
+				dstpath, (unsigned int) newsize, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+create_target_dir(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (mkdir(dstpath, S_IRWXU) != 0)
+	{
+		fprintf(stderr, "could not create directory \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+remove_target_dir(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (rmdir(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove directory \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+create_target_symlink(const char *path, const char *link)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (symlink(link, dstpath) != 0)
+	{
+		fprintf(stderr, "could not create symbolic link at \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+remove_target_symlink(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (unlink(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove symbolic link \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+
+static void
+execute_pagemap(datapagemap_t *pagemap, const char *path)
+{
+	datapagemap_iterator_t *iter;
+	BlockNumber blkno;
+
+	iter = datapagemap_iterate(pagemap);
+	while (datapagemap_next(iter, &blkno))
+	{
+		off_t offset = blkno * BLCKSZ;
+
+		copy_file_range(path, offset, offset + BLCKSZ, false);
+		/* Ok, this block has now been copied from new data dir to old */
+	}
+	free(iter);
+}
+
+/*
+ * Read a file into memory. The file to be read is <datadir>/<path>.
+ * The file contents are returned in a malloc'd buffer, and *filesize
+ * is set to the length of the file.
+ *
+ * The returned buffer is always zero-terminated; the size of the returned
+ * buffer is actually *filesize + 1. That's handy when reading a text file.
+ * This function can be used to read binary files as well, you can just
+ * ignore the zero-terminator in that case.
+ *
+ * This function is used to implement the fetchFile function in the "fetch"
+ * interface (see fetch.c), but is also called directly.
+ */
+char *
+slurpFile(const char *datadir, const char *path, size_t *filesize)
+{
+	int			fd;
+	char	   *buffer;
+	struct stat statbuf;
+	char		fullpath[MAXPGPATH];
+	int			len;
+
+	snprintf(fullpath, sizeof(fullpath), "%s/%s", datadir, path);
+
+	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) == -1)
+	{
+		fprintf(stderr, _("could not open file \"%s\" for reading: %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+
+	if (fstat(fd, &statbuf) < 0)
+	{
+		fprintf(stderr, _("could not open file \"%s\" for reading: %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+
+	len = statbuf.st_size;
+
+	buffer = pg_malloc(len + 1);
+
+	if (read(fd, buffer, len) != len)
+	{
+		fprintf(stderr, _("could not read file \"%s\": %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+	close(fd);
+
+	/* Zero-terminate the buffer. */
+	buffer[len] = '\0';
+
+	if (filesize)
+		*filesize = len;
+	return buffer;
+}
diff --git a/contrib/pg_rewind/datapagemap.c b/contrib/pg_rewind/datapagemap.c
new file mode 100644
index 0000000..25284b7
--- /dev/null
+++ b/contrib/pg_rewind/datapagemap.c
@@ -0,0 +1,123 @@
+/*
+ * A data structure for keeping track of data pages that have changed.
+ *
+ * This is a fairly simple bitmap.
+ *
+ * Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ */
+
+#include "postgres_fe.h"
+
+#include "datapagemap.h"
+#include "util.h"
+
+struct datapagemap_iterator
+{
+	datapagemap_t *map;
+	BlockNumber	nextblkno;
+};
+
+/*****
+ * Public functions
+ */
+
+/*
+ * Add a block to the bitmap.
+ */
+void
+datapagemap_add(datapagemap_t *map, BlockNumber blkno)
+{
+	int			offset;
+	int			bitno;
+
+	offset = blkno / 8;
+	bitno = blkno % 8;
+
+	/* enlarge or create bitmap if needed */
+	if (map->bitmapsize <= offset)
+	{
+		int oldsize = map->bitmapsize;
+		int newsize;
+
+		/*
+		 * The minimum to hold the new bit is offset + 1. But add some
+		 * headroom, so that we don't need to repeatedly enlarge the bitmap
+		 * in the common case that blocks are modified in order, from beginning
+		 * of a relation to the end.
+		 */
+		newsize = offset + 1;
+		newsize += 10;
+
+		if (map->bitmap == NULL)
+			map->bitmap = pg_malloc(newsize);
+		else
+			map->bitmap = pg_realloc(map->bitmap, newsize);
+
+		/* zero out the newly allocated region */
+		memset(&map->bitmap[oldsize], 0, newsize - oldsize);
+
+		map->bitmapsize = newsize;
+	}
+
+	/* Set the bit */
+	map->bitmap[offset] |= (1 << bitno);
+}
+
+/*
+ * Start iterating through all entries in the page map.
+ *
+ * After datapagemap_iterate, call datapagemap_next to return the entries,
+ * until it returns NULL. After you're done, use free() to destroy the
+ * iterator.
+ */
+datapagemap_iterator_t *
+datapagemap_iterate(datapagemap_t *map)
+{
+	datapagemap_iterator_t *iter = pg_malloc(sizeof(datapagemap_iterator_t));
+	iter->map = map;
+	iter->nextblkno = 0;
+	return iter;
+}
+
+bool
+datapagemap_next(datapagemap_iterator_t *iter, BlockNumber *blkno)
+{
+	datapagemap_t *map = iter->map;
+
+	for (;;)
+	{
+		BlockNumber blk = iter->nextblkno;
+		int		nextoff = blk / 8;
+		int		bitno = blk % 8;
+
+		if (nextoff >= map->bitmapsize)
+			break;
+
+		iter->nextblkno++;
+
+		if (map->bitmap[nextoff] & (1 << bitno))
+		{
+			*blkno = blk;
+			return true;
+		}
+	}
+
+	/* no more set bits in this bitmap. */
+	return false;
+}
+
+/*
+ * A debugging aid. Prints out the contents of the page map.
+ */
+void
+datapagemap_print(datapagemap_t *map)
+{
+	datapagemap_iterator_t *iter = datapagemap_iterate(map);
+	BlockNumber blocknum;
+
+	while (datapagemap_next(iter, &blocknum))
+	{
+		printf("  blk %u\n", blocknum);
+	}
+	free(iter);
+}
diff --git a/contrib/pg_rewind/datapagemap.h b/contrib/pg_rewind/datapagemap.h
new file mode 100644
index 0000000..b9a2cd2
--- /dev/null
+++ b/contrib/pg_rewind/datapagemap.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * datapagemap.h
+ *
+ * Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATAPAGEMAP_H
+#define DATAPAGEMAP_H
+
+#include "storage/relfilenode.h"
+#include "storage/block.h"
+
+
+struct datapagemap
+{
+	char	   *bitmap;
+	int			bitmapsize;
+};
+
+typedef struct datapagemap datapagemap_t;
+typedef struct datapagemap_iterator datapagemap_iterator_t;
+
+extern datapagemap_t *datapagemap_create(void);
+extern void datapagemap_destroy(datapagemap_t *map);
+extern void datapagemap_add(datapagemap_t *map, BlockNumber blkno);
+extern datapagemap_iterator_t *datapagemap_iterate(datapagemap_t *map);
+extern bool datapagemap_next(datapagemap_iterator_t *iter, BlockNumber *blkno);
+extern void datapagemap_print(datapagemap_t *map);
+
+#endif   /* DATAPAGEMAP_H */
diff --git a/contrib/pg_rewind/expected/basictest.out b/contrib/pg_rewind/expected/basictest.out
new file mode 100644
index 0000000..b67ead5
--- /dev/null
+++ b/contrib/pg_rewind/expected/basictest.out
@@ -0,0 +1,27 @@
+Master initialized and running.
+CREATE TABLE tbl1 (d text);
+CREATE TABLE
+INSERT INTO tbl1 VALUES ('in master');
+INSERT 0 1
+CHECKPOINT;
+CHECKPOINT
+Standby initialized and running.
+INSERT INTO tbl1 values ('in master, before promotion');
+INSERT 0 1
+CHECKPOINT;
+CHECKPOINT
+Standby promoted.
+INSERT INTO tbl1 VALUES ('in master, after promotion');
+INSERT 0 1
+INSERT INTO tbl1 VALUES ('in standby, after promotion');
+INSERT 0 1
+Running pg_rewind...
+Old master restarted after rewind.
+SELECT * from tbl1
+              d              
+-----------------------------
+ in master
+ in master, before promotion
+ in standby, after promotion
+(3 rows)
+
diff --git a/contrib/pg_rewind/expected/databases.out b/contrib/pg_rewind/expected/databases.out
new file mode 100644
index 0000000..e486107
--- /dev/null
+++ b/contrib/pg_rewind/expected/databases.out
@@ -0,0 +1,24 @@
+Master initialized and running.
+CREATE DATABASE inmaster;
+CREATE DATABASE
+Standby initialized and running.
+CREATE DATABASE beforepromotion
+CREATE DATABASE
+Standby promoted.
+CREATE DATABASE master_afterpromotion
+CREATE DATABASE
+CREATE DATABASE standby_afterpromotion
+CREATE DATABASE
+Running pg_rewind...
+Old master restarted after rewind.
+SELECT datname from pg_database
+        datname         
+------------------------
+ template1
+ template0
+ postgres
+ inmaster
+ beforepromotion
+ standby_afterpromotion
+(6 rows)
+
diff --git a/contrib/pg_rewind/expected/extrafiles.out b/contrib/pg_rewind/expected/extrafiles.out
new file mode 100644
index 0000000..8e3f3f1
--- /dev/null
+++ b/contrib/pg_rewind/expected/extrafiles.out
@@ -0,0 +1,15 @@
+Master initialized and running.
+Standby initialized and running.
+Standby promoted.
+Running pg_rewind...
+Old master restarted after rewind.
+tst_both_dir
+tst_both_dir/both_file1
+tst_both_dir/both_file2
+tst_both_dir/both_subdir
+tst_both_dir/both_subdir/both_file3
+tst_standby_dir
+tst_standby_dir/standby_file1
+tst_standby_dir/standby_file2
+tst_standby_dir/standby_subdir
+tst_standby_dir/standby_subdir/standby_file3
diff --git a/contrib/pg_rewind/fetch.c b/contrib/pg_rewind/fetch.c
new file mode 100644
index 0000000..7feba25
--- /dev/null
+++ b/contrib/pg_rewind/fetch.c
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ *
+ * fetch.c
+ *	  Functions for fetching files from a local or remote data dir
+ *
+ * This file forms an abstraction of getting files from the "source".
+ * There are two implementations of this interface: one for copying files
+ * from a data directory via normal filesystem operations (copy_fetch.c),
+ * and another for fetching files from a remote server via a libpq
+ * connection (libpq_fetch.c)
+ *
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+
+void
+fetchRemoteFileList(void)
+{
+	if (datadir_source)
+		traverse_datadir(datadir_source, &process_remote_file);
+	else
+		libpqProcessFileList();
+}
+
+/*
+ * Fetch all relation data files that are marked in the given data page map.
+ */
+void
+executeFileMap(void)
+{
+	if (datadir_source)
+		copy_executeFileMap(filemap);
+	else
+		libpq_executeFileMap(filemap);
+}
+
+/*
+ * Fetch a single file into a malloc'd buffer. The file size is returned
+ * in *filesize. The returned buffer is always zero-terminated.
+ */
+char *
+fetchFile(char *filename, size_t *filesize)
+{
+	if (datadir_source)
+		return slurpFile(datadir_source, filename, filesize);
+	else
+		return libpqGetFile(filename, filesize);
+}
diff --git a/contrib/pg_rewind/fetch.h b/contrib/pg_rewind/fetch.h
new file mode 100644
index 0000000..8a302a7
--- /dev/null
+++ b/contrib/pg_rewind/fetch.h
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * fetch.h
+ *	  Fetching data from a local or remote data directory.
+ *
+ * This file includes the prototypes for functions used to copy files from
+ * one data directory to another. The source to copy from can be a local
+ * directory (copy method), or a remote PostgreSQL server (libpq fetch
+ * method).
+ *
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FETCH_H
+#define FETCH_H
+
+#include "c.h"
+
+#include "filemap.h"
+
+/*
+ * Common interface. Calls the copy or libpq method depending on global
+ * config options.
+ */
+extern void fetchRemoteFileList(void);
+extern char *fetchFile(char *filename, size_t *filesize);
+extern void executeFileMap(void);
+
+/* in libpq_fetch.c */
+extern void libpqConnect(const char *connstr);
+extern void libpqProcessFileList(void);
+extern void libpq_executeFileMap(filemap_t *map);
+extern void libpqGetChangedDataPages(datapagemap_t *pagemap);
+extern void libpqGetOtherFiles(void);
+extern char *libpqGetFile(const char *filename, size_t *filesize);
+
+/* in copy_fetch.c */
+extern void copy_executeFileMap(filemap_t *map);
+
+extern void open_target_file(const char *path, bool trunc);
+extern void write_file_range(char *buf, off_t begin, size_t size);
+extern void close_target_file(void);
+
+extern char *slurpFile(const char *datadir, const char *path, size_t *filesize);
+
+typedef void (*process_file_callback_t) (const char *path, file_type_t type, size_t size, const char *link_target);
+extern void traverse_datadir(const char *datadir, process_file_callback_t callback);
+
+extern void truncate_target_file(const char *path, off_t newsize);
+extern void create_target(file_entry_t *t);
+extern void remove_target(file_entry_t *t);
+extern void check_samefile(int fd1, int fd2);
+
+
+#endif   /* FETCH_H */
diff --git a/contrib/pg_rewind/filemap.c b/contrib/pg_rewind/filemap.c
new file mode 100644
index 0000000..c2ca80c
--- /dev/null
+++ b/contrib/pg_rewind/filemap.c
@@ -0,0 +1,584 @@
+/*
+ * A data structure for keeping track of files that have changed.
+ *
+ * Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ */
+
+#include "postgres_fe.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <regex.h>
+
+#include "datapagemap.h"
+#include "filemap.h"
+#include "util.h"
+#include "pg_rewind.h"
+#include "storage/fd.h"
+
+filemap_t *filemap = NULL;
+
+static bool isRelDataFile(const char *path);
+static int path_cmp(const void *a, const void *b);
+static int final_filemap_cmp(const void *a, const void *b);
+static void filemap_list_to_array(void);
+
+
+/*****
+ * Public functions
+ */
+
+/*
+ * Create a new file map.
+ */
+filemap_t *
+filemap_create(void)
+{
+	filemap_t *map = pg_malloc(sizeof(filemap_t));
+	map->first = map->last = NULL;
+	map->nlist = 0;
+	map->array = NULL;
+	map->narray = 0;
+
+	Assert(filemap == NULL);
+	filemap = map;
+
+	return map;
+}
+
+static bool
+endswith(const char *haystack, const char *needle)
+{
+	int needlelen = strlen(needle);
+	int haystacklen = strlen(haystack);
+
+	if (haystacklen < needlelen)
+		return false;
+
+	return strcmp(&haystack[haystacklen - needlelen], needle) == 0;
+}
+
+/*
+ * Callback for processing remote file list.
+ */
+void
+process_remote_file(const char *path, file_type_t type, size_t newsize,
+					const char *link_target)
+{
+	bool		exists;
+	char		localpath[MAXPGPATH];
+	struct stat statbuf;
+	filemap_t  *map = filemap;
+	file_action_t action = FILE_ACTION_NONE;
+	size_t		oldsize = 0;
+	file_entry_t *entry;
+
+	Assert(map->array == NULL);
+
+	/*
+	 * Completely ignore some special files in source and destination.
+	 */
+	if (strcmp(path, "postmaster.pid") == 0 ||
+		strcmp(path, "postmaster.opts") == 0)
+		return;
+
+	/*
+	 * Skip temporary files, .../pgsql_tmp/... and .../pgsql_tmp.* in source.
+	 * This has the effect that all temporary files in the destination will
+	 * be removed.
+	 */
+	if (strstr(path, "/" PG_TEMP_FILE_PREFIX) != NULL)
+		return;
+	if (strstr(path, "/" PG_TEMP_FILES_DIR "/") != NULL)
+		return;
+
+	/*
+	 * sanity check: a filename that looks like a data file better be a
+	 * regular file
+	 */
+	if (type != FILE_TYPE_REGULAR && isRelDataFile(path))
+	{
+		fprintf(stderr, "data file in source \"%s\" is a directory.\n", path);
+		exit(1);
+	}
+
+	snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
+
+	/* Does the corresponding local file exist? */
+	if (lstat(localpath, &statbuf) < 0)
+	{
+		/* does not exist */
+		if (errno != ENOENT)
+		{
+			fprintf(stderr, "could not stat file \"%s\": %s",
+					localpath, strerror(errno));
+			exit(1);
+		}
+
+		exists = false;
+	}
+	else
+		exists = true;
+
+	switch (type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			if (exists && !S_ISDIR(statbuf.st_mode))
+			{
+				/* it's a directory in target, but not in source. Strange.. */
+				fprintf(stderr, "\"%s\" is not a directory.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists)
+				action = FILE_ACTION_CREATE;
+			else
+				action = FILE_ACTION_NONE;
+			oldsize = 0;
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			if (exists && !S_ISLNK(statbuf.st_mode))
+			{
+				/* it's a symbolic link in target, but not in source. Strange.. */
+				fprintf(stderr, "\"%s\" is not a symbolic link.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists)
+				action = FILE_ACTION_CREATE;
+			else
+				action = FILE_ACTION_NONE;
+			oldsize = 0;
+			break;
+
+		case FILE_TYPE_REGULAR:
+			if (exists && !S_ISREG(statbuf.st_mode))
+			{
+				fprintf(stderr, "\"%s\" is not a regular file.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists || !isRelDataFile(path))
+			{
+				/*
+				 * File exists in source, but not in target. Or it's a non-data
+				 * file that we have no special processing for. Copy it in toto.
+				 *
+				 * An exception: PG_VERSIONs should be identical, but avoid
+				 * overwriting it for paranoia.
+				 */
+				if (endswith(path, "PG_VERSION"))
+				{
+					action = FILE_ACTION_NONE;
+					oldsize = statbuf.st_size;
+				}
+				else
+				{
+					action = FILE_ACTION_COPY;
+					oldsize = 0;
+				}
+			}
+			else
+			{
+				/*
+				 * It's a data file that exists in both.
+				 *
+				 * If it's larger in target, we can truncate it. There will
+				 * also be a WAL record of the truncation in the source system,
+				 * so WAL replay would eventually truncate the target too, but
+				 * we might as well do it now.
+				 *
+				 * If it's smaller in the target, it means that it has been
+				 * truncated in the target, or enlarged in the source, or both.
+				 * If it was truncated locally, we need to copy the missing
+				 * tail from the remote system. If it was enlarged in the
+				 * remote system, there will be WAL records in the remote
+				 * system for the new blocks, so we wouldn't need to copy them
+				 * here. But we don't know which scenario we're dealing with,
+				 * and there's no harm in copying the missing blocks now, so do
+				 * it now.
+				 *
+				 * If it's the same size, do nothing here. Any locally modified
+				 * blocks will be copied based on parsing the local WAL, and
+				 * any remotely modified blocks will be updated after
+				 * rewinding, when the remote WAL is replayed.
+				 */
+				oldsize = statbuf.st_size;
+				if (oldsize < newsize)
+					action = FILE_ACTION_COPY_TAIL;
+				else if (oldsize > newsize)
+					action = FILE_ACTION_TRUNCATE;
+				else
+					action = FILE_ACTION_NONE;
+			}
+			break;
+	}
+
+	/* Create a new entry for this file */
+	entry = pg_malloc(sizeof(file_entry_t));
+	entry->path = pg_strdup(path);
+	entry->type = type;
+	entry->action = action;
+	entry->oldsize = oldsize;
+	entry->newsize = newsize;
+	entry->link_target = link_target ? pg_strdup(link_target) : NULL;
+	entry->next = NULL;
+	entry->pagemap.bitmap = NULL;
+	entry->pagemap.bitmapsize = 0;
+	entry->isrelfile = isRelDataFile(path);
+
+	if (map->last)
+	{
+		map->last->next = entry;
+		map->last = entry;
+	}
+	else
+		map->first = map->last = entry;
+	map->nlist++;
+}
+
+
+/*
+ * Callback for processing local file list.
+ *
+ * All remote files must be processed before calling this. This only marks
+ * local files that don't exist in the remote system for deletion.
+ */
+void
+process_local_file(const char *path, file_type_t type, size_t oldsize,
+				   const char *link_target)
+{
+	bool		exists;
+	char		localpath[MAXPGPATH];
+	struct stat statbuf;
+	file_entry_t key;
+	file_entry_t *key_ptr;
+	filemap_t  *map = filemap;
+	file_entry_t *entry;
+
+	snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
+	if (lstat(localpath, &statbuf) < 0)
+	{
+		if (errno == ENOENT)
+			exists = false;
+		else
+		{
+			fprintf(stderr, "could not stat file \"%s\": %s",
+					localpath, strerror(errno));
+			exit(1);
+		}
+	}
+
+	if (map->array == NULL)
+	{
+		/* on first call, initialize lookup array */
+		if (map->nlist == 0)
+		{
+			/* should not happen */
+			fprintf(stderr, "remote file list is empty\n");
+			exit(1);
+		}
+
+		filemap_list_to_array();
+		qsort(map->array, map->narray, sizeof(file_entry_t *), path_cmp);
+	}
+
+	/*
+	 * Completely ignore some special files
+	 */
+	if (strcmp(path, "postmaster.pid") == 0 ||
+		strcmp(path, "postmaster.opts") == 0)
+		return;
+
+	key.path = (char *) path;
+	key_ptr = &key;
+	exists = bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
+					 path_cmp) != NULL;
+
+	/* Remove any file or folder that doesn't exist in the remote system. */
+	if (!exists)
+	{
+		entry = pg_malloc(sizeof(file_entry_t));
+		entry->path = pg_strdup(path);
+		entry->type = type;
+		entry->action = FILE_ACTION_REMOVE;
+		entry->oldsize = oldsize;
+		entry->newsize = 0;
+		entry->link_target = link_target ? pg_strdup(link_target) : NULL;
+		entry->next = NULL;
+		entry->pagemap.bitmap = NULL;
+		entry->pagemap.bitmapsize = 0;
+		entry->isrelfile = isRelDataFile(path);
+
+		if (map->last == NULL)
+			map->first = entry;
+		else
+			map->last->next = entry;
+		map->last = entry;
+		map->nlist++;
+	}
+	else
+	{
+		/*
+		 * We already handled all files that exist in the remote system
+		 * in process_remote_file().
+		 */
+	}
+}
+
+/*
+ * This callback gets called while we read the old WAL, for every block that
+ * have changed in the local system. It makes note of all the changed blocks
+ * in the pagemap of the file.
+ */
+void
+process_block_change(ForkNumber forknum, RelFileNode rnode, BlockNumber blkno)
+{
+	char	   *path;
+	file_entry_t key;
+	file_entry_t *key_ptr;
+	file_entry_t *entry;
+	BlockNumber	blkno_inseg;
+	int			segno;
+	filemap_t  *map = filemap;
+
+	Assert(filemap->array);
+
+	segno = blkno / RELSEG_SIZE;
+	blkno_inseg = blkno % RELSEG_SIZE;
+
+	path = datasegpath(rnode, forknum, segno);
+
+	key.path = (char *) path;
+	key_ptr = &key;
+
+	{
+		file_entry_t **e = bsearch(&key_ptr, map->array, map->narray,
+								   sizeof(file_entry_t *), path_cmp);
+		if (e)
+			entry = *e;
+		else
+			entry = NULL;
+	}
+	free(path);
+
+	if (entry)
+	{
+		Assert(entry->isrelfile);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+			case FILE_ACTION_COPY_TAIL:
+			case FILE_ACTION_TRUNCATE:
+				/* skip if we're truncating away the modified block anyway */
+				if ((blkno_inseg + 1) * BLCKSZ <= entry->newsize)
+					datapagemap_add(&entry->pagemap, blkno_inseg);
+				break;
+
+			case FILE_ACTION_COPY:
+			case FILE_ACTION_REMOVE:
+				return;
+
+			case FILE_ACTION_CREATE:
+				fprintf(stderr, "unexpected page modification for directory or symbolic link \"%s\"", entry->path);
+				exit(1);
+		}
+	}
+	else
+	{
+		/*
+		 * If we don't have any record of this file in the file map, it means
+		 * that it's a relation that doesn't exist in the remote system, and
+		 * it was also subsequently removed in the local system, too. We can
+		 * safely ignore it.
+		 */
+	}
+}
+
+/*
+ * Convert the linked list of entries in filemap->first/last to the array,
+ * filemap->array.
+ */
+static void
+filemap_list_to_array(void)
+{
+	int			narray;
+	file_entry_t *entry,
+				*next;
+
+	if (filemap->array == NULL)
+		filemap->array = pg_malloc(filemap->nlist * sizeof(file_entry_t));
+	else
+		filemap->array = pg_realloc(filemap->array,
+									(filemap->nlist + filemap->narray) * sizeof(file_entry_t));
+
+	narray = filemap->narray;
+	for (entry = filemap->first; entry != NULL; entry = next)
+	{
+		filemap->array[narray++] = entry;
+		next = entry->next;
+		entry->next = NULL;
+	}
+	Assert (narray == filemap->nlist + filemap->narray);
+	filemap->narray = narray;
+	filemap->nlist = 0;
+	filemap->first = filemap->last = NULL;
+}
+
+void
+filemap_finalize(void)
+{
+	filemap_list_to_array();
+	qsort(filemap->array, filemap->narray, sizeof(file_entry_t *),
+		  final_filemap_cmp);
+}
+
+static const char *
+action_to_str(file_action_t action)
+{
+	switch (action)
+	{
+		case FILE_ACTION_NONE:
+			return "NONE";
+		case FILE_ACTION_COPY:
+			return "COPY";
+		case FILE_ACTION_TRUNCATE:
+			return "TRUNCATE";
+		case FILE_ACTION_COPY_TAIL:
+			return "COPY_TAIL";
+		case FILE_ACTION_CREATE:
+			return "CREATE";
+		case FILE_ACTION_REMOVE:
+			return "REMOVE";
+
+		default:
+			return "unknown";
+	}
+}
+
+void
+print_filemap(void)
+{
+	file_entry_t *entry;
+	int			i;
+
+	for (i = 0; i < filemap->narray; i++)
+	{
+		entry = filemap->array[i];
+		if (entry->action != FILE_ACTION_NONE ||
+			entry->pagemap.bitmapsize > 0)
+		{
+			printf("%s (%s)\n", entry->path, action_to_str(entry->action));
+
+			if (entry->pagemap.bitmapsize > 0)
+				datapagemap_print(&entry->pagemap);
+		}
+	}
+	fflush(stdout);
+}
+
+/*
+ * Does it look like a relation data file?
+ */
+static bool
+isRelDataFile(const char *path)
+{
+	static bool	regexps_compiled = false;
+	static regex_t datasegment_regex;
+	int			rc;
+
+	/* Compile the regexp if not compiled yet. */
+	if (!regexps_compiled)
+	{
+		/*
+		 * Relation data files can be in one of the following directories:
+		 *
+		 * global/
+		 *		shared relations
+		 *
+		 * base/<db oid>/
+		 *		regular relations, default tablespace
+		 *
+		 * pg_tblspc/<tblspc oid>/PG_9.4_201403261/
+		 *		within a non-default tablespace (the name of the directory
+		 *		depends on version)
+		 *
+		 * And the relation data files themselves have a filename like:
+		 *
+		 * <oid>.<segment number>
+		 *
+		 * This regular expression tries to capture all of above.
+		 */
+		const char *datasegment_regex_str =
+			"("
+			"global"
+			"|"
+			"base/[0-9]+"
+			"|"
+			"pg_tblspc/[0-9]+/[PG_0-9.0-9_0-9]+/[0-9]+"
+			")/"
+			"[0-9]+(\\.[0-9]+)?$";
+		rc = regcomp(&datasegment_regex, datasegment_regex_str, REG_NOSUB | REG_EXTENDED);
+		if (rc != 0)
+		{
+			char errmsg[100];
+			regerror(rc, &datasegment_regex, errmsg, sizeof(errmsg));
+			fprintf(stderr, "could not compile regular expression: %s\n",
+					errmsg);
+			exit(1);
+		}
+	}
+
+	rc = regexec(&datasegment_regex, path, 0, NULL, 0);
+	if (rc == 0)
+	{
+		/* it's a data segment file */
+		return true;
+	}
+	else if (rc != REG_NOMATCH)
+	{
+		char errmsg[100];
+		regerror(rc, &datasegment_regex, errmsg, sizeof(errmsg));
+		fprintf(stderr, "could not execute regular expression: %s\n", errmsg);
+		exit(1);
+	}
+	return false;
+}
+
+static int
+path_cmp(const void *a, const void *b)
+{
+	file_entry_t *fa = *((file_entry_t **) a);
+	file_entry_t *fb = *((file_entry_t **) b);
+	return strcmp(fa->path, fb->path);
+}
+
+/*
+ * In the final stage, the filemap is sorted so that removals come last.
+ * From disk space usage point of view, it would be better to do removals
+ * first, but for now, safety first. If a whole directory is deleted, all
+ * files and subdirectories inside it need to removed first. On creation,
+ * parent directory needs to be created before files and directories inside
+ * it. To achieve that, the file_action_t enum is ordered so that we can
+ * just sort on that first. Furthermore, sort REMOVE entries in reverse
+ * path order, so that "foo/bar" subdirectory is removed before "foo".
+ */
+static int
+final_filemap_cmp(const void *a, const void *b)
+{
+	file_entry_t *fa = *((file_entry_t **) a);
+	file_entry_t *fb = *((file_entry_t **) b);
+
+	if (fa->action > fb->action)
+		return 1;
+	if (fa->action < fb->action)
+		return -1;
+
+	if (fa->action == FILE_ACTION_REMOVE)
+		return -strcmp(fa->path, fb->path);
+	else
+		return strcmp(fa->path, fb->path);
+}
diff --git a/contrib/pg_rewind/filemap.h b/contrib/pg_rewind/filemap.h
new file mode 100644
index 0000000..342f4c8
--- /dev/null
+++ b/contrib/pg_rewind/filemap.h
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * filemap.h
+ *
+ * Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *-------------------------------------------------------------------------
+ */
+#ifndef FILEMAP_H
+#define FILEMAP_H
+
+#include "storage/relfilenode.h"
+#include "storage/block.h"
+
+/*
+ * For every file found in the local or remote system, we have a file entry
+ * which says what we are going to do with the file. For relation files,
+ * there is also a page map, marking pages in the file that were changed
+ * locally.
+ *
+ * The enum values are sorted in the order we want actions to be processed.
+ */
+typedef enum
+{
+	FILE_ACTION_CREATE,		/* create local directory or symbolic link */
+	FILE_ACTION_COPY,		/* copy whole file, overwriting if exists */
+	FILE_ACTION_COPY_TAIL,	/* copy tail from 'oldsize' to 'newsize' */
+	FILE_ACTION_NONE,		/* no action (we might still copy modified blocks
+							 * based on the parsed WAL) */
+	FILE_ACTION_TRUNCATE,	/* truncate local file to 'newsize' bytes */
+	FILE_ACTION_REMOVE,		/* remove local file / directory / symlink */
+
+} file_action_t;
+
+typedef enum
+{
+	FILE_TYPE_REGULAR,
+	FILE_TYPE_DIRECTORY,
+	FILE_TYPE_SYMLINK
+} file_type_t;
+
+struct file_entry_t
+{
+	char	   *path;
+	file_type_t type;
+
+	file_action_t action;
+
+	/* for a regular file */
+	size_t		oldsize;
+	size_t		newsize;
+	bool		isrelfile;		/* is it a relation data file? */
+
+	datapagemap_t	pagemap;
+
+	/* for a symlink */
+	char		*link_target;
+
+	struct file_entry_t *next;
+};
+
+typedef struct file_entry_t file_entry_t;
+
+struct filemap_t
+{
+	/*
+	 * New entries are accumulated to a linked list, in process_remote_file
+	 * and process_local_file.
+	 */
+	file_entry_t *first;
+	file_entry_t *last;
+	int			nlist;
+
+	/*
+	 * After processing all the remote files, the entries in the linked list
+	 * are moved to this array. After processing local file, too, all the
+	 * local entries are added to the array by filemap_finalize, and sorted
+	 * in the final order. After filemap_finalize, all the entries are in
+	 * the array, and the linked list is empty.
+	 */
+	file_entry_t **array;
+	int			narray;
+};
+
+typedef struct filemap_t filemap_t;
+
+extern filemap_t * filemap;
+
+extern filemap_t *filemap_create(void);
+
+extern void print_filemap(void);
+
+/* Functions for populating the filemap */
+extern void process_remote_file(const char *path, file_type_t type, size_t newsize, const char *link_target);
+extern void process_local_file(const char *path, file_type_t type, size_t newsize, const char *link_target);
+extern void process_block_change(ForkNumber forknum, RelFileNode rnode, BlockNumber blkno);
+extern void filemap_finalize(void);
+
+#endif   /* FILEMAP_H */
diff --git a/contrib/pg_rewind/launcher b/contrib/pg_rewind/launcher
new file mode 100755
index 0000000..56f8cc0
--- /dev/null
+++ b/contrib/pg_rewind/launcher
@@ -0,0 +1,6 @@
+#!/bin/bash
+#
+# Normally, psql feeds the files in sql/ directory to psql, but we want to
+# run them as shell scripts instead.
+
+bash
diff --git a/contrib/pg_rewind/libpq_fetch.c b/contrib/pg_rewind/libpq_fetch.c
new file mode 100644
index 0000000..c281714
--- /dev/null
+++ b/contrib/pg_rewind/libpq_fetch.c
@@ -0,0 +1,408 @@
+/*-------------------------------------------------------------------------
+ *
+ * libpq_fetch.c
+ *	  Functions for fetching files from a remote server.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_type.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <libpq-fe.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+
+static PGconn *conn = NULL;
+
+#define CHUNKSIZE 1000000
+
+static void receiveFileChunks(const char *sql);
+static void execute_pagemap(datapagemap_t *pagemap, const char *path);
+
+void
+libpqConnect(const char *connstr)
+{
+	conn = PQconnectdb(connstr);
+	if (PQstatus(conn) == CONNECTION_BAD)
+	{
+		fprintf(stderr, "could not connect to remote server: %s\n",
+				PQerrorMessage(conn));
+		exit(1);
+	}
+
+	if (verbose)
+		printf("connected to remote server\n");
+}
+
+/*
+ * Get a file list.
+ */
+void
+libpqProcessFileList(void)
+{
+	PGresult   *res;
+	const char *sql;
+	int			i;
+
+	sql =
+		"-- Create a recursive directory listing of the whole data directory\n"
+		"with recursive files (path, filename, size, isdir) as (\n"
+		"  select '' as path, filename, size, isdir from\n"
+		"  (select pg_ls_dir('.') as filename) as fn,\n"
+		"        pg_stat_file(fn.filename) as this\n"
+		"  union all\n"
+		"  select parent.path || parent.filename || '/' as path,\n"
+		"         fn, this.size, this.isdir\n"
+		"  from files as parent,\n"
+		"       pg_ls_dir(parent.path || parent.filename) as fn,\n"
+		"       pg_stat_file(parent.path || parent.filename || '/' || fn) as this\n"
+		"       where parent.isdir = 't'\n"
+		")\n"
+		"-- Using the cte, fetch a listing of the all the files.\n"
+		"--\n"
+		"-- For tablespaces, use pg_tablespace_location() function to fetch\n"
+		"-- the link target (there is no backend function to get a symbolic\n"
+		"-- link's target in general, so if the admin has put any custom\n"
+		"-- symbolic links in the data directory, they won't be copied\n"
+		"-- correctly)\n"
+		"select path || filename, size, isdir,\n"
+		"       pg_tablespace_location(pg_tablespace.oid) as link_target\n"
+		"from files\n"
+		"left outer join pg_tablespace on files.path = 'pg_tblspc/'\n"
+		"                             and oid::text = files.filename\n";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		fprintf(stderr, "unexpected result while fetching file list: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	/* sanity check the result set */
+	if (!(PQnfields(res) == 4))
+	{
+		fprintf(stderr, "unexpected result set while fetching file list\n");
+		exit(1);
+	}
+
+	/* Read result to local variables */
+	for (i = 0; i < PQntuples(res); i++)
+	{
+		char	   *path = PQgetvalue(res, i, 0);
+		int			filesize = atoi(PQgetvalue(res, i, 1));
+		bool		isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
+		char	   *link_target = PQgetvalue(res, i, 3);
+		file_type_t type;
+
+		if (link_target[0])
+			type = FILE_TYPE_SYMLINK;
+		else if (isdir)
+			type = FILE_TYPE_DIRECTORY;
+		else
+			type = FILE_TYPE_REGULAR;
+
+		process_remote_file(path, type, filesize, link_target);
+	}
+}
+
+/*
+ * Runs a query, which returns pieces of files from the remote source data
+ * directory, and overwrites the corresponding parts of target files with
+ * the received parts. The result set is expected to be of format:
+ *
+ * path		text	-- path in the data directory, e.g "base/1/123"
+ * begin	int4	-- offset within the file
+ * chunk	bytea	-- file content
+ *
+ */
+static void
+receiveFileChunks(const char *sql)
+{
+	PGresult   *res;
+
+	if (PQsendQueryParams(conn, sql, 0, NULL, NULL, NULL, NULL, 1) != 1)
+	{
+		fprintf(stderr, "could not send query: %s\n", PQerrorMessage(conn));
+		exit(1);
+	}
+
+	if (verbose)
+		fprintf(stderr, "getting chunks: %s\n", sql);
+
+	if (PQsetSingleRowMode(conn) != 1)
+	{
+		fprintf(stderr, "could not set libpq connection to single row mode\n");
+		exit(1);
+	}
+
+	if (verbose)
+		fprintf(stderr, "sent query\n");
+
+	while ((res = PQgetResult(conn)) != NULL)
+	{
+		char   *filename;
+		int		filenamelen;
+		int		chunkoff;
+		int		chunksize;
+		char   *chunk;
+
+		switch(PQresultStatus(res))
+		{
+			case PGRES_SINGLE_TUPLE:
+				break;
+
+			case PGRES_TUPLES_OK:
+				continue; /* final zero-row result */
+			default:
+				fprintf(stderr, "unexpected result while fetching remote files: %s\n",
+						PQresultErrorMessage(res));
+				exit(1);
+		}
+
+		/* sanity check the result set */
+		if (!(PQnfields(res) == 3 && PQntuples(res) == 1))
+		{
+			fprintf(stderr, "unexpected result set size while fetching remote files\n");
+			exit(1);
+		}
+
+		if (!(PQftype(res, 0) == TEXTOID && PQftype(res, 1) == INT4OID && PQftype(res, 2) == BYTEAOID))
+		{
+			fprintf(stderr, "unexpected data types in result set while fetching remote files: %u %u %u\n", PQftype(res, 0), PQftype(res, 1), PQftype(res, 2));
+			exit(1);
+		}
+		if (!(PQfformat(res, 0) == 1 && PQfformat(res, 1) == 1 && PQfformat(res, 2) == 1))
+		{
+			fprintf(stderr, "unexpected result format while fetching remote files\n");
+			exit(1);
+		}
+
+		if (!(!PQgetisnull(res, 0, 0) && !PQgetisnull(res, 0, 1) && !PQgetisnull(res, 0, 2) &&
+			  PQgetlength(res, 0, 1) == sizeof(int32)))
+		{
+			fprintf(stderr, "unexpected result set while fetching remote files\n");
+			exit(1);
+		}
+
+		/* Read result set to local variables */
+		memcpy(&chunkoff, PQgetvalue(res, 0, 1), sizeof(int32));
+		chunkoff = ntohl(chunkoff);
+		chunksize = PQgetlength(res, 0, 2);
+
+		filenamelen = PQgetlength(res, 0, 0);
+		filename = pg_malloc(filenamelen + 1);
+		memcpy(filename, PQgetvalue(res, 0, 0), filenamelen);
+		filename[filenamelen] = '\0';
+
+		chunk = PQgetvalue(res, 0, 2);
+
+		if (verbose)
+			fprintf(stderr, "received chunk for file \"%s\", off %d, len %d\n",
+					filename, chunkoff, chunksize);
+
+		open_target_file(filename, false);
+
+		write_file_range(chunk, chunkoff, chunksize);
+	}
+}
+
+/*
+ * Receive a single file.
+ */
+char *
+libpqGetFile(const char *filename, size_t *filesize)
+{
+	PGresult   *res;
+	char	   *result;
+	int			len;
+	const char *paramValues[1];
+	paramValues[0] = filename;
+
+	res = PQexecParams(conn, "select pg_read_binary_file($1)",
+					   1, NULL, paramValues, NULL, NULL, 1);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		fprintf(stderr, "unexpected result while fetching remote file \"%s\": %s\n",
+				filename, PQresultErrorMessage(res));
+		exit(1);
+	}
+
+
+	/* sanity check the result set */
+	if (!(PQntuples(res) == 1 && !PQgetisnull(res, 0, 0)))
+	{
+		fprintf(stderr, "unexpected result set while fetching remote file \"%s\"\n",
+				filename);
+		exit(1);
+	}
+
+	/* Read result to local variables */
+	len = PQgetlength(res, 0, 0);
+	result = pg_malloc(len + 1);
+	memcpy(result, PQgetvalue(res, 0, 0), len);
+	result[len] = '\0';
+
+	if (verbose)
+		printf("fetched file \"%s\", length %d\n", filename, len);
+
+	if (filesize)
+		*filesize = len;
+	return result;
+}
+
+static void
+copy_file_range(const char *path, unsigned int begin, unsigned int end)
+{
+	char linebuf[MAXPGPATH + 23];
+
+	/* Split the range into CHUNKSIZE chunks */
+	while (end - begin > 0)
+	{
+		unsigned int len;
+
+		if (end - begin > CHUNKSIZE)
+			len = CHUNKSIZE;
+		else
+			len = end - begin;
+
+		snprintf(linebuf, sizeof(linebuf), "%s\t%u\t%u\n", path, begin, len);
+
+		if (PQputCopyData(conn, linebuf, strlen(linebuf)) != 1)
+		{
+			fprintf(stderr, "error sending COPY data: %s\n",
+					PQerrorMessage(conn));
+			exit(1);
+		}
+		begin += len;
+	}
+}
+
+/*
+ * Fetch all changed blocks from remote source data directory.
+ */
+void
+libpq_executeFileMap(filemap_t *map)
+{
+	file_entry_t *entry;
+	const char *sql;
+	PGresult   *res;
+	int			i;
+
+	/*
+	 * First create a temporary table, and load it with the blocks that
+	 * we need to fetch.
+	 */
+	sql = "create temporary table fetchchunks(path text, begin int4, len int4);";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		fprintf(stderr, "error creating temporary table: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	sql = "copy fetchchunks from stdin";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+	{
+		fprintf(stderr, "unexpected result while sending file list: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	for (i = 0; i < map->narray; i++)
+	{
+		entry = map->array[i];
+		execute_pagemap(&entry->pagemap, entry->path);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+				/* ok, do nothing.. */
+				break;
+
+			case FILE_ACTION_COPY:
+				/* Truncate the old file out of the way, if any */
+				open_target_file(entry->path, true);
+				copy_file_range(entry->path, 0, entry->newsize);
+				break;
+
+			case FILE_ACTION_TRUNCATE:
+				truncate_target_file(entry->path, entry->newsize);
+				break;
+
+			case FILE_ACTION_COPY_TAIL:
+				copy_file_range(entry->path, entry->oldsize, entry->newsize);
+				break;
+
+			case FILE_ACTION_REMOVE:
+				remove_target(entry);
+				break;
+
+			case FILE_ACTION_CREATE:
+				create_target(entry);
+				break;
+		}
+	}
+
+	if (PQputCopyEnd(conn, NULL) != 1)
+	{
+		fprintf(stderr, "error sending end-of-COPY: %s\n",
+				PQerrorMessage(conn));
+		exit(1);
+	}
+
+	while ((res = PQgetResult(conn)) != NULL)
+	{
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		{
+			fprintf(stderr, "unexpected result while sending file list: %s\n",
+					PQresultErrorMessage(res));
+			exit(1);
+		}
+	}
+
+	/* Ok, we've sent the file list. Now receive the files */
+	sql =
+		"-- fetch all the blocks listed in the temp table.\n"
+		"select path, begin, \n"
+		"  pg_read_binary_file(path, begin, len) as chunk\n"
+		"from fetchchunks\n";
+
+	receiveFileChunks(sql);
+}
+
+
+static void
+execute_pagemap(datapagemap_t *pagemap, const char *path)
+{
+	datapagemap_iterator_t *iter;
+	BlockNumber blkno;
+
+	iter = datapagemap_iterate(pagemap);
+	while (datapagemap_next(iter, &blkno))
+	{
+		off_t offset = blkno * BLCKSZ;
+
+		copy_file_range(path, offset, offset + BLCKSZ);
+	}
+	free(iter);
+}
diff --git a/contrib/pg_rewind/parsexlog.c b/contrib/pg_rewind/parsexlog.c
new file mode 100644
index 0000000..c88d6e8
--- /dev/null
+++ b/contrib/pg_rewind/parsexlog.c
@@ -0,0 +1,369 @@
+/*-------------------------------------------------------------------------
+ *
+ * parsexlog.c
+ *	  Functions for reading Write-Ahead-Log
+ *
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2008, Nippon Telegraph and Telephone Corporation
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define FRONTEND 1
+#include "c.h"
+#undef FRONTEND
+#include "postgres.h"
+
+#include "pg_rewind.h"
+#include "filemap.h"
+
+#include <unistd.h>
+
+#include "access/rmgr.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+
+
+/*
+ * RmgrNames is an array of resource manager names, to make error messages
+ * a bit nicer.
+ */
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \
+  name,
+
+static const char *RmgrNames[RM_MAX_ID + 1] = {
+#include "access/rmgrlist.h"
+};
+
+static void extractPageInfo(XLogReaderState *record);
+
+static int xlogreadfd = -1;
+static XLogSegNo xlogreadsegno = -1;
+static char xlogfpath[MAXPGPATH];
+
+typedef struct XLogPageReadPrivate
+{
+	const char	   *datadir;
+	TimeLineID		tli;
+} XLogPageReadPrivate;
+
+static int SimpleXLogPageRead(XLogReaderState *xlogreader,
+				   XLogRecPtr targetPagePtr,
+				   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
+				   TimeLineID *pageTLI);
+
+/*
+ * Read all the WAL in the datadir/pg_xlog,  starting from 'startpoint' on
+ * timeline 'tli'. Make note of the data blocks touched by the WAL records,
+ * and return them in a page map.
+ */
+void
+extractPageMap(const char *datadir, XLogRecPtr startpoint, TimeLineID tli)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char *errormsg;
+	XLogPageReadPrivate private;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	record = XLogReadRecord(xlogreader, startpoint, &errormsg);
+	if (record == NULL)
+	{
+		fprintf(stderr, "could not read WAL starting at %X/%X",
+				(uint32) (startpoint >> 32),
+				(uint32) (startpoint));
+		if (errormsg)
+			fprintf(stderr, ": %s", errormsg);
+		fprintf(stderr, "\n");
+		exit(1);
+	}
+
+	do
+	{
+		extractPageInfo(xlogreader);
+
+		record = XLogReadRecord(xlogreader, InvalidXLogRecPtr, &errormsg);
+
+		if (errormsg)
+			fprintf(stderr, "error reading xlog record: %s\n", errormsg);
+	} while(record != NULL);
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+}
+
+/*
+ * Reads one WAL record. Returns the end position of the record, without
+ * doing anything the record itself.
+ */
+XLogRecPtr
+readOneRecord(const char *datadir, XLogRecPtr ptr, TimeLineID tli)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogPageReadPrivate private;
+	XLogRecPtr	endptr;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	record = XLogReadRecord(xlogreader, ptr, &errormsg);
+	if (record == NULL)
+	{
+		fprintf(stderr, "could not read WAL record at %X/%X",
+				(uint32) (ptr >> 32), (uint32) (ptr));
+		if (errormsg)
+			fprintf(stderr, ": %s", errormsg);
+		fprintf(stderr, "\n");
+		exit(1);
+	}
+	endptr = xlogreader->EndRecPtr;
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+
+	return endptr;
+}
+
+/*
+ * Find the previous checkpoint preceding given WAL position.
+ */
+void
+findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, TimeLineID tli,
+				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
+				   XLogRecPtr *lastchkptredo)
+{
+	/* Walk backwards, starting from the given record */
+	XLogRecord *record;
+	XLogRecPtr searchptr;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogPageReadPrivate private;
+
+
+	/*
+	 * The given fork pointer points to the end of the last common record,
+	 * which is not necessarily the beginning of the next record, if the
+	 * previous record happens to end at a page boundary. Skip over the
+	 * page header in that case to find the next record.
+	 */
+	if (forkptr % XLOG_BLCKSZ == 0)
+		forkptr += (forkptr % XLogSegSize == 0) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	searchptr = forkptr;
+	for (;;)
+	{
+		uint8		info;
+
+		record = XLogReadRecord(xlogreader, searchptr, &errormsg);
+
+		if (record == NULL)
+		{
+			fprintf(stderr, "could not find previous WAL record at %X/%X",
+					(uint32) (searchptr >> 32),
+					(uint32) (searchptr));
+			if (errormsg)
+				fprintf(stderr, ": %s", errormsg);
+			fprintf(stderr, "\n");
+			exit(1);
+		}
+
+		/*
+		 * Check if it is a checkpoint record. This checkpoint record
+		 * needs to be the latest checkpoint before WAL forked and not
+		 * the checkpoint where the master has been stopped to be
+		 * rewinded.
+		 */
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+		if (searchptr < forkptr &&
+			XLogRecGetRmid(xlogreader) == RM_XLOG_ID &&
+			(info == XLOG_CHECKPOINT_SHUTDOWN || info == XLOG_CHECKPOINT_ONLINE))
+		{
+			CheckPoint	checkPoint;
+
+			memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
+			*lastchkptrec = searchptr;
+			*lastchkpttli = checkPoint.ThisTimeLineID;
+			*lastchkptredo = checkPoint.redo;
+			break;
+		}
+
+		/* Walk backwards to previous record. */
+		searchptr = record->xl_prev;
+	}
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+}
+
+/* XLogreader callback function, to read a WAL page */
+int
+SimpleXLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+				   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
+				   TimeLineID *pageTLI)
+{
+	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
+	uint32		targetPageOff;
+	XLogSegNo	targetSegNo PG_USED_FOR_ASSERTS_ONLY;
+
+	XLByteToSeg(targetPagePtr, targetSegNo);
+	targetPageOff = targetPagePtr % XLogSegSize;
+
+	/*
+	 * See if we need to switch to a new segment because the requested record
+	 * is not in the currently open one.
+	 */
+	if (xlogreadfd >= 0 && !XLByteInSeg(targetPagePtr, xlogreadsegno))
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+
+	XLByteToSeg(targetPagePtr, xlogreadsegno);
+
+	if (xlogreadfd < 0)
+	{
+		char		xlogfname[MAXFNAMELEN];
+
+		XLogFileName(xlogfname, private->tli, xlogreadsegno);
+
+		snprintf(xlogfpath, MAXPGPATH, "%s/" XLOGDIR "/%s", private->datadir, xlogfname);
+
+		xlogreadfd = open(xlogfpath, O_RDONLY | PG_BINARY, 0);
+
+		if (xlogreadfd < 0)
+		{
+			fprintf(stderr, "could not open file \"%s\": %s\n", xlogfpath,
+					strerror(errno));
+			return -1;
+		}
+	}
+
+	/*
+	 * At this point, we have the right segment open.
+	 */
+	Assert(xlogreadfd != -1);
+
+	/* Read the requested page */
+	if (lseek(xlogreadfd, (off_t) targetPageOff, SEEK_SET) < 0)
+	{
+		fprintf(stderr, "could not seek in file \"%s\": %s\n", xlogfpath,
+				strerror(errno));
+		return -1;
+	}
+
+	if (read(xlogreadfd, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+	{
+		fprintf(stderr, "could not read from file \"%s\": %s\n", xlogfpath,
+				strerror(errno));
+		return -1;
+	}
+
+	Assert(targetSegNo == xlogreadsegno);
+
+	*pageTLI = private->tli;
+	return XLOG_BLCKSZ;
+}
+
+static void
+extractPageInfo(XLogReaderState *record)
+{
+#define pageinfo_set_truncation(forkno, rnode, blkno) datapagemap_set_truncation(pagemap, forkno, rnode, blkno)
+
+	int			block_id;
+	RmgrId		rmid = XLogRecGetRmid(record);
+	uint8		info = XLogRecGetInfo(record);
+	uint8		rminfo = info & ~XLR_INFO_MASK;
+
+	/* Is this a special record type that I recognize? */
+
+	if (rmid == RM_DBASE_ID && rminfo == XLOG_DBASE_CREATE)
+	{
+		/*
+		 * New databases can be safely ignored. It won't be present in the
+		 * remote system, so it will be copied in toto. There's one
+		 * corner-case, though: if a new, different, database is also created
+		 * in the remote system, we'll see that the files already exist and
+		 * not copy them. That's OK, though; WAL replay of creating the new
+		 * database, from the remote WAL, will re-copy the new database,
+		 * overwriting the database created in the local system.
+		 */
+	}
+	else if (rmid == RM_DBASE_ID && rminfo == XLOG_DBASE_DROP)
+	{
+		/*
+		 * An existing database was dropped. We'll see that the files don't
+		 * exist in local system, and copy them in toto from the remote
+		 * system. No need to do anything special here.
+		 */
+	}
+	else if (rmid == RM_SMGR_ID && rminfo == XLOG_SMGR_CREATE)
+	{
+		/*
+		 * We can safely ignore these. The local file will be removed, if it
+		 * doesn't exist in remote system. If a file with same name is created
+		 * in remote system, too, there will be WAL records for all the blocks
+		 * in it.
+		 */
+	}
+	else if (rmid == RM_SMGR_ID && rminfo == XLOG_SMGR_TRUNCATE)
+	{
+		/*
+		 * We can safely ignore these. If a file is truncated locally, we'll
+		 * notice that when we compare the sizes, and will copy the missing
+		 * tail from remote system.
+		 *
+		 * TODO: But it would be nice to do some sanity cross-checking here..
+		 */
+	}
+	else if (info & XLR_SPECIAL_REL_UPDATE)
+	{
+		/*
+		 * This record type modifies a relation file in some special
+		 * way, but we don't recognize the type. That's bad - we don't
+		 * know how to track that change.
+		 */
+		fprintf(stderr, "WAL record modifies a relation, but record type is not recognized\n");
+		fprintf(stderr, "lsn: %X/%X, rmgr: %s, info: %02X\n",
+			(uint32) (record->ReadRecPtr >> 32), (uint32) (record->ReadRecPtr),
+				RmgrNames[rmid], info);
+		exit(1);
+	}
+
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		RelFileNode rnode;
+		ForkNumber forknum;
+		BlockNumber blkno;
+
+		if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+			continue;
+		process_block_change(forknum, rnode, blkno);
+	}
+}
diff --git a/contrib/pg_rewind/pg_rewind.c b/contrib/pg_rewind/pg_rewind.c
new file mode 100644
index 0000000..2fbf88a
--- /dev/null
+++ b/contrib/pg_rewind/pg_rewind.c
@@ -0,0 +1,542 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_rewind.c
+ *	  Synchronizes an old master server to a new timeline
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+
+#include "access/timeline.h"
+#include "access/xlog_internal.h"
+#include "catalog/catversion.h"
+#include "catalog/pg_control.h"
+#include "storage/bufpage.h"
+
+#include "getopt_long.h"
+
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <time.h>
+#include <unistd.h>
+
+static void usage(const char *progname);
+
+static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
+				  XLogRecPtr checkpointloc);
+
+static void digestControlFile(ControlFileData *ControlFile, char *source, size_t size);
+static void updateControlFile(ControlFileData *ControlFile,
+							  char *datadir);
+static void sanityChecks(void);
+static void findCommonAncestorTimeline(XLogRecPtr *recptr, TimeLineID *tli);
+
+static ControlFileData ControlFile_target;
+static ControlFileData ControlFile_source;
+
+const char *progname;
+
+char *datadir_target = NULL;
+char *datadir_source = NULL;
+char *connstr_source = NULL;
+
+int verbose;
+int dry_run;
+
+static void
+usage(const char *progname)
+{
+	printf("%s resynchronizes a cluster with another copy of the cluster.\n\n", progname);
+	printf("Usage:\n  %s [OPTION]...\n\n", progname);
+	printf("Options:\n");
+	printf("  -D, --target-pgdata=DIRECTORY\n");
+	printf("                 existing data directory to modify\n");
+	printf("  --source-pgdata=DIRECTORY\n");
+	printf("                 source data directory to sync with\n");
+	printf("  --source-server=CONNSTR\n");
+	printf("                 source server to sync with\n");
+	printf("  -v             write a lot of progress messages\n");
+	printf("  -n, --dry-run  stop before modifying anything\n");
+	printf("  -V, --version  output version information, then exit\n");
+	printf("  -?, --help     show this help, then exit\n");
+	printf("\n");
+	printf("Report bugs to https://github.com/vmware/pg_rewind.\n");
+}
+
+
+int
+main(int argc, char **argv)
+{
+	static struct option long_options[] = {
+		{"help", no_argument, NULL, '?'},
+		{"target-pgdata", required_argument, NULL, 'D'},
+		{"source-pgdata", required_argument, NULL, 1},
+		{"source-server", required_argument, NULL, 2},
+		{"version", no_argument, NULL, 'V'},
+		{"dry-run", no_argument, NULL, 'n'},
+		{"verbose", no_argument, NULL, 'v'},
+		{NULL, 0, NULL, 0}
+	};
+	int			option_index;
+	int			c;
+	XLogRecPtr	divergerec;
+	TimeLineID	lastcommontli;
+	XLogRecPtr	chkptrec;
+	TimeLineID	chkpttli;
+	XLogRecPtr	chkptredo;
+	size_t		size;
+	char	   *buffer;
+	bool		rewind_needed;
+	ControlFileData	ControlFile;
+
+	progname = get_progname(argv[0]);
+
+	/* Set default parameter values */
+	verbose = 0;
+	dry_run = 0;
+
+	/* Process command-line arguments */
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage(progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_rewind " PG_REWIND_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt_long(argc, argv, "D:vn", long_options, &option_index)) != -1)
+	{
+		switch (c)
+		{
+			case '?':
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+			case ':':
+				exit(1);
+			case 'v':
+				verbose = 1;
+				break;
+			case 'n':
+				dry_run = 1;
+				break;
+
+			case 'D':	/* -D or --target-pgdata */
+				datadir_target = pg_strdup(optarg);
+				break;
+
+			case 1:		/* --source-pgdata */
+				datadir_source = pg_strdup(optarg);
+				break;
+			case 2:		/* --source-server */
+				connstr_source = pg_strdup(optarg);
+				break;
+		}
+	}
+
+	/* No source given? Show usage */
+	if (datadir_source == NULL && connstr_source == NULL)
+	{
+		fprintf(stderr, "%s: no source specified\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	if (datadir_target == NULL)
+	{
+		fprintf(stderr, "%s: no target data directory specified\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	if (argc != optind)
+	{
+		fprintf(stderr, "%s: invalid arguments\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/*
+	 * Connect to remote server
+	 */
+	if (connstr_source)
+		libpqConnect(connstr_source);
+
+	/*
+	 * Ok, we have all the options and we're ready to start. Read in all the
+	 * information we need from both clusters.
+	 */
+	buffer = slurpFile(datadir_target, "global/pg_control", &size);
+	digestControlFile(&ControlFile_target, buffer, size);
+	pg_free(buffer);
+
+	buffer = fetchFile("global/pg_control", &size);
+	digestControlFile(&ControlFile_source, buffer, size);
+	pg_free(buffer);
+
+	sanityChecks();
+
+	/*
+	 * If both clusters are already on the same timeline, there's nothing
+	 * to do.
+	 */
+	if (ControlFile_target.checkPointCopy.ThisTimeLineID == ControlFile_source.checkPointCopy.ThisTimeLineID)
+	{
+		fprintf(stderr, "source and target cluster are both on the same timeline.\n");
+		exit(1);
+	}
+
+	findCommonAncestorTimeline(&divergerec, &lastcommontli);
+	printf("The servers diverged at WAL position %X/%X on timeline %u.\n",
+		   (uint32) (divergerec >> 32), (uint32) divergerec, lastcommontli);
+
+	/*
+	 * Check for the possibility that the target is in fact a direct ancestor
+	 * of the source. In that case, there is no divergent history in the
+	 * target that needs rewinding.
+	 */
+	if (ControlFile_target.checkPoint >= divergerec)
+	{
+		rewind_needed = true;
+	}
+	else
+	{
+		XLogRecPtr chkptendrec;
+
+		/* Read the checkpoint record on the target to see where it ends. */
+		chkptendrec = readOneRecord(datadir_target,
+									ControlFile_target.checkPoint,
+									ControlFile_target.checkPointCopy.ThisTimeLineID);
+
+		/*
+		 * If the histories diverged exactly at the end of the shutdown
+		 * checkpoint record on the target, there are no WAL records in the
+		 * target that don't belong in the source's history, and no rewind is
+		 * needed.
+		 */
+		if (chkptendrec == divergerec)
+			rewind_needed = false;
+		else
+			rewind_needed = true;
+	}
+
+	if (!rewind_needed)
+	{
+		printf("No rewind required.\n");
+		exit(0);
+	}
+	findLastCheckpoint(datadir_target, divergerec, lastcommontli,
+					   &chkptrec, &chkpttli, &chkptredo);
+	printf("Rewinding from Last common checkpoint at %X/%X on timeline %u\n",
+		   (uint32) (chkptrec >> 32), (uint32) chkptrec,
+		   chkpttli);
+
+	/*
+	 * Build the filemap, by comparing the remote and local data directories
+	 */
+	(void) filemap_create();
+	fetchRemoteFileList();
+	traverse_datadir(datadir_target, &process_local_file);
+
+	/*
+	 * Read the target WAL from last checkpoint before the point of fork,
+	 * to extract all the pages that were modified on the target cluster
+	 * after the fork.
+	 */
+	extractPageMap(datadir_target, chkptrec, lastcommontli);
+
+	filemap_finalize();
+
+	/* XXX: this is probably too verbose even in verbose mode */
+	if (verbose)
+		print_filemap();
+
+	/* Ok, we're ready to start copying things over. */
+	executeFileMap();
+
+	createBackupLabel(chkptredo, chkpttli, chkptrec);
+
+	/*
+	 * Update control file of target file and make it ready to
+	 * perform archive recovery when restarting.
+	 */
+	memcpy(&ControlFile, &ControlFile_source, sizeof(ControlFileData));
+	ControlFile.minRecoveryPoint = divergerec;
+	ControlFile.minRecoveryPointTLI = ControlFile_target.checkPointCopy.ThisTimeLineID;
+	ControlFile.state = DB_IN_ARCHIVE_RECOVERY;
+	updateControlFile(&ControlFile, datadir_target);
+
+	printf("Done!\n");
+
+	return 0;
+}
+
+static void
+sanityChecks(void)
+{
+	/* Check that there's no backup_label in either cluster */
+	/* Check system_id match */
+	if (ControlFile_target.system_identifier != ControlFile_source.system_identifier)
+	{
+		fprintf(stderr, "source and target clusters are from different systems\n");
+		exit(1);
+	}
+	/* check version */
+	if (ControlFile_target.pg_control_version != PG_CONTROL_VERSION ||
+		ControlFile_source.pg_control_version != PG_CONTROL_VERSION ||
+		ControlFile_target.catalog_version_no != CATALOG_VERSION_NO ||
+		ControlFile_source.catalog_version_no != CATALOG_VERSION_NO)
+	{
+		fprintf(stderr, "clusters are not compatible with this version of pg_rewind\n");
+		exit(1);
+	}
+
+	/*
+	 * Target cluster need to use checksums or hint bit wal-logging, this to
+	 * prevent from data corruption that could occur because of hint bits.
+	 */
+	if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION &&
+		!ControlFile_target.wal_log_hints)
+	{
+		fprintf(stderr, "target master need to use either data checksums or \"wal_log_hints = on\".\n");
+		exit(1);
+	}
+
+	/*
+	 * Target cluster better not be running. This doesn't guard against someone
+	 * starting the cluster concurrently. Also, this is probably more strict
+	 * than necessary; it's OK if the master was not shut down cleanly, as
+	 * long as it isn't running at the moment.
+	 */
+	if (ControlFile_target.state != DB_SHUTDOWNED)
+	{
+		fprintf(stderr, "target master must be shut down cleanly.\n");
+		exit(1);
+	}
+}
+
+/*
+ * Determine the TLI of the last common timeline in the histories of the two
+ * clusters. *tli is set to the last common timeline, and *recptr is set to
+ * the position where the histories diverged (ie. the first WAL record that's
+ * not the same in both clusters).
+ *
+ * Control files of both clusters must be read into ControlFile_target/source
+ * before calling this.
+ */
+static void
+findCommonAncestorTimeline(XLogRecPtr *recptr, TimeLineID *tli)
+{
+	TimeLineID	targettli;
+	TimeLineHistoryEntry *sourceHistory;
+	int			nentries;
+	int			i;
+	TimeLineID	sourcetli;
+
+	targettli = ControlFile_target.checkPointCopy.ThisTimeLineID;
+	sourcetli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+
+	/* Timeline 1 does not have a history file, so no need to check */
+	if (sourcetli == 1)
+	{
+		sourceHistory = (TimeLineHistoryEntry *) pg_malloc(sizeof(TimeLineHistoryEntry));
+		sourceHistory->tli = sourcetli;
+		sourceHistory->begin = sourceHistory->end = InvalidXLogRecPtr;
+		nentries = 1;
+	}
+	else
+	{
+		char		path[MAXPGPATH];
+		char	   *histfile;
+
+		TLHistoryFilePath(path, sourcetli);
+		histfile = fetchFile(path, NULL);
+
+		sourceHistory = rewind_parseTimeLineHistory(histfile,
+													ControlFile_source.checkPointCopy.ThisTimeLineID,
+													&nentries);
+		pg_free(histfile);
+	}
+
+	/*
+	 * Trace the history backwards, until we hit the target timeline.
+	 *
+	 * TODO: This assumes that there are no timeline switches on the target
+	 * cluster after the fork.
+	 */
+	for (i = nentries - 1; i >= 0; i--)
+	{
+		TimeLineHistoryEntry *entry = &sourceHistory[i];
+		if (entry->tli == targettli)
+		{
+			/* found it */
+			*recptr = entry->end;
+			*tli = entry->tli;
+
+			free(sourceHistory);
+			return;
+		}
+	}
+
+	fprintf(stderr, "could not find common ancestor of the source and target cluster's timelines\n");
+	exit(1);
+}
+
+
+/*
+ * Create a backup_label file that forces recovery to begin at the last common
+ * checkpoint.
+ */
+static void
+createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpointloc)
+{
+	XLogSegNo	startsegno;
+	char		BackupLabelFilePath[MAXPGPATH];
+	FILE	   *fp;
+	time_t		stamp_time;
+	char		strfbuf[128];
+	char		xlogfilename[MAXFNAMELEN];
+	struct tm  *tmp;
+
+	if (dry_run)
+		return;
+
+	XLByteToSeg(startpoint, startsegno);
+	XLogFileName(xlogfilename, starttli, startsegno);
+
+	/*
+	 * TODO: move old file out of the way, if any. And perhaps create the
+	 * file with temporary name first and rename in place after it's done.
+	 */
+	snprintf(BackupLabelFilePath, MAXPGPATH,
+			 "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target);
+
+	/*
+	 * Construct backup label file
+	 */
+
+	fp = fopen(BackupLabelFilePath, "wb");
+
+	stamp_time = time(NULL);
+	tmp = localtime(&stamp_time);
+	strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", tmp);
+	fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n",
+			(uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename);
+	fprintf(fp, "CHECKPOINT LOCATION: %X/%X\n",
+			(uint32) (checkpointloc >> 32), (uint32) checkpointloc);
+	fprintf(fp, "BACKUP METHOD: pg_rewind\n");
+	fprintf(fp, "BACKUP FROM: standby\n");
+	fprintf(fp, "START TIME: %s\n", strfbuf);
+	/* omit LABEL: line */
+
+	if (fclose(fp) != 0)
+	{
+		fprintf(stderr, _("could not write backup label file \"%s\": %s\n"),
+				BackupLabelFilePath, strerror(errno));
+		exit(2);
+	}
+}
+
+/*
+ * Check CRC of control file
+ */
+static void
+checkControlFile(ControlFileData *ControlFile)
+{
+	pg_crc32	crc;
+
+	/* Calculate CRC */
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc,
+			   (char *) ControlFile,
+			   offsetof(ControlFileData, crc));
+	FIN_CRC32C(crc);
+
+	/* And simply compare it */
+	if (!EQ_CRC32C(crc, ControlFile->crc))
+	{
+		fprintf(stderr, "unexpected control file CRC\n");
+		exit(1);
+	}
+}
+
+/*
+ * Verify control file contents in the buffer src, and copy it to *ControlFile.
+ */
+static void
+digestControlFile(ControlFileData *ControlFile, char *src, size_t size)
+{
+	if (size != PG_CONTROL_SIZE)
+	{
+		fprintf(stderr, "unexpected control file size %d, expected %d\n",
+				(int) size, PG_CONTROL_SIZE);
+		exit(1);
+	}
+	memcpy(ControlFile, src, sizeof(ControlFileData));
+
+	/* Additional checks on control file */
+	checkControlFile(ControlFile);
+}
+
+/*
+ * Update a control file with fresh content
+ */
+static void
+updateControlFile(ControlFileData *ControlFile, char *datadir)
+{
+	char    path[MAXPGPATH];
+	char	buffer[PG_CONTROL_SIZE];
+	FILE    *fp;
+
+	if (dry_run)
+		return;
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+			   (char *) ControlFile,
+			   offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData) to avoid premature EOF
+	 * related errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(path, MAXPGPATH,
+			 "%s/global/pg_control", datadir);
+
+	if ((fp  = fopen(path, "wb")) == NULL)
+	{
+		fprintf(stderr,"Could not open the pg_control file\n");
+		exit(1);
+	}
+
+	if (fwrite(buffer, 1,
+			   PG_CONTROL_SIZE, fp) != PG_CONTROL_SIZE)
+	{
+		fprintf(stderr,"Could not write the pg_control file\n");
+		exit(1);
+	}
+
+	if (fclose(fp))
+	{
+		fprintf(stderr,"Could not close the pg_control file\n");
+		exit(1);
+	}
+ }
diff --git a/contrib/pg_rewind/pg_rewind.h b/contrib/pg_rewind/pg_rewind.h
new file mode 100644
index 0000000..8b7b5d8
--- /dev/null
+++ b/contrib/pg_rewind/pg_rewind.h
@@ -0,0 +1,46 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_rewind.h
+ *
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_REWIND_H
+#define PG_REWIND_H
+
+#include "c.h"
+
+#include "datapagemap.h"
+#include "util.h"
+
+#include "access/timeline.h"
+#include "storage/block.h"
+#include "storage/relfilenode.h"
+
+#define PG_REWIND_VERSION "0.1"
+
+/* Configuration options */
+extern char *datadir_target;
+extern char *datadir_source;
+extern char *connstr_source;
+extern int verbose;
+extern int dry_run;
+
+
+/* in parsexlog.c */
+extern void extractPageMap(const char *datadir, XLogRecPtr startpoint, TimeLineID tli);
+extern void findLastCheckpoint(const char *datadir, XLogRecPtr searchptr, TimeLineID tli,
+				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
+				   XLogRecPtr *lastchkptredo);
+extern XLogRecPtr readOneRecord(const char *datadir, XLogRecPtr ptr, TimeLineID tli);
+
+
+/* in timeline.c */
+extern TimeLineHistoryEntry *rewind_parseTimeLineHistory(char *buffer,
+		    TimeLineID targetTLI, int *nentries);
+
+#endif   /* PG_REWIND_H */
diff --git a/contrib/pg_rewind/sql/basictest.sql b/contrib/pg_rewind/sql/basictest.sql
new file mode 100644
index 0000000..cee59c2
--- /dev/null
+++ b/contrib/pg_rewind/sql/basictest.sql
@@ -0,0 +1,53 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+TESTNAME=basictest
+
+. sql/config_test.sh
+
+# Do an insert in master.
+function before_standby
+{
+$MASTER_PSQL <<EOF
+CREATE TABLE tbl1 (d text);
+INSERT INTO tbl1 VALUES ('in master');
+CHECKPOINT;
+EOF
+}
+
+function standby_following_master
+{
+# Insert additional data on master that will be replicated to standby
+$MASTER_PSQL -c "INSERT INTO tbl1 values ('in master, before promotion');"
+
+# Launch checkpoint after standby has been started
+$MASTER_PSQL -c "CHECKPOINT;"
+}
+
+# This script runs after the standby has been promoted. Old Master is still
+# running.
+function after_promotion
+{
+# Insert a row in the old master. This causes the master and standby to have
+# "diverged", it's no longer possible to just apply the standy's logs over
+# master directory - you need to rewind.
+$MASTER_PSQL -c "INSERT INTO tbl1 VALUES ('in master, after promotion');"
+
+# Also insert a new row in the standby, which won't be present in the old
+# master.
+$STANDBY_PSQL -c "INSERT INTO tbl1 VALUES ('in standby, after promotion');"
+}
+
+# Compare results generated by querying new master after rewind
+function after_rewind
+{
+$MASTER_PSQL -c "SELECT * from tbl1"
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/config_test.sh b/contrib/pg_rewind/sql/config_test.sh
new file mode 100644
index 0000000..0baa468
--- /dev/null
+++ b/contrib/pg_rewind/sql/config_test.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+#
+# Initialize some variables, before running pg_rewind.sh.
+
+set -e
+
+mkdir -p "regress_log"
+log_path="regress_log/pg_rewind_log_${TESTNAME}_${TEST_SUITE}"
+: ${MAKE=make}
+
+rm -f $log_path
+
+# Guard against parallel make issues (see comments in pg_regress.c)
+unset MAKEFLAGS
+unset MAKELEVEL
+
+# Check at least that the option given is suited
+if [ "$TEST_SUITE" = 'remote' ]; then
+	echo "Running tests with libpq connection as source" >>$log_path 2>&1
+	TEST_SUITE="remote"
+elif [ "$TEST_SUITE" = 'local' ]; then
+	echo "Running tests with local data folder as source" >>$log_path 2>&1
+	TEST_SUITE="local"
+else
+	echo "TEST_SUITE environment variable must be set to either \"local\" or \"remote\""
+	exit 1
+fi
+
+# Set listen_addresses desirably
+testhost=`uname -s`
+case $testhost in
+	MINGW*) LISTEN_ADDRESSES="localhost" ;;
+	*)      LISTEN_ADDRESSES="" ;;
+esac
+
+# Indicate of binaries
+PATH=$bindir:$PATH
+export PATH
+
+# Adjust these paths for your environment
+TESTROOT=$PWD/tmp_check
+TEST_MASTER=$TESTROOT/data_master
+TEST_STANDBY=$TESTROOT/data_standby
+
+# Create the root folder for test data
+mkdir -p $TESTROOT
+
+# Clear out any environment vars that might cause libpq to connect to
+# the wrong postmaster (cf pg_regress.c)
+#
+# Some shells, such as NetBSD's, return non-zero from unset if the variable
+# is already unset. Since we are operating under 'set -e', this causes the
+# script to fail. To guard against this, set them all to an empty string first.
+PGDATABASE="";        unset PGDATABASE
+PGUSER="";            unset PGUSER
+PGSERVICE="";         unset PGSERVICE
+PGSSLMODE="";         unset PGSSLMODE
+PGREQUIRESSL="";      unset PGREQUIRESSL
+PGCONNECT_TIMEOUT=""; unset PGCONNECT_TIMEOUT
+PGHOST="";            unset PGHOST
+PGHOSTADDR="";        unset PGHOSTADDR
+
+export PGDATABASE="postgres"
+
+# Define non conflicting ports for both nodes, this could be a bit
+# smarter with for example dynamic port recognition using psql but
+# this will make it for now.
+PG_VERSION_NUM=90401
+PORT_MASTER=`expr $PG_VERSION_NUM % 16384 + 49152`
+PORT_STANDBY=`expr $PORT_MASTER + 1`
+
+MASTER_PSQL="psql -a --no-psqlrc -p $PORT_MASTER"
+STANDBY_PSQL="psql -a --no-psqlrc -p $PORT_STANDBY"
diff --git a/contrib/pg_rewind/sql/databases.sql b/contrib/pg_rewind/sql/databases.sql
new file mode 100644
index 0000000..60520d2
--- /dev/null
+++ b/contrib/pg_rewind/sql/databases.sql
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+TESTNAME=databases
+
+. sql/config_test.sh
+
+# Create a database in master.
+function before_standby
+{
+$MASTER_PSQL <<EOF
+CREATE DATABASE inmaster;
+EOF
+}
+
+function standby_following_master
+{
+# Create another database after promotion
+$MASTER_PSQL -c "CREATE DATABASE beforepromotion"
+}
+
+# This script runs after the standby has been promoted. Old Master is still
+# running.
+function after_promotion
+{
+$MASTER_PSQL -c "CREATE DATABASE master_afterpromotion"
+
+$STANDBY_PSQL -c "CREATE DATABASE standby_afterpromotion"
+}
+
+# Compare results generated by querying new master after rewind
+function after_rewind
+{
+$MASTER_PSQL -c "SELECT datname from pg_database"
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/extrafiles.sql b/contrib/pg_rewind/sql/extrafiles.sql
new file mode 100644
index 0000000..8512369
--- /dev/null
+++ b/contrib/pg_rewind/sql/extrafiles.sql
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+# Test how pg_rewind reacts to extra files and directories in the data dirs.
+
+TESTNAME=extrafiles
+
+. sql/config_test.sh
+
+# Create a subdir that will be present in both
+function before_standby
+{
+  mkdir $TEST_MASTER/tst_both_dir
+  echo "in both1" > $TEST_MASTER/tst_both_dir/both_file1
+  echo "in both2" > $TEST_MASTER/tst_both_dir/both_file2
+  mkdir $TEST_MASTER/tst_both_dir/both_subdir/
+  echo "in both3" > $TEST_MASTER/tst_both_dir/both_subdir/both_file3
+}
+
+# Create subdirs that will be present only in one data dir.
+function standby_following_master
+{
+  mkdir $TEST_STANDBY/tst_standby_dir
+  echo "in standby1" > $TEST_STANDBY/tst_standby_dir/standby_file1
+  echo "in standby2" > $TEST_STANDBY/tst_standby_dir/standby_file2
+  mkdir $TEST_STANDBY/tst_standby_dir/standby_subdir/
+  echo "in standby3" > $TEST_STANDBY/tst_standby_dir/standby_subdir/standby_file3
+  mkdir $TEST_MASTER/tst_master_dir
+  echo "in master1" > $TEST_MASTER/tst_master_dir/master_file1
+  echo "in master2" > $TEST_MASTER/tst_master_dir/master_file2
+  mkdir $TEST_MASTER/tst_master_dir/master_subdir/
+  echo "in master3" > $TEST_MASTER/tst_master_dir/master_subdir/master_file3
+}
+
+function after_promotion
+{
+  :
+}
+
+# See what files and directories are present after rewind.
+function after_rewind
+{
+    (cd $TEST_MASTER; find tst_* | sort)
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/run_test.sh b/contrib/pg_rewind/sql/run_test.sh
new file mode 100644
index 0000000..bad9993
--- /dev/null
+++ b/contrib/pg_rewind/sql/run_test.sh
@@ -0,0 +1,146 @@
+#!/bin/bash
+#
+# pg_rewind.sh
+#
+# Test driver for pg_rewind. This test script initdb's and configures a
+# cluster and creates a table with some data in it. Then, it makes a
+# standby of it with pg_basebackup, and promotes the standby.
+#
+# The result is two clusters, so that the old "master" cluster can be
+# resynchronized with pg_rewind to catch up with the new "standby" cluster.
+# This test can be run with either a local data folder or a remote
+# connection as source.
+#
+# Before running this script, the calling script should've included
+# config_test.sh, and defined four functions to define the test case:
+#
+#  before_standby  - runs after initializing the master, before creating the
+#                    standby
+#  standby_following_master - runs after standby has been created and started
+#  after_promotion - runs after standby has been promoted, but old master is
+#                    still running
+#  after_rewind    - runs after pg_rewind and after restarting the rewound
+#                    old master
+#
+# In those functions, the test script can use $MASTER_PSQL and $STANDBY_PSQL
+# to run psql against the master and standby servers, to cause the servers
+# to diverge.
+
+# Initialize master, data checksums are mandatory
+rm -rf $TEST_MASTER
+initdb -N -A trust -D $TEST_MASTER >>$log_path
+
+# Custom parameters for master's postgresql.conf
+cat >> $TEST_MASTER/postgresql.conf <<EOF
+wal_level = hot_standby
+max_wal_senders = 2
+wal_keep_segments = 20
+checkpoint_segments = 50
+shared_buffers = 1MB
+wal_log_hints = on
+log_line_prefix = 'M  %m %p '
+hot_standby = on
+autovacuum = off
+max_connections = 50
+listen_addresses = '$LISTEN_ADDRESSES'
+port = $PORT_MASTER
+EOF
+
+# Accept replication connections on master
+cat >> $TEST_MASTER/pg_hba.conf <<EOF
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+EOF
+
+pg_ctl -w -D $TEST_MASTER start >>$log_path 2>&1
+
+#### Now run the test-specific parts to initialize the master before setting
+# up standby
+echo "Master initialized and running."
+before_standby
+
+# Set up standby with necessary parameter
+rm -rf $TEST_STANDBY
+
+# Base backup is taken with xlog files included
+pg_basebackup -D $TEST_STANDBY -p $PORT_MASTER -x >>$log_path 2>&1
+echo "port = $PORT_STANDBY" >> $TEST_STANDBY/postgresql.conf
+
+cat > $TEST_STANDBY/recovery.conf <<EOF
+primary_conninfo='port=$PORT_MASTER'
+standby_mode=on
+recovery_target_timeline='latest'
+EOF
+
+# Start standby
+pg_ctl -w -D $TEST_STANDBY start >>$log_path 2>&1
+
+#### Now run the test-specific parts to run after standby has been started
+# up standby
+echo "Standby initialized and running."
+standby_following_master
+
+# sleep a bit to make sure the standby has caught up.
+sleep 1
+
+# Now promote slave and insert some new data on master, this will put
+# the master out-of-sync with the standby.
+pg_ctl -w -D $TEST_STANDBY promote >>$log_path 2>&1
+sleep 1
+
+#### Now run the test-specific parts to run after promotion
+echo "Standby promoted."
+after_promotion
+
+# Stop the master and be ready to perform the rewind
+pg_ctl -w -D $TEST_MASTER stop -m fast >>$log_path 2>&1
+
+# At this point, the rewind processing is ready to run.
+# We now have a very simple scenario with a few diverged WAL record.
+# The real testing begins really now with a bifurcation of the possible
+# scenarios that pg_rewind supports.
+
+# Keep a temporary postgresql.conf for master node or it would be
+# overwritten during the rewind.
+cp $TEST_MASTER/postgresql.conf $TESTROOT/master-postgresql.conf.tmp
+
+# Now run pg_rewind
+echo "Running pg_rewind..."
+echo "Running pg_rewind..." >> $log_path
+if [ $TEST_SUITE == "local" ]; then
+	# Do rewind using a local pgdata as source
+	pg_rewind \
+		--source-pgdata=$TEST_STANDBY \
+		--target-pgdata=$TEST_MASTER >>$log_path 2>&1
+elif [ $TEST_SUITE == "remote" ]; then
+	# Do rewind using a remote connection as source
+	pg_rewind \
+		--source-server="port=$PORT_STANDBY dbname=postgres" \
+		--target-pgdata=$TEST_MASTER >>$log_path 2>&1
+else
+	# Cannot come here normally
+	echo "Incorrect test suite specified"
+	exit 1
+fi
+
+# Now move back postgresql.conf with old settings
+mv $TESTROOT/master-postgresql.conf.tmp $TEST_MASTER/postgresql.conf
+
+# Plug-in rewound node to the now-promoted standby node
+cat > $TEST_MASTER/recovery.conf <<EOF
+primary_conninfo='port=$PORT_STANDBY'
+standby_mode=on
+recovery_target_timeline='latest'
+EOF
+
+# Restart the master to check that rewind went correctly
+pg_ctl -w -D $TEST_MASTER start >>$log_path 2>&1
+
+#### Now run the test-specific parts to check the result
+echo "Old master restarted after rewind."
+after_rewind
+
+# Stop remaining servers
+pg_ctl stop -D $TEST_MASTER -m fast -w >>$log_path 2>&1
+pg_ctl stop -D $TEST_STANDBY -m fast -w >>$log_path 2>&1
diff --git a/contrib/pg_rewind/timeline.c b/contrib/pg_rewind/timeline.c
new file mode 100644
index 0000000..e96d010
--- /dev/null
+++ b/contrib/pg_rewind/timeline.c
@@ -0,0 +1,132 @@
+/*-------------------------------------------------------------------------
+ *
+ * timeline.c
+ *	  timeline-related functions.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "pg_rewind.h"
+
+#include "access/timeline.h"
+#include "access/xlog_internal.h"
+
+/*
+ * This is copy-pasted from the backend readTimeLineHistory, modified to
+ * return a malloc'd array and to work without backend functions.
+ */
+/*
+ * Try to read a timeline's history file.
+ *
+ * If successful, return the list of component TLIs (the given TLI followed by
+ * its ancestor TLIs).	If we can't find the history file, assume that the
+ * timeline has no parents, and return a list of just the specified timeline
+ * ID.
+ */
+TimeLineHistoryEntry *
+rewind_parseTimeLineHistory(char *buffer, TimeLineID targetTLI, int *nentries)
+{
+	char	   *fline;
+	TimeLineHistoryEntry *entry;
+	TimeLineHistoryEntry *entries = NULL;
+	int			nlines = 0;
+	TimeLineID	lasttli = 0;
+	XLogRecPtr	prevend;
+	char	   *bufptr;
+	bool		lastline = false;
+
+	/*
+	 * Parse the file...
+	 */
+	prevend = InvalidXLogRecPtr;
+	bufptr = buffer;
+	while (!lastline)
+	{
+		char	   *ptr;
+		TimeLineID	tli;
+		uint32		switchpoint_hi;
+		uint32		switchpoint_lo;
+		int			nfields;
+
+		fline = bufptr;
+		while (*bufptr && *bufptr != '\n')
+			bufptr++;
+		if (!(*bufptr))
+			lastline = true;
+		else
+			*bufptr++ = '\0';
+
+		/* skip leading whitespace and check for # comment */
+		for (ptr = fline; *ptr; ptr++)
+		{
+			if (!isspace((unsigned char) *ptr))
+				break;
+		}
+		if (*ptr == '\0' || *ptr == '#')
+			continue;
+
+		nfields = sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, &switchpoint_lo);
+
+		if (nfields < 1)
+		{
+			/* expect a numeric timeline ID as first field of line */
+			fprintf(stderr, "syntax error in history file: %s\n", fline);
+			fprintf(stderr, "Expected a numeric timeline ID.\n");
+		}
+		if (nfields != 3)
+		{
+			fprintf(stderr, "syntax error in history file: %s\n", fline);
+			fprintf(stderr, "Expected an XLOG switchpoint location.\n");
+		}
+		if (entries && tli <= lasttli)
+		{
+			fprintf(stderr, "invalid data in history file: %s\n", fline);
+			fprintf(stderr, "Timeline IDs must be in increasing sequence.\n");
+		}
+
+		lasttli = tli;
+
+		nlines++;
+		if (entries)
+			entries = pg_realloc(entries, nlines * sizeof(TimeLineHistoryEntry));
+		else
+			entries = pg_malloc(1 * sizeof(TimeLineHistoryEntry));
+
+		entry = &entries[nlines - 1];
+		entry->tli = tli;
+		entry->begin = prevend;
+		entry->end = ((uint64) (switchpoint_hi)) << 32 | (uint64) switchpoint_lo;
+		prevend = entry->end;
+
+		/* we ignore the remainder of each line */
+	}
+
+	if (entries && targetTLI <= lasttli)
+	{
+		fprintf(stderr, "invalid data in history file\n");
+		fprintf(stderr, "Timeline IDs must be less than child timeline's ID.\n");
+		exit(1);
+	}
+
+	/*
+	 * Create one more entry for the "tip" of the timeline, which has no
+	 * entry in the history file.
+	 */
+	nlines++;
+	if (entries)
+		entries = pg_realloc(entries, nlines * sizeof(TimeLineHistoryEntry));
+	else
+		entries = pg_malloc(1 * sizeof(TimeLineHistoryEntry));
+
+	entry = &entries[nlines - 1];
+	entry->tli = targetTLI;
+	entry->begin = prevend;
+	entry->end = InvalidXLogRecPtr;
+
+	*nentries = nlines;
+	return entries;
+}
diff --git a/contrib/pg_rewind/util.c b/contrib/pg_rewind/util.c
new file mode 100644
index 0000000..0261fee
--- /dev/null
+++ b/contrib/pg_rewind/util.c
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * util.c
+ *		Misc utility functions
+ *
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/relpath.h"
+#include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+
+#include "pg_rewind.h"
+
+char *
+datasegpath(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+{
+	char *path = relpathperm(rnode, forknum);
+
+	if (segno > 0)
+	{
+		char *segpath = pg_malloc(strlen(path) + 13);
+		sprintf(segpath, "%s.%u", path, segno);
+		pg_free(path);
+		return segpath;
+	}
+	else
+		return path;
+}
diff --git a/contrib/pg_rewind/util.h b/contrib/pg_rewind/util.h
new file mode 100644
index 0000000..205c55c
--- /dev/null
+++ b/contrib/pg_rewind/util.h
@@ -0,0 +1,15 @@
+/*-------------------------------------------------------------------------
+ *
+ * util.h
+ *		Prototypes for functions in util.c
+ *
+ * Portions Copyright (c) 2013 VMware, Inc. All Rights Reserved.
+ *-------------------------------------------------------------------------
+ */
+#ifndef UTIL_H
+#define UTIL_H
+
+extern char *datasegpath(RelFileNode rnode, ForkNumber forknum,
+	    BlockNumber segno);
+
+#endif   /* UTIL_H */
#2Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind in contrib

Hi,

On 2014-12-12 16:13:13 +0200, Heikki Linnakangas wrote:

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the existing
versions, and because I didn't feel it was quite ready for production use
yet. Now, with the WAL format changes in master, it is a lot more
maintainable than before. Many bugs have been fixed since the first
prototypes, and I think it's fairly robust now.

Obviously there's a need for a fair amount of review, but generally I
think it should be included.

Not sure if the copyright notices in the current form are actually ok?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#2)
Re: pg_rewind in contrib

On Fri, Dec 12, 2014 at 03:20:47PM +0100, Andres Freund wrote:

Hi,

On 2014-12-12 16:13:13 +0200, Heikki Linnakangas wrote:

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the existing
versions, and because I didn't feel it was quite ready for production use
yet. Now, with the WAL format changes in master, it is a lot more
maintainable than before. Many bugs have been fixed since the first
prototypes, and I think it's fairly robust now.

Obviously there's a need for a fair amount of review, but generally I
think it should be included.

I certainly think it is useful enough to be in /contrib.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind in contrib

On Fri, Dec 12, 2014 at 11:13 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the existing
versions, and because I didn't feel it was quite ready for production use
yet. Now, with the WAL format changes in master, it is a lot more
maintainable than before. Many bugs have been fixed since the first
prototypes, and I think it's fairly robust now.

I propose that we include pg_rewind in contrib/ now. Attached is a patch for
that. It just includes the latest sources from the current pg_rewind
repository at https://github.com/vmware/pg_rewind. It is released under the
PostgreSQL license.

For those who are not familiar with pg_rewind, it's a tool that allows
repurposing an old master server as a new standby server, after promotion,
even if the old master was not shut down cleanly. That's a very often
requested feature.

Indeed the code got quite cleaner with the new WAL API. Btw, gitignore
has many unnecessary entries.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#2)
Re: pg_rewind in contrib

On 12/12/2014 04:20 PM, Andres Freund wrote:

Not sure if the copyright notices in the current form are actually ok?

Hmm. We do have such copyright notices in the source tree, but I know
that we're trying to avoid it in new code. They had to be there when the
code lived as a separate project, but now that I'm contributing this to
PostgreSQL proper, I can remove them if necessary.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind in contrib

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

I'd like to include pg_rewind in contrib.

I don't object to adding the tool as such, but let's wait to see what
happens with Peter's proposal to move contrib command-line tools into
src/bin/. If it should be there it'd be less code churn if it went
into there in the first place.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7David Fetter
david@fetter.org
In reply to: Tom Lane (#6)
Re: pg_rewind in contrib

On Fri, Dec 12, 2014 at 10:06:32AM -0500, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

I'd like to include pg_rewind in contrib.

I don't object to adding the tool as such, but let's wait to see
what happens with Peter's proposal to move contrib command-line
tools into src/bin/. If it should be there it'd be less code churn
if it went into there in the first place.

+1 for putting it directly in src/bin.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Samrat Revagade
revagade.samrat@gmail.com
In reply to: David Fetter (#7)
Re: pg_rewind in contrib

On Sat, Dec 13, 2014 at 10:49 PM, David Fetter <david@fetter.org> wrote:

On Fri, Dec 12, 2014 at 10:06:32AM -0500, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

I'd like to include pg_rewind in contrib.

I don't object to adding the tool as such, but let's wait to see
what happens with Peter's proposal to move contrib command-line
tools into src/bin/. If it should be there it'd be less code churn
if it went into there in the first place.

+1 for putting it directly in src/bin.

Yeah, +1 for putting it under src/bin.

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#5)
Re: pg_rewind in contrib

On Sat, Dec 13, 2014 at 12:01 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 12/12/2014 04:20 PM, Andres Freund wrote:

Not sure if the copyright notices in the current form are actually ok?

Hmm. We do have such copyright notices in the source tree, but I know that
we're trying to avoid it in new code. They had to be there when the code
lived as a separate project, but now that I'm contributing this to
PostgreSQL proper, I can remove them if necessary.

Yep, it is fine to remove those copyright notices and to keep only the
references to PGDG when code is integrated in the tree.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#9)
Re: pg_rewind in contrib

On Tue, Dec 16, 2014 at 9:32 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sat, Dec 13, 2014 at 12:01 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 12/12/2014 04:20 PM, Andres Freund wrote:

Not sure if the copyright notices in the current form are actually ok?

Hmm. We do have such copyright notices in the source tree, but I know that
we're trying to avoid it in new code. They had to be there when the code
lived as a separate project, but now that I'm contributing this to
PostgreSQL proper, I can remove them if necessary.

Yep, it is fine to remove those copyright notices and to keep only the
references to PGDG when code is integrated in the tree.

In any case, I have a couple of comments about this patch as-is:
- If the move to src/bin is done, let's update the MSVC scripts accordingly
- contrib/pg_rewind/.gitignore should be cleaned up from its unnecessary entries
- README is incorrect, it is still mentioned for example that this
code should be copied inside PostgreSQL code tree as contrib/pg_rewind
- Code is going to need a brush to clean up things of this type:

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#10)
Re: pg_rewind in contrib

On Tue, Dec 16, 2014 at 10:26 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

In any case, I have a couple of comments about this patch as-is:
- If the move to src/bin is done, let's update the MSVC scripts accordingly
- contrib/pg_rewind/.gitignore should be cleaned up from its unnecessary entries
- README is incorrect, it is still mentioned for example that this
code should be copied inside PostgreSQL code tree as contrib/pg_rewind

(Sorry email got sent...)
- Code is going to need a brush to clean up things of this type:
+ PG_9.4_201403261
+       printf("Report bugs to https://github.com/vmware/pg_rewind.\n");
- Versioning should be made the Postgres-way, aka not that:
+#define PG_REWIND_VERSION "0.1"
But a way similar to other utilities:
pg_rewind (PostgreSQL) 9.5devel
- Shouldn't we use $(SHELL) here at least?
+++ b/contrib/pg_rewind/launcher
@@ -0,0 +1,6 @@
+#!/bin/bash
+#
+# Normally, psql feeds the files in sql/ directory to psql, but we want to
+# run them as shell scripts instead.
+
+bash
I doubt that this would work for example with MinGW.
- There are a couple of TODO items which may be good to fix:
+        *
+        * TODO: This assumes that there are no timeline switches on the target
+        * cluster after the fork.
+        */
and:
+       /*
+        * TODO: move old file out of the way, if any. And perhaps create the
+        * file with temporary name first and rename in place after it's done.
+        */
- Regression tests, which have a good coverage btw are made using
shell scripts. There is some initialization process that could be
refactored IMO as this code is duplicated with pg_upgrade. For
example, listen_addresses initialization has checks fir MINGW,
environment variables PG* are unset, etc.
Regards,
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Satoshi Nagayasu
snaga@uptime.jp
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind in contrib

Hi,

On 2014/12/12 23:13, Heikki Linnakangas wrote:

Hi,

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the
existing versions, and because I didn't feel it was quite ready for
production use yet. Now, with the WAL format changes in master, it is a
lot more maintainable than before. Many bugs have been fixed since the
first prototypes, and I think it's fairly robust now.

I propose that we include pg_rewind in contrib/ now. Attached is a patch
for that. It just includes the latest sources from the current pg_rewind
repository at https://github.com/vmware/pg_rewind. It is released under
the PostgreSQL license.

For those who are not familiar with pg_rewind, it's a tool that allows
repurposing an old master server as a new standby server, after
promotion, even if the old master was not shut down cleanly. That's a
very often requested feature.

I'm looking into pg_rewind with a very first scenario.
My scenario is here.

https://github.com/snaga/pg_rewind_test/blob/master/pg_rewind_test.sh

At least, I think a file descriptor "srcf" should be closed before
exiting copy_file_range(). I got "can't open file" error with
"too many open file" while running pg_rewind.

------------------------------------------------
diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
index bea1b09..5a8cc8e 100644
--- a/contrib/pg_rewind/copy_fetch.c
+++ b/contrib/pg_rewind/copy_fetch.c
@@ -280,6 +280,8 @@ copy_file_range(const char *path, off_t begin, off_t 
end, bool trunc)
                 write_file_range(buf, begin, readlen);
                 begin += readlen;
         }
+
+       close(srcfd);
  }

/*
------------------------------------------------

And I have one question here.

pg_rewind assumes that the source PostgreSQL has, at least, one
checkpoint after getting promoted. I think the target timeline id
in the pg_control file to be read is only available after the first
checkpoint. Right?

Regards,

- Heikki

--
NAGAYASU Satoshi <snaga@uptime.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Satoshi Nagayasu (#12)
Re: pg_rewind in contrib

On 12/16/2014 11:23 AM, Satoshi Nagayasu wrote:

Hi,

On 2014/12/12 23:13, Heikki Linnakangas wrote:

Hi,

I'd like to include pg_rewind in contrib. I originally wrote it as an
external project so that I could quickly get it working with the
existing versions, and because I didn't feel it was quite ready for
production use yet. Now, with the WAL format changes in master, it is a
lot more maintainable than before. Many bugs have been fixed since the
first prototypes, and I think it's fairly robust now.

I propose that we include pg_rewind in contrib/ now. Attached is a patch
for that. It just includes the latest sources from the current pg_rewind
repository at https://github.com/vmware/pg_rewind. It is released under
the PostgreSQL license.

For those who are not familiar with pg_rewind, it's a tool that allows
repurposing an old master server as a new standby server, after
promotion, even if the old master was not shut down cleanly. That's a
very often requested feature.

I'm looking into pg_rewind with a very first scenario.
My scenario is here.

https://github.com/snaga/pg_rewind_test/blob/master/pg_rewind_test.sh

At least, I think a file descriptor "srcf" should be closed before
exiting copy_file_range(). I got "can't open file" error with
"too many open file" while running pg_rewind.

------------------------------------------------
diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
index bea1b09..5a8cc8e 100644
--- a/contrib/pg_rewind/copy_fetch.c
+++ b/contrib/pg_rewind/copy_fetch.c
@@ -280,6 +280,8 @@ copy_file_range(const char *path, off_t begin, off_t
end, bool trunc)
write_file_range(buf, begin, readlen);
begin += readlen;
}
+
+       close(srcfd);
}

/*
------------------------------------------------

Yep, good catch. I pushed a fix to the pg_rewind repository at github.

And I have one question here.

pg_rewind assumes that the source PostgreSQL has, at least, one
checkpoint after getting promoted. I think the target timeline id
in the pg_control file to be read is only available after the first
checkpoint. Right?

Yes, it does assume that the source server (= old standby, new master)
has had at least one checkpoint after promotion. It probably should be
more explicit about it: If there hasn't been a checkpoint, you will
currently get an error "source and target cluster are both on the same
timeline", which isn't very informative.

I assume that by "target timeline ID" you meant the timeline ID of the
source server, i.e. the timeline that the target server should be
rewound to.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Satoshi Nagayasu
snaga@uptime.jp
In reply to: Heikki Linnakangas (#13)
Re: pg_rewind in contrib

On 2014/12/16 18:37, Heikki Linnakangas wrote:

On 12/16/2014 11:23 AM, Satoshi Nagayasu wrote:

pg_rewind assumes that the source PostgreSQL has, at least, one
checkpoint after getting promoted. I think the target timeline id
in the pg_control file to be read is only available after the first
checkpoint. Right?

Yes, it does assume that the source server (= old standby, new master)
has had at least one checkpoint after promotion. It probably should be
more explicit about it: If there hasn't been a checkpoint, you will
currently get an error "source and target cluster are both on the same
timeline", which isn't very informative.

Yes, I got the message, so I could find the checkpoint thing.
It could be more informative, or some hint message could be added.

I assume that by "target timeline ID" you meant the timeline ID of the
source server, i.e. the timeline that the target server should be
rewound to.

Yes.
Target timeline I mean here is the timeline id coming from the promoted
master (== source server == old standby).

I got it. Thanks.

I'm going to look into more details.

Regards,

- Heikki

--
NAGAYASU Satoshi <snaga@uptime.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Satoshi Nagayasu (#14)
1 attachment(s)
Re: pg_rewind in contrib

Here's an updated version of pg_rewind. The code itself is the same as
before, and corresponds to the sources in the current pg_rewind github
repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based
mostly on Michael's comments, I have:

* replaced VMware copyright notices with PGDG ones.
* removed unnecessary cruft from .gitignore
* changed the --version line and "report bugs" notice in --help to match
other binaries in the PostgreSQL distribution
* moved documentation from README to the user manual.
* minor fixes to how the regression tests are launched so that they work
again

Some more work remains to be done on the regression tests. The way
they're launched now is quite weird. It was written that way to make it
work outside the PostgreSQL source tree, and also on Windows. Now that
it lives in contrib, it should be redesigned.

The documentation could also use some work; for now I just converted the
existing text from README to sgml format.

Anything else?

- Heikki

Attachments:

pg_rewind-contrib-2.patchtext/x-diff; name=pg_rewind-contrib-2.patchDownload
diff --git a/contrib/Makefile b/contrib/Makefile
index 195d447..2fe861f 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_rewind	\
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
diff --git a/contrib/pg_rewind/.gitignore b/contrib/pg_rewind/.gitignore
new file mode 100644
index 0000000..816a91d
--- /dev/null
+++ b/contrib/pg_rewind/.gitignore
@@ -0,0 +1,9 @@
+# Files generated during build
+/xlogreader.c
+
+# Generated by test suite
+/tmp_check/
+/regression.diffs
+/regression.out
+/results/
+/regress_log/
diff --git a/contrib/pg_rewind/Makefile b/contrib/pg_rewind/Makefile
new file mode 100644
index 0000000..241c775
--- /dev/null
+++ b/contrib/pg_rewind/Makefile
@@ -0,0 +1,51 @@
+# Makefile for pg_rewind
+#
+# Copyright (c) 2013-2014, PostgreSQL Global Development Group
+#
+
+PGFILEDESC = "pg_rewind - repurpose an old master server as standby"
+PGAPPICON = win32
+
+PROGRAM = pg_rewind
+OBJS	= pg_rewind.o parsexlog.o xlogreader.o util.o datapagemap.o timeline.o \
+	fetch.o copy_fetch.o libpq_fetch.o filemap.o
+
+REGRESS = basictest extrafiles databases
+REGRESS_OPTS=--use-existing --launcher=./launcher
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS = $(libpq_pgport)
+
+override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+
+EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
+
+all: pg_rewind
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_rewind
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
+	rm -f $@ && $(LN_S) $< .
+
+# The regression tests can be run separately against, using the libpq or local
+# method to copy the files. For local mode, the makefile target is
+# "check-local", and for libpq mode, "check-remote". The target check-both
+# runs the tests in both modes.
+check-local:
+	echo "Running tests against local data directory, in copy-mode"
+	bindir=$(bindir) TEST_SUITE="local" $(MAKE) installcheck
+
+check-remote:
+	echo "Running tests against a running standby, via libpq"
+	bindir=$(bindir) TEST_SUITE="remote" $(MAKE) installcheck
+
+check-both: check-local check-remote
diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
new file mode 100644
index 0000000..5c40ec7
--- /dev/null
+++ b/contrib/pg_rewind/copy_fetch.c
@@ -0,0 +1,586 @@
+/*-------------------------------------------------------------------------
+ *
+ * copy_fetch.c
+ *	  Functions for copying a PostgreSQL data directory
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+#include "util.h"
+
+static void recurse_dir(const char *datadir, const char *path,
+			process_file_callback_t callback);
+
+static void execute_pagemap(datapagemap_t *pagemap, const char *path);
+
+static void remove_target_file(const char *path);
+static void create_target_dir(const char *path);
+static void remove_target_dir(const char *path);
+static void create_target_symlink(const char *path, const char *link);
+static void remove_target_symlink(const char *path);
+
+/*
+ * Traverse through all files in a data directory, calling 'callback'
+ * for each file.
+ */
+void
+traverse_datadir(const char *datadir, process_file_callback_t callback)
+{
+	/* should this copy config files or not? */
+	recurse_dir(datadir, NULL, callback);
+}
+
+/*
+ * recursive part of traverse_datadir
+ */
+static void
+recurse_dir(const char *datadir, const char *parentpath,
+			process_file_callback_t callback)
+{
+	DIR		   *xldir;
+	struct dirent *xlde;
+	char		fullparentpath[MAXPGPATH];
+
+	if (parentpath)
+		snprintf(fullparentpath, MAXPGPATH, "%s/%s", datadir, parentpath);
+	else
+		snprintf(fullparentpath, MAXPGPATH, "%s", datadir);
+
+	xldir = opendir(fullparentpath);
+	if (xldir == NULL)
+	{
+		fprintf(stderr, "could not open directory \"%s\": %s\n",
+				fullparentpath, strerror(errno));
+		exit(1);
+	}
+
+	while ((xlde = readdir(xldir)) != NULL)
+	{
+		struct stat fst;
+		char		fullpath[MAXPGPATH];
+		char		path[MAXPGPATH];
+
+		if (strcmp(xlde->d_name, ".") == 0 ||
+			strcmp(xlde->d_name, "..") == 0)
+			continue;
+
+		snprintf(fullpath, MAXPGPATH, "%s/%s", fullparentpath, xlde->d_name);
+
+		if (lstat(fullpath, &fst) < 0)
+		{
+			fprintf(stderr, "warning: could not stat file \"%s\": %s",
+					fullpath, strerror(errno));
+			/*
+			 * This is ok, if the new master is running and the file was
+			 * just removed. If it was a data file, there should be a WAL
+			 * record of the removal. If it was something else, it couldn't
+			 * have been critical anyway.
+			 *
+			 * TODO: But complain if we're processing the target dir!
+			 */
+		}
+
+		if (parentpath)
+			snprintf(path, MAXPGPATH, "%s/%s", parentpath, xlde->d_name);
+		else
+			snprintf(path, MAXPGPATH, "%s", xlde->d_name);
+
+		if (S_ISREG(fst.st_mode))
+			callback(path, FILE_TYPE_REGULAR, fst.st_size, NULL);
+		else if (S_ISDIR(fst.st_mode))
+		{
+			callback(path, FILE_TYPE_DIRECTORY, 0, NULL);
+			/* recurse to handle subdirectories */
+			recurse_dir(datadir, path, callback);
+		}
+		else if (S_ISLNK(fst.st_mode))
+		{
+			char		link_target[MAXPGPATH];
+			ssize_t		len;
+
+			len = readlink(fullpath, link_target, sizeof(link_target) - 1);
+			if (len == -1)
+			{
+				fprintf(stderr, "readlink() failed on \"%s\": %s\n",
+						fullpath, strerror(errno));
+				exit(1);
+			}
+			if (len == sizeof(link_target) - 1)
+			{
+				/* path was truncated */
+				fprintf(stderr, "symbolic link \"%s\" target path too long\n",
+						fullpath);
+				exit(1);
+			}
+
+			callback(path, FILE_TYPE_SYMLINK, 0, link_target);
+
+			/*
+			 * If it's a symlink within pg_tblspc, we need to recurse into it,
+			 * to process all the tablespaces.
+			 */
+			if (strcmp(parentpath, "pg_tblspc") == 0)
+				recurse_dir(datadir, path, callback);
+		}
+	}
+	closedir(xldir);
+}
+
+static int dstfd = -1;
+static char dstpath[MAXPGPATH] = "";
+
+void
+open_target_file(const char *path, bool trunc)
+{
+	int			mode;
+
+	if (dry_run)
+		return;
+
+	if (dstfd != -1 && !trunc &&
+		strcmp(path, &dstpath[strlen(datadir_target) + 1]) == 0)
+		return; /* already open */
+
+	if (dstfd != -1)
+		close_target_file();
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+
+	mode = O_WRONLY | O_CREAT | PG_BINARY;
+	if (trunc)
+		mode |= O_TRUNC;
+	dstfd = open(dstpath, mode, 0600);
+	if (dstfd < 0)
+	{
+		fprintf(stderr, "could not open destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+void
+close_target_file(void)
+{
+	if (close(dstfd) != 0)
+	{
+		fprintf(stderr, "error closing destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	dstfd = -1;
+	/* fsync? */
+}
+
+void
+write_file_range(char *buf, off_t begin, size_t size)
+{
+	int			writeleft;
+	char	   *p;
+
+	if (dry_run)
+		return;
+
+	if (lseek(dstfd, begin, SEEK_SET) == -1)
+	{
+		fprintf(stderr, "could not seek in destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	writeleft = size;
+	p = buf;
+	while (writeleft > 0)
+	{
+		int		writelen;
+
+		writelen = write(dstfd, p, writeleft);
+		if (writelen < 0)
+		{
+			fprintf(stderr, "could not write file \"%s\": %s\n",
+					dstpath, strerror(errno));
+			exit(1);
+		}
+
+		p += writelen;
+		writeleft -= writelen;
+	}
+
+	/* keep the file open, in case we need to copy more blocks in it */
+}
+
+
+/*
+ * Copy a file from source to target, between 'begin' and 'end' offsets.
+ */
+static void
+copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
+{
+	char		buf[BLCKSZ];
+	char		srcpath[MAXPGPATH];
+	int			srcfd;
+
+	snprintf(srcpath, sizeof(srcpath), "%s/%s", datadir_source, path);
+
+	srcfd = open(srcpath, O_RDONLY | PG_BINARY, 0);
+	if (srcfd < 0)
+	{
+		fprintf(stderr, "could not open source file \"%s\": %s\n", srcpath, strerror(errno));
+		exit(1);
+	}
+
+	if (lseek(srcfd, begin, SEEK_SET) == -1)
+	{
+		fprintf(stderr, "could not seek in source file: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	open_target_file(path, trunc);
+
+	while (end - begin > 0)
+	{
+		int		readlen;
+		int		len;
+
+		if (end - begin > sizeof(buf))
+			len = sizeof(buf);
+		else
+			len = end - begin;
+
+		readlen = read(srcfd, buf, len);
+
+		if (readlen < 0)
+		{
+			fprintf(stderr, "could not read file \"%s\": %s\n", srcpath, strerror(errno));
+			exit(1);
+		}
+		else if (readlen == 0)
+		{
+			fprintf(stderr, "unexpected EOF while reading file \"%s\"\n", srcpath);
+			exit(1);
+		}
+
+		write_file_range(buf, begin, readlen);
+		begin += readlen;
+	}
+
+	if (close(srcfd) != 0)
+		fprintf(stderr, "error closing file \"%s\": %s\n", srcpath, strerror(errno));
+}
+
+/*
+ * Checks if two file descriptors point to the same file. This is used as
+ * a sanity check, to make sure the user doesn't try to copy a data directory
+ * over itself.
+ */
+void
+check_samefile(int fd1, int fd2)
+{
+	struct stat statbuf1,
+				statbuf2;
+
+	if (fstat(fd1, &statbuf1) < 0)
+	{
+		fprintf(stderr, "fstat failed: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	if (fstat(fd2, &statbuf2) < 0)
+	{
+		fprintf(stderr, "fstat failed: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	if (statbuf1.st_dev == statbuf2.st_dev &&
+		statbuf1.st_ino == statbuf2.st_ino)
+	{
+		fprintf(stderr, "old and new data directory are the same\n");
+		exit(1);
+	}
+}
+
+/*
+ * Copy all relation data files from datadir_source to datadir_target, which
+ * are marked in the given data page map.
+ */
+void
+copy_executeFileMap(filemap_t *map)
+{
+	file_entry_t *entry;
+	int			i;
+
+	for (i = 0; i < map->narray; i++)
+	{
+		entry = map->array[i];
+		execute_pagemap(&entry->pagemap, entry->path);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+				/* ok, do nothing.. */
+				break;
+
+			case FILE_ACTION_COPY:
+				copy_file_range(entry->path, 0, entry->newsize, true);
+				break;
+
+			case FILE_ACTION_TRUNCATE:
+				truncate_target_file(entry->path, entry->newsize);
+				break;
+
+			case FILE_ACTION_COPY_TAIL:
+				copy_file_range(entry->path, entry->oldsize, entry->newsize, false);
+				break;
+
+			case FILE_ACTION_CREATE:
+				create_target(entry);
+				break;
+
+			case FILE_ACTION_REMOVE:
+				remove_target(entry);
+				break;
+		}
+	}
+
+	if (dstfd != -1)
+		close_target_file();
+}
+
+
+void
+remove_target(file_entry_t *entry)
+{
+	Assert(entry->action == FILE_ACTION_REMOVE);
+
+	switch (entry->type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			remove_target_dir(entry->path);
+			break;
+
+		case FILE_TYPE_REGULAR:
+			remove_target_symlink(entry->path);
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			remove_target_file(entry->path);
+			break;
+	}
+}
+
+void
+create_target(file_entry_t *entry)
+{
+	Assert(entry->action == FILE_ACTION_CREATE);
+
+	switch (entry->type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			create_target_dir(entry->path);
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			create_target_symlink(entry->path, entry->link_target);
+			break;
+
+		case FILE_TYPE_REGULAR:
+			/* can't happen */
+			fprintf (stderr, "invalid action (CREATE) for regular file\n");
+			exit(1);
+			break;
+	}
+}
+
+static void
+remove_target_file(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (unlink(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+void
+truncate_target_file(const char *path, off_t newsize)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (truncate(dstpath, newsize) != 0)
+	{
+		fprintf(stderr, "could not truncate file \"%s\" to %u bytes: %s\n",
+				dstpath, (unsigned int) newsize, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+create_target_dir(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (mkdir(dstpath, S_IRWXU) != 0)
+	{
+		fprintf(stderr, "could not create directory \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+remove_target_dir(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (rmdir(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove directory \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+create_target_symlink(const char *path, const char *link)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (symlink(link, dstpath) != 0)
+	{
+		fprintf(stderr, "could not create symbolic link at \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
+remove_target_symlink(const char *path)
+{
+	char		dstpath[MAXPGPATH];
+
+	if (dry_run)
+		return;
+
+	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
+	if (unlink(dstpath) != 0)
+	{
+		fprintf(stderr, "could not remove symbolic link \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+}
+
+
+static void
+execute_pagemap(datapagemap_t *pagemap, const char *path)
+{
+	datapagemap_iterator_t *iter;
+	BlockNumber blkno;
+
+	iter = datapagemap_iterate(pagemap);
+	while (datapagemap_next(iter, &blkno))
+	{
+		off_t offset = blkno * BLCKSZ;
+
+		copy_file_range(path, offset, offset + BLCKSZ, false);
+		/* Ok, this block has now been copied from new data dir to old */
+	}
+	free(iter);
+}
+
+/*
+ * Read a file into memory. The file to be read is <datadir>/<path>.
+ * The file contents are returned in a malloc'd buffer, and *filesize
+ * is set to the length of the file.
+ *
+ * The returned buffer is always zero-terminated; the size of the returned
+ * buffer is actually *filesize + 1. That's handy when reading a text file.
+ * This function can be used to read binary files as well, you can just
+ * ignore the zero-terminator in that case.
+ *
+ * This function is used to implement the fetchFile function in the "fetch"
+ * interface (see fetch.c), but is also called directly.
+ */
+char *
+slurpFile(const char *datadir, const char *path, size_t *filesize)
+{
+	int			fd;
+	char	   *buffer;
+	struct stat statbuf;
+	char		fullpath[MAXPGPATH];
+	int			len;
+
+	snprintf(fullpath, sizeof(fullpath), "%s/%s", datadir, path);
+
+	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) == -1)
+	{
+		fprintf(stderr, _("could not open file \"%s\" for reading: %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+
+	if (fstat(fd, &statbuf) < 0)
+	{
+		fprintf(stderr, _("could not open file \"%s\" for reading: %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+
+	len = statbuf.st_size;
+
+	buffer = pg_malloc(len + 1);
+
+	if (read(fd, buffer, len) != len)
+	{
+		fprintf(stderr, _("could not read file \"%s\": %s\n"),
+				fullpath, strerror(errno));
+		exit(2);
+	}
+	close(fd);
+
+	/* Zero-terminate the buffer. */
+	buffer[len] = '\0';
+
+	if (filesize)
+		*filesize = len;
+	return buffer;
+}
diff --git a/contrib/pg_rewind/datapagemap.c b/contrib/pg_rewind/datapagemap.c
new file mode 100644
index 0000000..c1a956b
--- /dev/null
+++ b/contrib/pg_rewind/datapagemap.c
@@ -0,0 +1,123 @@
+/*
+ * A data structure for keeping track of data pages that have changed.
+ *
+ * This is a fairly simple bitmap.
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ */
+
+#include "postgres_fe.h"
+
+#include "datapagemap.h"
+#include "util.h"
+
+struct datapagemap_iterator
+{
+	datapagemap_t *map;
+	BlockNumber	nextblkno;
+};
+
+/*****
+ * Public functions
+ */
+
+/*
+ * Add a block to the bitmap.
+ */
+void
+datapagemap_add(datapagemap_t *map, BlockNumber blkno)
+{
+	int			offset;
+	int			bitno;
+
+	offset = blkno / 8;
+	bitno = blkno % 8;
+
+	/* enlarge or create bitmap if needed */
+	if (map->bitmapsize <= offset)
+	{
+		int oldsize = map->bitmapsize;
+		int newsize;
+
+		/*
+		 * The minimum to hold the new bit is offset + 1. But add some
+		 * headroom, so that we don't need to repeatedly enlarge the bitmap
+		 * in the common case that blocks are modified in order, from beginning
+		 * of a relation to the end.
+		 */
+		newsize = offset + 1;
+		newsize += 10;
+
+		if (map->bitmap == NULL)
+			map->bitmap = pg_malloc(newsize);
+		else
+			map->bitmap = pg_realloc(map->bitmap, newsize);
+
+		/* zero out the newly allocated region */
+		memset(&map->bitmap[oldsize], 0, newsize - oldsize);
+
+		map->bitmapsize = newsize;
+	}
+
+	/* Set the bit */
+	map->bitmap[offset] |= (1 << bitno);
+}
+
+/*
+ * Start iterating through all entries in the page map.
+ *
+ * After datapagemap_iterate, call datapagemap_next to return the entries,
+ * until it returns NULL. After you're done, use free() to destroy the
+ * iterator.
+ */
+datapagemap_iterator_t *
+datapagemap_iterate(datapagemap_t *map)
+{
+	datapagemap_iterator_t *iter = pg_malloc(sizeof(datapagemap_iterator_t));
+	iter->map = map;
+	iter->nextblkno = 0;
+	return iter;
+}
+
+bool
+datapagemap_next(datapagemap_iterator_t *iter, BlockNumber *blkno)
+{
+	datapagemap_t *map = iter->map;
+
+	for (;;)
+	{
+		BlockNumber blk = iter->nextblkno;
+		int		nextoff = blk / 8;
+		int		bitno = blk % 8;
+
+		if (nextoff >= map->bitmapsize)
+			break;
+
+		iter->nextblkno++;
+
+		if (map->bitmap[nextoff] & (1 << bitno))
+		{
+			*blkno = blk;
+			return true;
+		}
+	}
+
+	/* no more set bits in this bitmap. */
+	return false;
+}
+
+/*
+ * A debugging aid. Prints out the contents of the page map.
+ */
+void
+datapagemap_print(datapagemap_t *map)
+{
+	datapagemap_iterator_t *iter = datapagemap_iterate(map);
+	BlockNumber blocknum;
+
+	while (datapagemap_next(iter, &blocknum))
+	{
+		printf("  blk %u\n", blocknum);
+	}
+	free(iter);
+}
diff --git a/contrib/pg_rewind/datapagemap.h b/contrib/pg_rewind/datapagemap.h
new file mode 100644
index 0000000..3f04d81
--- /dev/null
+++ b/contrib/pg_rewind/datapagemap.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * datapagemap.h
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATAPAGEMAP_H
+#define DATAPAGEMAP_H
+
+#include "storage/relfilenode.h"
+#include "storage/block.h"
+
+
+struct datapagemap
+{
+	char	   *bitmap;
+	int			bitmapsize;
+};
+
+typedef struct datapagemap datapagemap_t;
+typedef struct datapagemap_iterator datapagemap_iterator_t;
+
+extern datapagemap_t *datapagemap_create(void);
+extern void datapagemap_destroy(datapagemap_t *map);
+extern void datapagemap_add(datapagemap_t *map, BlockNumber blkno);
+extern datapagemap_iterator_t *datapagemap_iterate(datapagemap_t *map);
+extern bool datapagemap_next(datapagemap_iterator_t *iter, BlockNumber *blkno);
+extern void datapagemap_print(datapagemap_t *map);
+
+#endif   /* DATAPAGEMAP_H */
diff --git a/contrib/pg_rewind/expected/basictest.out b/contrib/pg_rewind/expected/basictest.out
new file mode 100644
index 0000000..b67ead5
--- /dev/null
+++ b/contrib/pg_rewind/expected/basictest.out
@@ -0,0 +1,27 @@
+Master initialized and running.
+CREATE TABLE tbl1 (d text);
+CREATE TABLE
+INSERT INTO tbl1 VALUES ('in master');
+INSERT 0 1
+CHECKPOINT;
+CHECKPOINT
+Standby initialized and running.
+INSERT INTO tbl1 values ('in master, before promotion');
+INSERT 0 1
+CHECKPOINT;
+CHECKPOINT
+Standby promoted.
+INSERT INTO tbl1 VALUES ('in master, after promotion');
+INSERT 0 1
+INSERT INTO tbl1 VALUES ('in standby, after promotion');
+INSERT 0 1
+Running pg_rewind...
+Old master restarted after rewind.
+SELECT * from tbl1
+              d              
+-----------------------------
+ in master
+ in master, before promotion
+ in standby, after promotion
+(3 rows)
+
diff --git a/contrib/pg_rewind/expected/databases.out b/contrib/pg_rewind/expected/databases.out
new file mode 100644
index 0000000..e486107
--- /dev/null
+++ b/contrib/pg_rewind/expected/databases.out
@@ -0,0 +1,24 @@
+Master initialized and running.
+CREATE DATABASE inmaster;
+CREATE DATABASE
+Standby initialized and running.
+CREATE DATABASE beforepromotion
+CREATE DATABASE
+Standby promoted.
+CREATE DATABASE master_afterpromotion
+CREATE DATABASE
+CREATE DATABASE standby_afterpromotion
+CREATE DATABASE
+Running pg_rewind...
+Old master restarted after rewind.
+SELECT datname from pg_database
+        datname         
+------------------------
+ template1
+ template0
+ postgres
+ inmaster
+ beforepromotion
+ standby_afterpromotion
+(6 rows)
+
diff --git a/contrib/pg_rewind/expected/extrafiles.out b/contrib/pg_rewind/expected/extrafiles.out
new file mode 100644
index 0000000..8e3f3f1
--- /dev/null
+++ b/contrib/pg_rewind/expected/extrafiles.out
@@ -0,0 +1,15 @@
+Master initialized and running.
+Standby initialized and running.
+Standby promoted.
+Running pg_rewind...
+Old master restarted after rewind.
+tst_both_dir
+tst_both_dir/both_file1
+tst_both_dir/both_file2
+tst_both_dir/both_subdir
+tst_both_dir/both_subdir/both_file3
+tst_standby_dir
+tst_standby_dir/standby_file1
+tst_standby_dir/standby_file2
+tst_standby_dir/standby_subdir
+tst_standby_dir/standby_subdir/standby_file3
diff --git a/contrib/pg_rewind/fetch.c b/contrib/pg_rewind/fetch.c
new file mode 100644
index 0000000..ed05f95
--- /dev/null
+++ b/contrib/pg_rewind/fetch.c
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * fetch.c
+ *	  Functions for fetching files from a local or remote data dir
+ *
+ * This file forms an abstraction of getting files from the "source".
+ * There are two implementations of this interface: one for copying files
+ * from a data directory via normal filesystem operations (copy_fetch.c),
+ * and another for fetching files from a remote server via a libpq
+ * connection (libpq_fetch.c)
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+
+void
+fetchRemoteFileList(void)
+{
+	if (datadir_source)
+		traverse_datadir(datadir_source, &process_remote_file);
+	else
+		libpqProcessFileList();
+}
+
+/*
+ * Fetch all relation data files that are marked in the given data page map.
+ */
+void
+executeFileMap(void)
+{
+	if (datadir_source)
+		copy_executeFileMap(filemap);
+	else
+		libpq_executeFileMap(filemap);
+}
+
+/*
+ * Fetch a single file into a malloc'd buffer. The file size is returned
+ * in *filesize. The returned buffer is always zero-terminated.
+ */
+char *
+fetchFile(char *filename, size_t *filesize)
+{
+	if (datadir_source)
+		return slurpFile(datadir_source, filename, filesize);
+	else
+		return libpqGetFile(filename, filesize);
+}
diff --git a/contrib/pg_rewind/fetch.h b/contrib/pg_rewind/fetch.h
new file mode 100644
index 0000000..b368915
--- /dev/null
+++ b/contrib/pg_rewind/fetch.h
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * fetch.h
+ *	  Fetching data from a local or remote data directory.
+ *
+ * This file includes the prototypes for functions used to copy files from
+ * one data directory to another. The source to copy from can be a local
+ * directory (copy method), or a remote PostgreSQL server (libpq fetch
+ * method).
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FETCH_H
+#define FETCH_H
+
+#include "c.h"
+
+#include "filemap.h"
+
+/*
+ * Common interface. Calls the copy or libpq method depending on global
+ * config options.
+ */
+extern void fetchRemoteFileList(void);
+extern char *fetchFile(char *filename, size_t *filesize);
+extern void executeFileMap(void);
+
+/* in libpq_fetch.c */
+extern void libpqConnect(const char *connstr);
+extern void libpqProcessFileList(void);
+extern void libpq_executeFileMap(filemap_t *map);
+extern void libpqGetChangedDataPages(datapagemap_t *pagemap);
+extern void libpqGetOtherFiles(void);
+extern char *libpqGetFile(const char *filename, size_t *filesize);
+
+/* in copy_fetch.c */
+extern void copy_executeFileMap(filemap_t *map);
+
+extern void open_target_file(const char *path, bool trunc);
+extern void write_file_range(char *buf, off_t begin, size_t size);
+extern void close_target_file(void);
+
+extern char *slurpFile(const char *datadir, const char *path, size_t *filesize);
+
+typedef void (*process_file_callback_t) (const char *path, file_type_t type, size_t size, const char *link_target);
+extern void traverse_datadir(const char *datadir, process_file_callback_t callback);
+
+extern void truncate_target_file(const char *path, off_t newsize);
+extern void create_target(file_entry_t *t);
+extern void remove_target(file_entry_t *t);
+extern void check_samefile(int fd1, int fd2);
+
+
+#endif   /* FETCH_H */
diff --git a/contrib/pg_rewind/filemap.c b/contrib/pg_rewind/filemap.c
new file mode 100644
index 0000000..48baf60
--- /dev/null
+++ b/contrib/pg_rewind/filemap.c
@@ -0,0 +1,584 @@
+/*
+ * A data structure for keeping track of files that have changed.
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ */
+
+#include "postgres_fe.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <regex.h>
+
+#include "datapagemap.h"
+#include "filemap.h"
+#include "util.h"
+#include "pg_rewind.h"
+#include "storage/fd.h"
+
+filemap_t *filemap = NULL;
+
+static bool isRelDataFile(const char *path);
+static int path_cmp(const void *a, const void *b);
+static int final_filemap_cmp(const void *a, const void *b);
+static void filemap_list_to_array(void);
+
+
+/*****
+ * Public functions
+ */
+
+/*
+ * Create a new file map.
+ */
+filemap_t *
+filemap_create(void)
+{
+	filemap_t *map = pg_malloc(sizeof(filemap_t));
+	map->first = map->last = NULL;
+	map->nlist = 0;
+	map->array = NULL;
+	map->narray = 0;
+
+	Assert(filemap == NULL);
+	filemap = map;
+
+	return map;
+}
+
+static bool
+endswith(const char *haystack, const char *needle)
+{
+	int needlelen = strlen(needle);
+	int haystacklen = strlen(haystack);
+
+	if (haystacklen < needlelen)
+		return false;
+
+	return strcmp(&haystack[haystacklen - needlelen], needle) == 0;
+}
+
+/*
+ * Callback for processing remote file list.
+ */
+void
+process_remote_file(const char *path, file_type_t type, size_t newsize,
+					const char *link_target)
+{
+	bool		exists;
+	char		localpath[MAXPGPATH];
+	struct stat statbuf;
+	filemap_t  *map = filemap;
+	file_action_t action = FILE_ACTION_NONE;
+	size_t		oldsize = 0;
+	file_entry_t *entry;
+
+	Assert(map->array == NULL);
+
+	/*
+	 * Completely ignore some special files in source and destination.
+	 */
+	if (strcmp(path, "postmaster.pid") == 0 ||
+		strcmp(path, "postmaster.opts") == 0)
+		return;
+
+	/*
+	 * Skip temporary files, .../pgsql_tmp/... and .../pgsql_tmp.* in source.
+	 * This has the effect that all temporary files in the destination will
+	 * be removed.
+	 */
+	if (strstr(path, "/" PG_TEMP_FILE_PREFIX) != NULL)
+		return;
+	if (strstr(path, "/" PG_TEMP_FILES_DIR "/") != NULL)
+		return;
+
+	/*
+	 * sanity check: a filename that looks like a data file better be a
+	 * regular file
+	 */
+	if (type != FILE_TYPE_REGULAR && isRelDataFile(path))
+	{
+		fprintf(stderr, "data file in source \"%s\" is a directory.\n", path);
+		exit(1);
+	}
+
+	snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
+
+	/* Does the corresponding local file exist? */
+	if (lstat(localpath, &statbuf) < 0)
+	{
+		/* does not exist */
+		if (errno != ENOENT)
+		{
+			fprintf(stderr, "could not stat file \"%s\": %s",
+					localpath, strerror(errno));
+			exit(1);
+		}
+
+		exists = false;
+	}
+	else
+		exists = true;
+
+	switch (type)
+	{
+		case FILE_TYPE_DIRECTORY:
+			if (exists && !S_ISDIR(statbuf.st_mode))
+			{
+				/* it's a directory in target, but not in source. Strange.. */
+				fprintf(stderr, "\"%s\" is not a directory.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists)
+				action = FILE_ACTION_CREATE;
+			else
+				action = FILE_ACTION_NONE;
+			oldsize = 0;
+			break;
+
+		case FILE_TYPE_SYMLINK:
+			if (exists && !S_ISLNK(statbuf.st_mode))
+			{
+				/* it's a symbolic link in target, but not in source. Strange.. */
+				fprintf(stderr, "\"%s\" is not a symbolic link.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists)
+				action = FILE_ACTION_CREATE;
+			else
+				action = FILE_ACTION_NONE;
+			oldsize = 0;
+			break;
+
+		case FILE_TYPE_REGULAR:
+			if (exists && !S_ISREG(statbuf.st_mode))
+			{
+				fprintf(stderr, "\"%s\" is not a regular file.\n", localpath);
+				exit(1);
+			}
+
+			if (!exists || !isRelDataFile(path))
+			{
+				/*
+				 * File exists in source, but not in target. Or it's a non-data
+				 * file that we have no special processing for. Copy it in toto.
+				 *
+				 * An exception: PG_VERSIONs should be identical, but avoid
+				 * overwriting it for paranoia.
+				 */
+				if (endswith(path, "PG_VERSION"))
+				{
+					action = FILE_ACTION_NONE;
+					oldsize = statbuf.st_size;
+				}
+				else
+				{
+					action = FILE_ACTION_COPY;
+					oldsize = 0;
+				}
+			}
+			else
+			{
+				/*
+				 * It's a data file that exists in both.
+				 *
+				 * If it's larger in target, we can truncate it. There will
+				 * also be a WAL record of the truncation in the source system,
+				 * so WAL replay would eventually truncate the target too, but
+				 * we might as well do it now.
+				 *
+				 * If it's smaller in the target, it means that it has been
+				 * truncated in the target, or enlarged in the source, or both.
+				 * If it was truncated locally, we need to copy the missing
+				 * tail from the remote system. If it was enlarged in the
+				 * remote system, there will be WAL records in the remote
+				 * system for the new blocks, so we wouldn't need to copy them
+				 * here. But we don't know which scenario we're dealing with,
+				 * and there's no harm in copying the missing blocks now, so do
+				 * it now.
+				 *
+				 * If it's the same size, do nothing here. Any locally modified
+				 * blocks will be copied based on parsing the local WAL, and
+				 * any remotely modified blocks will be updated after
+				 * rewinding, when the remote WAL is replayed.
+				 */
+				oldsize = statbuf.st_size;
+				if (oldsize < newsize)
+					action = FILE_ACTION_COPY_TAIL;
+				else if (oldsize > newsize)
+					action = FILE_ACTION_TRUNCATE;
+				else
+					action = FILE_ACTION_NONE;
+			}
+			break;
+	}
+
+	/* Create a new entry for this file */
+	entry = pg_malloc(sizeof(file_entry_t));
+	entry->path = pg_strdup(path);
+	entry->type = type;
+	entry->action = action;
+	entry->oldsize = oldsize;
+	entry->newsize = newsize;
+	entry->link_target = link_target ? pg_strdup(link_target) : NULL;
+	entry->next = NULL;
+	entry->pagemap.bitmap = NULL;
+	entry->pagemap.bitmapsize = 0;
+	entry->isrelfile = isRelDataFile(path);
+
+	if (map->last)
+	{
+		map->last->next = entry;
+		map->last = entry;
+	}
+	else
+		map->first = map->last = entry;
+	map->nlist++;
+}
+
+
+/*
+ * Callback for processing local file list.
+ *
+ * All remote files must be processed before calling this. This only marks
+ * local files that don't exist in the remote system for deletion.
+ */
+void
+process_local_file(const char *path, file_type_t type, size_t oldsize,
+				   const char *link_target)
+{
+	bool		exists;
+	char		localpath[MAXPGPATH];
+	struct stat statbuf;
+	file_entry_t key;
+	file_entry_t *key_ptr;
+	filemap_t  *map = filemap;
+	file_entry_t *entry;
+
+	snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
+	if (lstat(localpath, &statbuf) < 0)
+	{
+		if (errno == ENOENT)
+			exists = false;
+		else
+		{
+			fprintf(stderr, "could not stat file \"%s\": %s",
+					localpath, strerror(errno));
+			exit(1);
+		}
+	}
+
+	if (map->array == NULL)
+	{
+		/* on first call, initialize lookup array */
+		if (map->nlist == 0)
+		{
+			/* should not happen */
+			fprintf(stderr, "remote file list is empty\n");
+			exit(1);
+		}
+
+		filemap_list_to_array();
+		qsort(map->array, map->narray, sizeof(file_entry_t *), path_cmp);
+	}
+
+	/*
+	 * Completely ignore some special files
+	 */
+	if (strcmp(path, "postmaster.pid") == 0 ||
+		strcmp(path, "postmaster.opts") == 0)
+		return;
+
+	key.path = (char *) path;
+	key_ptr = &key;
+	exists = bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
+					 path_cmp) != NULL;
+
+	/* Remove any file or folder that doesn't exist in the remote system. */
+	if (!exists)
+	{
+		entry = pg_malloc(sizeof(file_entry_t));
+		entry->path = pg_strdup(path);
+		entry->type = type;
+		entry->action = FILE_ACTION_REMOVE;
+		entry->oldsize = oldsize;
+		entry->newsize = 0;
+		entry->link_target = link_target ? pg_strdup(link_target) : NULL;
+		entry->next = NULL;
+		entry->pagemap.bitmap = NULL;
+		entry->pagemap.bitmapsize = 0;
+		entry->isrelfile = isRelDataFile(path);
+
+		if (map->last == NULL)
+			map->first = entry;
+		else
+			map->last->next = entry;
+		map->last = entry;
+		map->nlist++;
+	}
+	else
+	{
+		/*
+		 * We already handled all files that exist in the remote system
+		 * in process_remote_file().
+		 */
+	}
+}
+
+/*
+ * This callback gets called while we read the old WAL, for every block that
+ * have changed in the local system. It makes note of all the changed blocks
+ * in the pagemap of the file.
+ */
+void
+process_block_change(ForkNumber forknum, RelFileNode rnode, BlockNumber blkno)
+{
+	char	   *path;
+	file_entry_t key;
+	file_entry_t *key_ptr;
+	file_entry_t *entry;
+	BlockNumber	blkno_inseg;
+	int			segno;
+	filemap_t  *map = filemap;
+
+	Assert(filemap->array);
+
+	segno = blkno / RELSEG_SIZE;
+	blkno_inseg = blkno % RELSEG_SIZE;
+
+	path = datasegpath(rnode, forknum, segno);
+
+	key.path = (char *) path;
+	key_ptr = &key;
+
+	{
+		file_entry_t **e = bsearch(&key_ptr, map->array, map->narray,
+								   sizeof(file_entry_t *), path_cmp);
+		if (e)
+			entry = *e;
+		else
+			entry = NULL;
+	}
+	free(path);
+
+	if (entry)
+	{
+		Assert(entry->isrelfile);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+			case FILE_ACTION_COPY_TAIL:
+			case FILE_ACTION_TRUNCATE:
+				/* skip if we're truncating away the modified block anyway */
+				if ((blkno_inseg + 1) * BLCKSZ <= entry->newsize)
+					datapagemap_add(&entry->pagemap, blkno_inseg);
+				break;
+
+			case FILE_ACTION_COPY:
+			case FILE_ACTION_REMOVE:
+				return;
+
+			case FILE_ACTION_CREATE:
+				fprintf(stderr, "unexpected page modification for directory or symbolic link \"%s\"", entry->path);
+				exit(1);
+		}
+	}
+	else
+	{
+		/*
+		 * If we don't have any record of this file in the file map, it means
+		 * that it's a relation that doesn't exist in the remote system, and
+		 * it was also subsequently removed in the local system, too. We can
+		 * safely ignore it.
+		 */
+	}
+}
+
+/*
+ * Convert the linked list of entries in filemap->first/last to the array,
+ * filemap->array.
+ */
+static void
+filemap_list_to_array(void)
+{
+	int			narray;
+	file_entry_t *entry,
+				*next;
+
+	if (filemap->array == NULL)
+		filemap->array = pg_malloc(filemap->nlist * sizeof(file_entry_t));
+	else
+		filemap->array = pg_realloc(filemap->array,
+									(filemap->nlist + filemap->narray) * sizeof(file_entry_t));
+
+	narray = filemap->narray;
+	for (entry = filemap->first; entry != NULL; entry = next)
+	{
+		filemap->array[narray++] = entry;
+		next = entry->next;
+		entry->next = NULL;
+	}
+	Assert (narray == filemap->nlist + filemap->narray);
+	filemap->narray = narray;
+	filemap->nlist = 0;
+	filemap->first = filemap->last = NULL;
+}
+
+void
+filemap_finalize(void)
+{
+	filemap_list_to_array();
+	qsort(filemap->array, filemap->narray, sizeof(file_entry_t *),
+		  final_filemap_cmp);
+}
+
+static const char *
+action_to_str(file_action_t action)
+{
+	switch (action)
+	{
+		case FILE_ACTION_NONE:
+			return "NONE";
+		case FILE_ACTION_COPY:
+			return "COPY";
+		case FILE_ACTION_TRUNCATE:
+			return "TRUNCATE";
+		case FILE_ACTION_COPY_TAIL:
+			return "COPY_TAIL";
+		case FILE_ACTION_CREATE:
+			return "CREATE";
+		case FILE_ACTION_REMOVE:
+			return "REMOVE";
+
+		default:
+			return "unknown";
+	}
+}
+
+void
+print_filemap(void)
+{
+	file_entry_t *entry;
+	int			i;
+
+	for (i = 0; i < filemap->narray; i++)
+	{
+		entry = filemap->array[i];
+		if (entry->action != FILE_ACTION_NONE ||
+			entry->pagemap.bitmapsize > 0)
+		{
+			printf("%s (%s)\n", entry->path, action_to_str(entry->action));
+
+			if (entry->pagemap.bitmapsize > 0)
+				datapagemap_print(&entry->pagemap);
+		}
+	}
+	fflush(stdout);
+}
+
+/*
+ * Does it look like a relation data file?
+ */
+static bool
+isRelDataFile(const char *path)
+{
+	static bool	regexps_compiled = false;
+	static regex_t datasegment_regex;
+	int			rc;
+
+	/* Compile the regexp if not compiled yet. */
+	if (!regexps_compiled)
+	{
+		/*
+		 * Relation data files can be in one of the following directories:
+		 *
+		 * global/
+		 *		shared relations
+		 *
+		 * base/<db oid>/
+		 *		regular relations, default tablespace
+		 *
+		 * pg_tblspc/<tblspc oid>/PG_9.4_201403261/
+		 *		within a non-default tablespace (the name of the directory
+		 *		depends on version)
+		 *
+		 * And the relation data files themselves have a filename like:
+		 *
+		 * <oid>.<segment number>
+		 *
+		 * This regular expression tries to capture all of above.
+		 */
+		const char *datasegment_regex_str =
+			"("
+			"global"
+			"|"
+			"base/[0-9]+"
+			"|"
+			"pg_tblspc/[0-9]+/[PG_0-9.0-9_0-9]+/[0-9]+"
+			")/"
+			"[0-9]+(\\.[0-9]+)?$";
+		rc = regcomp(&datasegment_regex, datasegment_regex_str, REG_NOSUB | REG_EXTENDED);
+		if (rc != 0)
+		{
+			char errmsg[100];
+			regerror(rc, &datasegment_regex, errmsg, sizeof(errmsg));
+			fprintf(stderr, "could not compile regular expression: %s\n",
+					errmsg);
+			exit(1);
+		}
+	}
+
+	rc = regexec(&datasegment_regex, path, 0, NULL, 0);
+	if (rc == 0)
+	{
+		/* it's a data segment file */
+		return true;
+	}
+	else if (rc != REG_NOMATCH)
+	{
+		char errmsg[100];
+		regerror(rc, &datasegment_regex, errmsg, sizeof(errmsg));
+		fprintf(stderr, "could not execute regular expression: %s\n", errmsg);
+		exit(1);
+	}
+	return false;
+}
+
+static int
+path_cmp(const void *a, const void *b)
+{
+	file_entry_t *fa = *((file_entry_t **) a);
+	file_entry_t *fb = *((file_entry_t **) b);
+	return strcmp(fa->path, fb->path);
+}
+
+/*
+ * In the final stage, the filemap is sorted so that removals come last.
+ * From disk space usage point of view, it would be better to do removals
+ * first, but for now, safety first. If a whole directory is deleted, all
+ * files and subdirectories inside it need to removed first. On creation,
+ * parent directory needs to be created before files and directories inside
+ * it. To achieve that, the file_action_t enum is ordered so that we can
+ * just sort on that first. Furthermore, sort REMOVE entries in reverse
+ * path order, so that "foo/bar" subdirectory is removed before "foo".
+ */
+static int
+final_filemap_cmp(const void *a, const void *b)
+{
+	file_entry_t *fa = *((file_entry_t **) a);
+	file_entry_t *fb = *((file_entry_t **) b);
+
+	if (fa->action > fb->action)
+		return 1;
+	if (fa->action < fb->action)
+		return -1;
+
+	if (fa->action == FILE_ACTION_REMOVE)
+		return -strcmp(fa->path, fb->path);
+	else
+		return strcmp(fa->path, fb->path);
+}
diff --git a/contrib/pg_rewind/filemap.h b/contrib/pg_rewind/filemap.h
new file mode 100644
index 0000000..9c834fd
--- /dev/null
+++ b/contrib/pg_rewind/filemap.h
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * filemap.h
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *-------------------------------------------------------------------------
+ */
+#ifndef FILEMAP_H
+#define FILEMAP_H
+
+#include "storage/relfilenode.h"
+#include "storage/block.h"
+
+/*
+ * For every file found in the local or remote system, we have a file entry
+ * which says what we are going to do with the file. For relation files,
+ * there is also a page map, marking pages in the file that were changed
+ * locally.
+ *
+ * The enum values are sorted in the order we want actions to be processed.
+ */
+typedef enum
+{
+	FILE_ACTION_CREATE,		/* create local directory or symbolic link */
+	FILE_ACTION_COPY,		/* copy whole file, overwriting if exists */
+	FILE_ACTION_COPY_TAIL,	/* copy tail from 'oldsize' to 'newsize' */
+	FILE_ACTION_NONE,		/* no action (we might still copy modified blocks
+							 * based on the parsed WAL) */
+	FILE_ACTION_TRUNCATE,	/* truncate local file to 'newsize' bytes */
+	FILE_ACTION_REMOVE,		/* remove local file / directory / symlink */
+
+} file_action_t;
+
+typedef enum
+{
+	FILE_TYPE_REGULAR,
+	FILE_TYPE_DIRECTORY,
+	FILE_TYPE_SYMLINK
+} file_type_t;
+
+struct file_entry_t
+{
+	char	   *path;
+	file_type_t type;
+
+	file_action_t action;
+
+	/* for a regular file */
+	size_t		oldsize;
+	size_t		newsize;
+	bool		isrelfile;		/* is it a relation data file? */
+
+	datapagemap_t	pagemap;
+
+	/* for a symlink */
+	char		*link_target;
+
+	struct file_entry_t *next;
+};
+
+typedef struct file_entry_t file_entry_t;
+
+struct filemap_t
+{
+	/*
+	 * New entries are accumulated to a linked list, in process_remote_file
+	 * and process_local_file.
+	 */
+	file_entry_t *first;
+	file_entry_t *last;
+	int			nlist;
+
+	/*
+	 * After processing all the remote files, the entries in the linked list
+	 * are moved to this array. After processing local file, too, all the
+	 * local entries are added to the array by filemap_finalize, and sorted
+	 * in the final order. After filemap_finalize, all the entries are in
+	 * the array, and the linked list is empty.
+	 */
+	file_entry_t **array;
+	int			narray;
+};
+
+typedef struct filemap_t filemap_t;
+
+extern filemap_t * filemap;
+
+extern filemap_t *filemap_create(void);
+
+extern void print_filemap(void);
+
+/* Functions for populating the filemap */
+extern void process_remote_file(const char *path, file_type_t type, size_t newsize, const char *link_target);
+extern void process_local_file(const char *path, file_type_t type, size_t newsize, const char *link_target);
+extern void process_block_change(ForkNumber forknum, RelFileNode rnode, BlockNumber blkno);
+extern void filemap_finalize(void);
+
+#endif   /* FILEMAP_H */
diff --git a/contrib/pg_rewind/launcher b/contrib/pg_rewind/launcher
new file mode 100755
index 0000000..56f8cc0
--- /dev/null
+++ b/contrib/pg_rewind/launcher
@@ -0,0 +1,6 @@
+#!/bin/bash
+#
+# Normally, psql feeds the files in sql/ directory to psql, but we want to
+# run them as shell scripts instead.
+
+bash
diff --git a/contrib/pg_rewind/libpq_fetch.c b/contrib/pg_rewind/libpq_fetch.c
new file mode 100644
index 0000000..716814b
--- /dev/null
+++ b/contrib/pg_rewind/libpq_fetch.c
@@ -0,0 +1,407 @@
+/*-------------------------------------------------------------------------
+ *
+ * libpq_fetch.c
+ *	  Functions for fetching files from a remote server.
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_type.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <libpq-fe.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+
+static PGconn *conn = NULL;
+
+#define CHUNKSIZE 1000000
+
+static void receiveFileChunks(const char *sql);
+static void execute_pagemap(datapagemap_t *pagemap, const char *path);
+
+void
+libpqConnect(const char *connstr)
+{
+	conn = PQconnectdb(connstr);
+	if (PQstatus(conn) == CONNECTION_BAD)
+	{
+		fprintf(stderr, "could not connect to remote server: %s\n",
+				PQerrorMessage(conn));
+		exit(1);
+	}
+
+	if (verbose)
+		printf("connected to remote server\n");
+}
+
+/*
+ * Get a file list.
+ */
+void
+libpqProcessFileList(void)
+{
+	PGresult   *res;
+	const char *sql;
+	int			i;
+
+	sql =
+		"-- Create a recursive directory listing of the whole data directory\n"
+		"with recursive files (path, filename, size, isdir) as (\n"
+		"  select '' as path, filename, size, isdir from\n"
+		"  (select pg_ls_dir('.') as filename) as fn,\n"
+		"        pg_stat_file(fn.filename) as this\n"
+		"  union all\n"
+		"  select parent.path || parent.filename || '/' as path,\n"
+		"         fn, this.size, this.isdir\n"
+		"  from files as parent,\n"
+		"       pg_ls_dir(parent.path || parent.filename) as fn,\n"
+		"       pg_stat_file(parent.path || parent.filename || '/' || fn) as this\n"
+		"       where parent.isdir = 't'\n"
+		")\n"
+		"-- Using the cte, fetch a listing of the all the files.\n"
+		"--\n"
+		"-- For tablespaces, use pg_tablespace_location() function to fetch\n"
+		"-- the link target (there is no backend function to get a symbolic\n"
+		"-- link's target in general, so if the admin has put any custom\n"
+		"-- symbolic links in the data directory, they won't be copied\n"
+		"-- correctly)\n"
+		"select path || filename, size, isdir,\n"
+		"       pg_tablespace_location(pg_tablespace.oid) as link_target\n"
+		"from files\n"
+		"left outer join pg_tablespace on files.path = 'pg_tblspc/'\n"
+		"                             and oid::text = files.filename\n";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		fprintf(stderr, "unexpected result while fetching file list: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	/* sanity check the result set */
+	if (!(PQnfields(res) == 4))
+	{
+		fprintf(stderr, "unexpected result set while fetching file list\n");
+		exit(1);
+	}
+
+	/* Read result to local variables */
+	for (i = 0; i < PQntuples(res); i++)
+	{
+		char	   *path = PQgetvalue(res, i, 0);
+		int			filesize = atoi(PQgetvalue(res, i, 1));
+		bool		isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
+		char	   *link_target = PQgetvalue(res, i, 3);
+		file_type_t type;
+
+		if (link_target[0])
+			type = FILE_TYPE_SYMLINK;
+		else if (isdir)
+			type = FILE_TYPE_DIRECTORY;
+		else
+			type = FILE_TYPE_REGULAR;
+
+		process_remote_file(path, type, filesize, link_target);
+	}
+}
+
+/*
+ * Runs a query, which returns pieces of files from the remote source data
+ * directory, and overwrites the corresponding parts of target files with
+ * the received parts. The result set is expected to be of format:
+ *
+ * path		text	-- path in the data directory, e.g "base/1/123"
+ * begin	int4	-- offset within the file
+ * chunk	bytea	-- file content
+ *
+ */
+static void
+receiveFileChunks(const char *sql)
+{
+	PGresult   *res;
+
+	if (PQsendQueryParams(conn, sql, 0, NULL, NULL, NULL, NULL, 1) != 1)
+	{
+		fprintf(stderr, "could not send query: %s\n", PQerrorMessage(conn));
+		exit(1);
+	}
+
+	if (verbose)
+		fprintf(stderr, "getting chunks: %s\n", sql);
+
+	if (PQsetSingleRowMode(conn) != 1)
+	{
+		fprintf(stderr, "could not set libpq connection to single row mode\n");
+		exit(1);
+	}
+
+	if (verbose)
+		fprintf(stderr, "sent query\n");
+
+	while ((res = PQgetResult(conn)) != NULL)
+	{
+		char   *filename;
+		int		filenamelen;
+		int		chunkoff;
+		int		chunksize;
+		char   *chunk;
+
+		switch(PQresultStatus(res))
+		{
+			case PGRES_SINGLE_TUPLE:
+				break;
+
+			case PGRES_TUPLES_OK:
+				continue; /* final zero-row result */
+			default:
+				fprintf(stderr, "unexpected result while fetching remote files: %s\n",
+						PQresultErrorMessage(res));
+				exit(1);
+		}
+
+		/* sanity check the result set */
+		if (!(PQnfields(res) == 3 && PQntuples(res) == 1))
+		{
+			fprintf(stderr, "unexpected result set size while fetching remote files\n");
+			exit(1);
+		}
+
+		if (!(PQftype(res, 0) == TEXTOID && PQftype(res, 1) == INT4OID && PQftype(res, 2) == BYTEAOID))
+		{
+			fprintf(stderr, "unexpected data types in result set while fetching remote files: %u %u %u\n", PQftype(res, 0), PQftype(res, 1), PQftype(res, 2));
+			exit(1);
+		}
+		if (!(PQfformat(res, 0) == 1 && PQfformat(res, 1) == 1 && PQfformat(res, 2) == 1))
+		{
+			fprintf(stderr, "unexpected result format while fetching remote files\n");
+			exit(1);
+		}
+
+		if (!(!PQgetisnull(res, 0, 0) && !PQgetisnull(res, 0, 1) && !PQgetisnull(res, 0, 2) &&
+			  PQgetlength(res, 0, 1) == sizeof(int32)))
+		{
+			fprintf(stderr, "unexpected result set while fetching remote files\n");
+			exit(1);
+		}
+
+		/* Read result set to local variables */
+		memcpy(&chunkoff, PQgetvalue(res, 0, 1), sizeof(int32));
+		chunkoff = ntohl(chunkoff);
+		chunksize = PQgetlength(res, 0, 2);
+
+		filenamelen = PQgetlength(res, 0, 0);
+		filename = pg_malloc(filenamelen + 1);
+		memcpy(filename, PQgetvalue(res, 0, 0), filenamelen);
+		filename[filenamelen] = '\0';
+
+		chunk = PQgetvalue(res, 0, 2);
+
+		if (verbose)
+			fprintf(stderr, "received chunk for file \"%s\", off %d, len %d\n",
+					filename, chunkoff, chunksize);
+
+		open_target_file(filename, false);
+
+		write_file_range(chunk, chunkoff, chunksize);
+	}
+}
+
+/*
+ * Receive a single file.
+ */
+char *
+libpqGetFile(const char *filename, size_t *filesize)
+{
+	PGresult   *res;
+	char	   *result;
+	int			len;
+	const char *paramValues[1];
+	paramValues[0] = filename;
+
+	res = PQexecParams(conn, "select pg_read_binary_file($1)",
+					   1, NULL, paramValues, NULL, NULL, 1);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		fprintf(stderr, "unexpected result while fetching remote file \"%s\": %s\n",
+				filename, PQresultErrorMessage(res));
+		exit(1);
+	}
+
+
+	/* sanity check the result set */
+	if (!(PQntuples(res) == 1 && !PQgetisnull(res, 0, 0)))
+	{
+		fprintf(stderr, "unexpected result set while fetching remote file \"%s\"\n",
+				filename);
+		exit(1);
+	}
+
+	/* Read result to local variables */
+	len = PQgetlength(res, 0, 0);
+	result = pg_malloc(len + 1);
+	memcpy(result, PQgetvalue(res, 0, 0), len);
+	result[len] = '\0';
+
+	if (verbose)
+		printf("fetched file \"%s\", length %d\n", filename, len);
+
+	if (filesize)
+		*filesize = len;
+	return result;
+}
+
+static void
+copy_file_range(const char *path, unsigned int begin, unsigned int end)
+{
+	char linebuf[MAXPGPATH + 23];
+
+	/* Split the range into CHUNKSIZE chunks */
+	while (end - begin > 0)
+	{
+		unsigned int len;
+
+		if (end - begin > CHUNKSIZE)
+			len = CHUNKSIZE;
+		else
+			len = end - begin;
+
+		snprintf(linebuf, sizeof(linebuf), "%s\t%u\t%u\n", path, begin, len);
+
+		if (PQputCopyData(conn, linebuf, strlen(linebuf)) != 1)
+		{
+			fprintf(stderr, "error sending COPY data: %s\n",
+					PQerrorMessage(conn));
+			exit(1);
+		}
+		begin += len;
+	}
+}
+
+/*
+ * Fetch all changed blocks from remote source data directory.
+ */
+void
+libpq_executeFileMap(filemap_t *map)
+{
+	file_entry_t *entry;
+	const char *sql;
+	PGresult   *res;
+	int			i;
+
+	/*
+	 * First create a temporary table, and load it with the blocks that
+	 * we need to fetch.
+	 */
+	sql = "create temporary table fetchchunks(path text, begin int4, len int4);";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		fprintf(stderr, "error creating temporary table: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	sql = "copy fetchchunks from stdin";
+	res = PQexec(conn, sql);
+
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+	{
+		fprintf(stderr, "unexpected result while sending file list: %s\n",
+				PQresultErrorMessage(res));
+		exit(1);
+	}
+
+	for (i = 0; i < map->narray; i++)
+	{
+		entry = map->array[i];
+		execute_pagemap(&entry->pagemap, entry->path);
+
+		switch (entry->action)
+		{
+			case FILE_ACTION_NONE:
+				/* ok, do nothing.. */
+				break;
+
+			case FILE_ACTION_COPY:
+				/* Truncate the old file out of the way, if any */
+				open_target_file(entry->path, true);
+				copy_file_range(entry->path, 0, entry->newsize);
+				break;
+
+			case FILE_ACTION_TRUNCATE:
+				truncate_target_file(entry->path, entry->newsize);
+				break;
+
+			case FILE_ACTION_COPY_TAIL:
+				copy_file_range(entry->path, entry->oldsize, entry->newsize);
+				break;
+
+			case FILE_ACTION_REMOVE:
+				remove_target(entry);
+				break;
+
+			case FILE_ACTION_CREATE:
+				create_target(entry);
+				break;
+		}
+	}
+
+	if (PQputCopyEnd(conn, NULL) != 1)
+	{
+		fprintf(stderr, "error sending end-of-COPY: %s\n",
+				PQerrorMessage(conn));
+		exit(1);
+	}
+
+	while ((res = PQgetResult(conn)) != NULL)
+	{
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		{
+			fprintf(stderr, "unexpected result while sending file list: %s\n",
+					PQresultErrorMessage(res));
+			exit(1);
+		}
+	}
+
+	/* Ok, we've sent the file list. Now receive the files */
+	sql =
+		"-- fetch all the blocks listed in the temp table.\n"
+		"select path, begin, \n"
+		"  pg_read_binary_file(path, begin, len) as chunk\n"
+		"from fetchchunks\n";
+
+	receiveFileChunks(sql);
+}
+
+
+static void
+execute_pagemap(datapagemap_t *pagemap, const char *path)
+{
+	datapagemap_iterator_t *iter;
+	BlockNumber blkno;
+
+	iter = datapagemap_iterate(pagemap);
+	while (datapagemap_next(iter, &blkno))
+	{
+		off_t offset = blkno * BLCKSZ;
+
+		copy_file_range(path, offset, offset + BLCKSZ);
+	}
+	free(iter);
+}
diff --git a/contrib/pg_rewind/parsexlog.c b/contrib/pg_rewind/parsexlog.c
new file mode 100644
index 0000000..bc8ad10
--- /dev/null
+++ b/contrib/pg_rewind/parsexlog.c
@@ -0,0 +1,368 @@
+/*-------------------------------------------------------------------------
+ *
+ * parsexlog.c
+ *	  Functions for reading Write-Ahead-Log
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2008, Nippon Telegraph and Telephone Corporation
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define FRONTEND 1
+#include "c.h"
+#undef FRONTEND
+#include "postgres.h"
+
+#include "pg_rewind.h"
+#include "filemap.h"
+
+#include <unistd.h>
+
+#include "access/rmgr.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+
+
+/*
+ * RmgrNames is an array of resource manager names, to make error messages
+ * a bit nicer.
+ */
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \
+  name,
+
+static const char *RmgrNames[RM_MAX_ID + 1] = {
+#include "access/rmgrlist.h"
+};
+
+static void extractPageInfo(XLogReaderState *record);
+
+static int xlogreadfd = -1;
+static XLogSegNo xlogreadsegno = -1;
+static char xlogfpath[MAXPGPATH];
+
+typedef struct XLogPageReadPrivate
+{
+	const char	   *datadir;
+	TimeLineID		tli;
+} XLogPageReadPrivate;
+
+static int SimpleXLogPageRead(XLogReaderState *xlogreader,
+				   XLogRecPtr targetPagePtr,
+				   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
+				   TimeLineID *pageTLI);
+
+/*
+ * Read all the WAL in the datadir/pg_xlog,  starting from 'startpoint' on
+ * timeline 'tli'. Make note of the data blocks touched by the WAL records,
+ * and return them in a page map.
+ */
+void
+extractPageMap(const char *datadir, XLogRecPtr startpoint, TimeLineID tli)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char *errormsg;
+	XLogPageReadPrivate private;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	record = XLogReadRecord(xlogreader, startpoint, &errormsg);
+	if (record == NULL)
+	{
+		fprintf(stderr, "could not read WAL starting at %X/%X",
+				(uint32) (startpoint >> 32),
+				(uint32) (startpoint));
+		if (errormsg)
+			fprintf(stderr, ": %s", errormsg);
+		fprintf(stderr, "\n");
+		exit(1);
+	}
+
+	do
+	{
+		extractPageInfo(xlogreader);
+
+		record = XLogReadRecord(xlogreader, InvalidXLogRecPtr, &errormsg);
+
+		if (errormsg)
+			fprintf(stderr, "error reading xlog record: %s\n", errormsg);
+	} while(record != NULL);
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+}
+
+/*
+ * Reads one WAL record. Returns the end position of the record, without
+ * doing anything the record itself.
+ */
+XLogRecPtr
+readOneRecord(const char *datadir, XLogRecPtr ptr, TimeLineID tli)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogPageReadPrivate private;
+	XLogRecPtr	endptr;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	record = XLogReadRecord(xlogreader, ptr, &errormsg);
+	if (record == NULL)
+	{
+		fprintf(stderr, "could not read WAL record at %X/%X",
+				(uint32) (ptr >> 32), (uint32) (ptr));
+		if (errormsg)
+			fprintf(stderr, ": %s", errormsg);
+		fprintf(stderr, "\n");
+		exit(1);
+	}
+	endptr = xlogreader->EndRecPtr;
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+
+	return endptr;
+}
+
+/*
+ * Find the previous checkpoint preceding given WAL position.
+ */
+void
+findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, TimeLineID tli,
+				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
+				   XLogRecPtr *lastchkptredo)
+{
+	/* Walk backwards, starting from the given record */
+	XLogRecord *record;
+	XLogRecPtr searchptr;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogPageReadPrivate private;
+
+
+	/*
+	 * The given fork pointer points to the end of the last common record,
+	 * which is not necessarily the beginning of the next record, if the
+	 * previous record happens to end at a page boundary. Skip over the
+	 * page header in that case to find the next record.
+	 */
+	if (forkptr % XLOG_BLCKSZ == 0)
+		forkptr += (forkptr % XLogSegSize == 0) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD;
+
+	private.datadir = datadir;
+	private.tli = tli;
+	xlogreader = XLogReaderAllocate(&SimpleXLogPageRead, &private);
+
+	searchptr = forkptr;
+	for (;;)
+	{
+		uint8		info;
+
+		record = XLogReadRecord(xlogreader, searchptr, &errormsg);
+
+		if (record == NULL)
+		{
+			fprintf(stderr, "could not find previous WAL record at %X/%X",
+					(uint32) (searchptr >> 32),
+					(uint32) (searchptr));
+			if (errormsg)
+				fprintf(stderr, ": %s", errormsg);
+			fprintf(stderr, "\n");
+			exit(1);
+		}
+
+		/*
+		 * Check if it is a checkpoint record. This checkpoint record
+		 * needs to be the latest checkpoint before WAL forked and not
+		 * the checkpoint where the master has been stopped to be
+		 * rewinded.
+		 */
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+		if (searchptr < forkptr &&
+			XLogRecGetRmid(xlogreader) == RM_XLOG_ID &&
+			(info == XLOG_CHECKPOINT_SHUTDOWN || info == XLOG_CHECKPOINT_ONLINE))
+		{
+			CheckPoint	checkPoint;
+
+			memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
+			*lastchkptrec = searchptr;
+			*lastchkpttli = checkPoint.ThisTimeLineID;
+			*lastchkptredo = checkPoint.redo;
+			break;
+		}
+
+		/* Walk backwards to previous record. */
+		searchptr = record->xl_prev;
+	}
+
+	XLogReaderFree(xlogreader);
+	if (xlogreadfd != -1)
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+}
+
+/* XLogreader callback function, to read a WAL page */
+int
+SimpleXLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+				   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
+				   TimeLineID *pageTLI)
+{
+	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
+	uint32		targetPageOff;
+	XLogSegNo	targetSegNo PG_USED_FOR_ASSERTS_ONLY;
+
+	XLByteToSeg(targetPagePtr, targetSegNo);
+	targetPageOff = targetPagePtr % XLogSegSize;
+
+	/*
+	 * See if we need to switch to a new segment because the requested record
+	 * is not in the currently open one.
+	 */
+	if (xlogreadfd >= 0 && !XLByteInSeg(targetPagePtr, xlogreadsegno))
+	{
+		close(xlogreadfd);
+		xlogreadfd = -1;
+	}
+
+	XLByteToSeg(targetPagePtr, xlogreadsegno);
+
+	if (xlogreadfd < 0)
+	{
+		char		xlogfname[MAXFNAMELEN];
+
+		XLogFileName(xlogfname, private->tli, xlogreadsegno);
+
+		snprintf(xlogfpath, MAXPGPATH, "%s/" XLOGDIR "/%s", private->datadir, xlogfname);
+
+		xlogreadfd = open(xlogfpath, O_RDONLY | PG_BINARY, 0);
+
+		if (xlogreadfd < 0)
+		{
+			fprintf(stderr, "could not open file \"%s\": %s\n", xlogfpath,
+					strerror(errno));
+			return -1;
+		}
+	}
+
+	/*
+	 * At this point, we have the right segment open.
+	 */
+	Assert(xlogreadfd != -1);
+
+	/* Read the requested page */
+	if (lseek(xlogreadfd, (off_t) targetPageOff, SEEK_SET) < 0)
+	{
+		fprintf(stderr, "could not seek in file \"%s\": %s\n", xlogfpath,
+				strerror(errno));
+		return -1;
+	}
+
+	if (read(xlogreadfd, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+	{
+		fprintf(stderr, "could not read from file \"%s\": %s\n", xlogfpath,
+				strerror(errno));
+		return -1;
+	}
+
+	Assert(targetSegNo == xlogreadsegno);
+
+	*pageTLI = private->tli;
+	return XLOG_BLCKSZ;
+}
+
+static void
+extractPageInfo(XLogReaderState *record)
+{
+#define pageinfo_set_truncation(forkno, rnode, blkno) datapagemap_set_truncation(pagemap, forkno, rnode, blkno)
+
+	int			block_id;
+	RmgrId		rmid = XLogRecGetRmid(record);
+	uint8		info = XLogRecGetInfo(record);
+	uint8		rminfo = info & ~XLR_INFO_MASK;
+
+	/* Is this a special record type that I recognize? */
+
+	if (rmid == RM_DBASE_ID && rminfo == XLOG_DBASE_CREATE)
+	{
+		/*
+		 * New databases can be safely ignored. It won't be present in the
+		 * remote system, so it will be copied in toto. There's one
+		 * corner-case, though: if a new, different, database is also created
+		 * in the remote system, we'll see that the files already exist and
+		 * not copy them. That's OK, though; WAL replay of creating the new
+		 * database, from the remote WAL, will re-copy the new database,
+		 * overwriting the database created in the local system.
+		 */
+	}
+	else if (rmid == RM_DBASE_ID && rminfo == XLOG_DBASE_DROP)
+	{
+		/*
+		 * An existing database was dropped. We'll see that the files don't
+		 * exist in local system, and copy them in toto from the remote
+		 * system. No need to do anything special here.
+		 */
+	}
+	else if (rmid == RM_SMGR_ID && rminfo == XLOG_SMGR_CREATE)
+	{
+		/*
+		 * We can safely ignore these. The local file will be removed, if it
+		 * doesn't exist in remote system. If a file with same name is created
+		 * in remote system, too, there will be WAL records for all the blocks
+		 * in it.
+		 */
+	}
+	else if (rmid == RM_SMGR_ID && rminfo == XLOG_SMGR_TRUNCATE)
+	{
+		/*
+		 * We can safely ignore these. If a file is truncated locally, we'll
+		 * notice that when we compare the sizes, and will copy the missing
+		 * tail from remote system.
+		 *
+		 * TODO: But it would be nice to do some sanity cross-checking here..
+		 */
+	}
+	else if (info & XLR_SPECIAL_REL_UPDATE)
+	{
+		/*
+		 * This record type modifies a relation file in some special
+		 * way, but we don't recognize the type. That's bad - we don't
+		 * know how to track that change.
+		 */
+		fprintf(stderr, "WAL record modifies a relation, but record type is not recognized\n");
+		fprintf(stderr, "lsn: %X/%X, rmgr: %s, info: %02X\n",
+			(uint32) (record->ReadRecPtr >> 32), (uint32) (record->ReadRecPtr),
+				RmgrNames[rmid], info);
+		exit(1);
+	}
+
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		RelFileNode rnode;
+		ForkNumber forknum;
+		BlockNumber blkno;
+
+		if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+			continue;
+		process_block_change(forknum, rnode, blkno);
+	}
+}
diff --git a/contrib/pg_rewind/pg_rewind.c b/contrib/pg_rewind/pg_rewind.c
new file mode 100644
index 0000000..501414f
--- /dev/null
+++ b/contrib/pg_rewind/pg_rewind.c
@@ -0,0 +1,541 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_rewind.c
+ *	  Synchronizes an old master server to a new timeline
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+
+#include "access/timeline.h"
+#include "access/xlog_internal.h"
+#include "catalog/catversion.h"
+#include "catalog/pg_control.h"
+#include "storage/bufpage.h"
+
+#include "getopt_long.h"
+
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <time.h>
+#include <unistd.h>
+
+static void usage(const char *progname);
+
+static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
+				  XLogRecPtr checkpointloc);
+
+static void digestControlFile(ControlFileData *ControlFile, char *source, size_t size);
+static void updateControlFile(ControlFileData *ControlFile,
+							  char *datadir);
+static void sanityChecks(void);
+static void findCommonAncestorTimeline(XLogRecPtr *recptr, TimeLineID *tli);
+
+static ControlFileData ControlFile_target;
+static ControlFileData ControlFile_source;
+
+const char *progname;
+
+char *datadir_target = NULL;
+char *datadir_source = NULL;
+char *connstr_source = NULL;
+
+int verbose;
+int dry_run;
+
+static void
+usage(const char *progname)
+{
+	printf("%s resynchronizes a cluster with another copy of the cluster.\n\n", progname);
+	printf("Usage:\n  %s [OPTION]...\n\n", progname);
+	printf("Options:\n");
+	printf("  -D, --target-pgdata=DIRECTORY\n");
+	printf("                 existing data directory to modify\n");
+	printf("  --source-pgdata=DIRECTORY\n");
+	printf("                 source data directory to sync with\n");
+	printf("  --source-server=CONNSTR\n");
+	printf("                 source server to sync with\n");
+	printf("  -v             write a lot of progress messages\n");
+	printf("  -n, --dry-run  stop before modifying anything\n");
+	printf("  -V, --version  output version information, then exit\n");
+	printf("  -?, --help     show this help, then exit\n");
+	printf("\n");
+	printf("Report bugs to <pgsql-bugs@postgresql.org>.\n");
+}
+
+
+int
+main(int argc, char **argv)
+{
+	static struct option long_options[] = {
+		{"help", no_argument, NULL, '?'},
+		{"target-pgdata", required_argument, NULL, 'D'},
+		{"source-pgdata", required_argument, NULL, 1},
+		{"source-server", required_argument, NULL, 2},
+		{"version", no_argument, NULL, 'V'},
+		{"dry-run", no_argument, NULL, 'n'},
+		{"verbose", no_argument, NULL, 'v'},
+		{NULL, 0, NULL, 0}
+	};
+	int			option_index;
+	int			c;
+	XLogRecPtr	divergerec;
+	TimeLineID	lastcommontli;
+	XLogRecPtr	chkptrec;
+	TimeLineID	chkpttli;
+	XLogRecPtr	chkptredo;
+	size_t		size;
+	char	   *buffer;
+	bool		rewind_needed;
+	ControlFileData	ControlFile;
+
+	progname = get_progname(argv[0]);
+
+	/* Set default parameter values */
+	verbose = 0;
+	dry_run = 0;
+
+	/* Process command-line arguments */
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage(progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_rewind (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt_long(argc, argv, "D:vn", long_options, &option_index)) != -1)
+	{
+		switch (c)
+		{
+			case '?':
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+			case ':':
+				exit(1);
+			case 'v':
+				verbose = 1;
+				break;
+			case 'n':
+				dry_run = 1;
+				break;
+
+			case 'D':	/* -D or --target-pgdata */
+				datadir_target = pg_strdup(optarg);
+				break;
+
+			case 1:		/* --source-pgdata */
+				datadir_source = pg_strdup(optarg);
+				break;
+			case 2:		/* --source-server */
+				connstr_source = pg_strdup(optarg);
+				break;
+		}
+	}
+
+	/* No source given? Show usage */
+	if (datadir_source == NULL && connstr_source == NULL)
+	{
+		fprintf(stderr, "%s: no source specified\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	if (datadir_target == NULL)
+	{
+		fprintf(stderr, "%s: no target data directory specified\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	if (argc != optind)
+	{
+		fprintf(stderr, "%s: invalid arguments\n", progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/*
+	 * Connect to remote server
+	 */
+	if (connstr_source)
+		libpqConnect(connstr_source);
+
+	/*
+	 * Ok, we have all the options and we're ready to start. Read in all the
+	 * information we need from both clusters.
+	 */
+	buffer = slurpFile(datadir_target, "global/pg_control", &size);
+	digestControlFile(&ControlFile_target, buffer, size);
+	pg_free(buffer);
+
+	buffer = fetchFile("global/pg_control", &size);
+	digestControlFile(&ControlFile_source, buffer, size);
+	pg_free(buffer);
+
+	sanityChecks();
+
+	/*
+	 * If both clusters are already on the same timeline, there's nothing
+	 * to do.
+	 */
+	if (ControlFile_target.checkPointCopy.ThisTimeLineID == ControlFile_source.checkPointCopy.ThisTimeLineID)
+	{
+		fprintf(stderr, "source and target cluster are both on the same timeline.\n");
+		exit(1);
+	}
+
+	findCommonAncestorTimeline(&divergerec, &lastcommontli);
+	printf("The servers diverged at WAL position %X/%X on timeline %u.\n",
+		   (uint32) (divergerec >> 32), (uint32) divergerec, lastcommontli);
+
+	/*
+	 * Check for the possibility that the target is in fact a direct ancestor
+	 * of the source. In that case, there is no divergent history in the
+	 * target that needs rewinding.
+	 */
+	if (ControlFile_target.checkPoint >= divergerec)
+	{
+		rewind_needed = true;
+	}
+	else
+	{
+		XLogRecPtr chkptendrec;
+
+		/* Read the checkpoint record on the target to see where it ends. */
+		chkptendrec = readOneRecord(datadir_target,
+									ControlFile_target.checkPoint,
+									ControlFile_target.checkPointCopy.ThisTimeLineID);
+
+		/*
+		 * If the histories diverged exactly at the end of the shutdown
+		 * checkpoint record on the target, there are no WAL records in the
+		 * target that don't belong in the source's history, and no rewind is
+		 * needed.
+		 */
+		if (chkptendrec == divergerec)
+			rewind_needed = false;
+		else
+			rewind_needed = true;
+	}
+
+	if (!rewind_needed)
+	{
+		printf("No rewind required.\n");
+		exit(0);
+	}
+	findLastCheckpoint(datadir_target, divergerec, lastcommontli,
+					   &chkptrec, &chkpttli, &chkptredo);
+	printf("Rewinding from Last common checkpoint at %X/%X on timeline %u\n",
+		   (uint32) (chkptrec >> 32), (uint32) chkptrec,
+		   chkpttli);
+
+	/*
+	 * Build the filemap, by comparing the remote and local data directories
+	 */
+	(void) filemap_create();
+	fetchRemoteFileList();
+	traverse_datadir(datadir_target, &process_local_file);
+
+	/*
+	 * Read the target WAL from last checkpoint before the point of fork,
+	 * to extract all the pages that were modified on the target cluster
+	 * after the fork.
+	 */
+	extractPageMap(datadir_target, chkptrec, lastcommontli);
+
+	filemap_finalize();
+
+	/* XXX: this is probably too verbose even in verbose mode */
+	if (verbose)
+		print_filemap();
+
+	/* Ok, we're ready to start copying things over. */
+	executeFileMap();
+
+	createBackupLabel(chkptredo, chkpttli, chkptrec);
+
+	/*
+	 * Update control file of target file and make it ready to
+	 * perform archive recovery when restarting.
+	 */
+	memcpy(&ControlFile, &ControlFile_source, sizeof(ControlFileData));
+	ControlFile.minRecoveryPoint = divergerec;
+	ControlFile.minRecoveryPointTLI = ControlFile_target.checkPointCopy.ThisTimeLineID;
+	ControlFile.state = DB_IN_ARCHIVE_RECOVERY;
+	updateControlFile(&ControlFile, datadir_target);
+
+	printf("Done!\n");
+
+	return 0;
+}
+
+static void
+sanityChecks(void)
+{
+	/* Check that there's no backup_label in either cluster */
+	/* Check system_id match */
+	if (ControlFile_target.system_identifier != ControlFile_source.system_identifier)
+	{
+		fprintf(stderr, "source and target clusters are from different systems\n");
+		exit(1);
+	}
+	/* check version */
+	if (ControlFile_target.pg_control_version != PG_CONTROL_VERSION ||
+		ControlFile_source.pg_control_version != PG_CONTROL_VERSION ||
+		ControlFile_target.catalog_version_no != CATALOG_VERSION_NO ||
+		ControlFile_source.catalog_version_no != CATALOG_VERSION_NO)
+	{
+		fprintf(stderr, "clusters are not compatible with this version of pg_rewind\n");
+		exit(1);
+	}
+
+	/*
+	 * Target cluster need to use checksums or hint bit wal-logging, this to
+	 * prevent from data corruption that could occur because of hint bits.
+	 */
+	if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION &&
+		!ControlFile_target.wal_log_hints)
+	{
+		fprintf(stderr, "target master need to use either data checksums or \"wal_log_hints = on\".\n");
+		exit(1);
+	}
+
+	/*
+	 * Target cluster better not be running. This doesn't guard against someone
+	 * starting the cluster concurrently. Also, this is probably more strict
+	 * than necessary; it's OK if the master was not shut down cleanly, as
+	 * long as it isn't running at the moment.
+	 */
+	if (ControlFile_target.state != DB_SHUTDOWNED)
+	{
+		fprintf(stderr, "target master must be shut down cleanly.\n");
+		exit(1);
+	}
+}
+
+/*
+ * Determine the TLI of the last common timeline in the histories of the two
+ * clusters. *tli is set to the last common timeline, and *recptr is set to
+ * the position where the histories diverged (ie. the first WAL record that's
+ * not the same in both clusters).
+ *
+ * Control files of both clusters must be read into ControlFile_target/source
+ * before calling this.
+ */
+static void
+findCommonAncestorTimeline(XLogRecPtr *recptr, TimeLineID *tli)
+{
+	TimeLineID	targettli;
+	TimeLineHistoryEntry *sourceHistory;
+	int			nentries;
+	int			i;
+	TimeLineID	sourcetli;
+
+	targettli = ControlFile_target.checkPointCopy.ThisTimeLineID;
+	sourcetli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+
+	/* Timeline 1 does not have a history file, so no need to check */
+	if (sourcetli == 1)
+	{
+		sourceHistory = (TimeLineHistoryEntry *) pg_malloc(sizeof(TimeLineHistoryEntry));
+		sourceHistory->tli = sourcetli;
+		sourceHistory->begin = sourceHistory->end = InvalidXLogRecPtr;
+		nentries = 1;
+	}
+	else
+	{
+		char		path[MAXPGPATH];
+		char	   *histfile;
+
+		TLHistoryFilePath(path, sourcetli);
+		histfile = fetchFile(path, NULL);
+
+		sourceHistory = rewind_parseTimeLineHistory(histfile,
+													ControlFile_source.checkPointCopy.ThisTimeLineID,
+													&nentries);
+		pg_free(histfile);
+	}
+
+	/*
+	 * Trace the history backwards, until we hit the target timeline.
+	 *
+	 * TODO: This assumes that there are no timeline switches on the target
+	 * cluster after the fork.
+	 */
+	for (i = nentries - 1; i >= 0; i--)
+	{
+		TimeLineHistoryEntry *entry = &sourceHistory[i];
+		if (entry->tli == targettli)
+		{
+			/* found it */
+			*recptr = entry->end;
+			*tli = entry->tli;
+
+			free(sourceHistory);
+			return;
+		}
+	}
+
+	fprintf(stderr, "could not find common ancestor of the source and target cluster's timelines\n");
+	exit(1);
+}
+
+
+/*
+ * Create a backup_label file that forces recovery to begin at the last common
+ * checkpoint.
+ */
+static void
+createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpointloc)
+{
+	XLogSegNo	startsegno;
+	char		BackupLabelFilePath[MAXPGPATH];
+	FILE	   *fp;
+	time_t		stamp_time;
+	char		strfbuf[128];
+	char		xlogfilename[MAXFNAMELEN];
+	struct tm  *tmp;
+
+	if (dry_run)
+		return;
+
+	XLByteToSeg(startpoint, startsegno);
+	XLogFileName(xlogfilename, starttli, startsegno);
+
+	/*
+	 * TODO: move old file out of the way, if any. And perhaps create the
+	 * file with temporary name first and rename in place after it's done.
+	 */
+	snprintf(BackupLabelFilePath, MAXPGPATH,
+			 "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target);
+
+	/*
+	 * Construct backup label file
+	 */
+
+	fp = fopen(BackupLabelFilePath, "wb");
+
+	stamp_time = time(NULL);
+	tmp = localtime(&stamp_time);
+	strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", tmp);
+	fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n",
+			(uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename);
+	fprintf(fp, "CHECKPOINT LOCATION: %X/%X\n",
+			(uint32) (checkpointloc >> 32), (uint32) checkpointloc);
+	fprintf(fp, "BACKUP METHOD: pg_rewind\n");
+	fprintf(fp, "BACKUP FROM: standby\n");
+	fprintf(fp, "START TIME: %s\n", strfbuf);
+	/* omit LABEL: line */
+
+	if (fclose(fp) != 0)
+	{
+		fprintf(stderr, _("could not write backup label file \"%s\": %s\n"),
+				BackupLabelFilePath, strerror(errno));
+		exit(2);
+	}
+}
+
+/*
+ * Check CRC of control file
+ */
+static void
+checkControlFile(ControlFileData *ControlFile)
+{
+	pg_crc32	crc;
+
+	/* Calculate CRC */
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc,
+			   (char *) ControlFile,
+			   offsetof(ControlFileData, crc));
+	FIN_CRC32C(crc);
+
+	/* And simply compare it */
+	if (!EQ_CRC32C(crc, ControlFile->crc))
+	{
+		fprintf(stderr, "unexpected control file CRC\n");
+		exit(1);
+	}
+}
+
+/*
+ * Verify control file contents in the buffer src, and copy it to *ControlFile.
+ */
+static void
+digestControlFile(ControlFileData *ControlFile, char *src, size_t size)
+{
+	if (size != PG_CONTROL_SIZE)
+	{
+		fprintf(stderr, "unexpected control file size %d, expected %d\n",
+				(int) size, PG_CONTROL_SIZE);
+		exit(1);
+	}
+	memcpy(ControlFile, src, sizeof(ControlFileData));
+
+	/* Additional checks on control file */
+	checkControlFile(ControlFile);
+}
+
+/*
+ * Update a control file with fresh content
+ */
+static void
+updateControlFile(ControlFileData *ControlFile, char *datadir)
+{
+	char    path[MAXPGPATH];
+	char	buffer[PG_CONTROL_SIZE];
+	FILE    *fp;
+
+	if (dry_run)
+		return;
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+			   (char *) ControlFile,
+			   offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData) to avoid premature EOF
+	 * related errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(path, MAXPGPATH,
+			 "%s/global/pg_control", datadir);
+
+	if ((fp  = fopen(path, "wb")) == NULL)
+	{
+		fprintf(stderr,"Could not open the pg_control file\n");
+		exit(1);
+	}
+
+	if (fwrite(buffer, 1,
+			   PG_CONTROL_SIZE, fp) != PG_CONTROL_SIZE)
+	{
+		fprintf(stderr,"Could not write the pg_control file\n");
+		exit(1);
+	}
+
+	if (fclose(fp))
+	{
+		fprintf(stderr,"Could not close the pg_control file\n");
+		exit(1);
+	}
+ }
diff --git a/contrib/pg_rewind/pg_rewind.h b/contrib/pg_rewind/pg_rewind.h
new file mode 100644
index 0000000..c89b1ec
--- /dev/null
+++ b/contrib/pg_rewind/pg_rewind.h
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_rewind.h
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_REWIND_H
+#define PG_REWIND_H
+
+#include "c.h"
+
+#include "datapagemap.h"
+#include "util.h"
+
+#include "access/timeline.h"
+#include "storage/block.h"
+#include "storage/relfilenode.h"
+
+/* Configuration options */
+extern char *datadir_target;
+extern char *datadir_source;
+extern char *connstr_source;
+extern int verbose;
+extern int dry_run;
+
+
+/* in parsexlog.c */
+extern void extractPageMap(const char *datadir, XLogRecPtr startpoint, TimeLineID tli);
+extern void findLastCheckpoint(const char *datadir, XLogRecPtr searchptr, TimeLineID tli,
+				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
+				   XLogRecPtr *lastchkptredo);
+extern XLogRecPtr readOneRecord(const char *datadir, XLogRecPtr ptr, TimeLineID tli);
+
+
+/* in timeline.c */
+extern TimeLineHistoryEntry *rewind_parseTimeLineHistory(char *buffer,
+		    TimeLineID targetTLI, int *nentries);
+
+#endif   /* PG_REWIND_H */
diff --git a/contrib/pg_rewind/sql/basictest.sql b/contrib/pg_rewind/sql/basictest.sql
new file mode 100644
index 0000000..cee59c2
--- /dev/null
+++ b/contrib/pg_rewind/sql/basictest.sql
@@ -0,0 +1,53 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+TESTNAME=basictest
+
+. sql/config_test.sh
+
+# Do an insert in master.
+function before_standby
+{
+$MASTER_PSQL <<EOF
+CREATE TABLE tbl1 (d text);
+INSERT INTO tbl1 VALUES ('in master');
+CHECKPOINT;
+EOF
+}
+
+function standby_following_master
+{
+# Insert additional data on master that will be replicated to standby
+$MASTER_PSQL -c "INSERT INTO tbl1 values ('in master, before promotion');"
+
+# Launch checkpoint after standby has been started
+$MASTER_PSQL -c "CHECKPOINT;"
+}
+
+# This script runs after the standby has been promoted. Old Master is still
+# running.
+function after_promotion
+{
+# Insert a row in the old master. This causes the master and standby to have
+# "diverged", it's no longer possible to just apply the standy's logs over
+# master directory - you need to rewind.
+$MASTER_PSQL -c "INSERT INTO tbl1 VALUES ('in master, after promotion');"
+
+# Also insert a new row in the standby, which won't be present in the old
+# master.
+$STANDBY_PSQL -c "INSERT INTO tbl1 VALUES ('in standby, after promotion');"
+}
+
+# Compare results generated by querying new master after rewind
+function after_rewind
+{
+$MASTER_PSQL -c "SELECT * from tbl1"
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/config_test.sh b/contrib/pg_rewind/sql/config_test.sh
new file mode 100755
index 0000000..0baa468
--- /dev/null
+++ b/contrib/pg_rewind/sql/config_test.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+#
+# Initialize some variables, before running pg_rewind.sh.
+
+set -e
+
+mkdir -p "regress_log"
+log_path="regress_log/pg_rewind_log_${TESTNAME}_${TEST_SUITE}"
+: ${MAKE=make}
+
+rm -f $log_path
+
+# Guard against parallel make issues (see comments in pg_regress.c)
+unset MAKEFLAGS
+unset MAKELEVEL
+
+# Check at least that the option given is suited
+if [ "$TEST_SUITE" = 'remote' ]; then
+	echo "Running tests with libpq connection as source" >>$log_path 2>&1
+	TEST_SUITE="remote"
+elif [ "$TEST_SUITE" = 'local' ]; then
+	echo "Running tests with local data folder as source" >>$log_path 2>&1
+	TEST_SUITE="local"
+else
+	echo "TEST_SUITE environment variable must be set to either \"local\" or \"remote\""
+	exit 1
+fi
+
+# Set listen_addresses desirably
+testhost=`uname -s`
+case $testhost in
+	MINGW*) LISTEN_ADDRESSES="localhost" ;;
+	*)      LISTEN_ADDRESSES="" ;;
+esac
+
+# Indicate of binaries
+PATH=$bindir:$PATH
+export PATH
+
+# Adjust these paths for your environment
+TESTROOT=$PWD/tmp_check
+TEST_MASTER=$TESTROOT/data_master
+TEST_STANDBY=$TESTROOT/data_standby
+
+# Create the root folder for test data
+mkdir -p $TESTROOT
+
+# Clear out any environment vars that might cause libpq to connect to
+# the wrong postmaster (cf pg_regress.c)
+#
+# Some shells, such as NetBSD's, return non-zero from unset if the variable
+# is already unset. Since we are operating under 'set -e', this causes the
+# script to fail. To guard against this, set them all to an empty string first.
+PGDATABASE="";        unset PGDATABASE
+PGUSER="";            unset PGUSER
+PGSERVICE="";         unset PGSERVICE
+PGSSLMODE="";         unset PGSSLMODE
+PGREQUIRESSL="";      unset PGREQUIRESSL
+PGCONNECT_TIMEOUT=""; unset PGCONNECT_TIMEOUT
+PGHOST="";            unset PGHOST
+PGHOSTADDR="";        unset PGHOSTADDR
+
+export PGDATABASE="postgres"
+
+# Define non conflicting ports for both nodes, this could be a bit
+# smarter with for example dynamic port recognition using psql but
+# this will make it for now.
+PG_VERSION_NUM=90401
+PORT_MASTER=`expr $PG_VERSION_NUM % 16384 + 49152`
+PORT_STANDBY=`expr $PORT_MASTER + 1`
+
+MASTER_PSQL="psql -a --no-psqlrc -p $PORT_MASTER"
+STANDBY_PSQL="psql -a --no-psqlrc -p $PORT_STANDBY"
diff --git a/contrib/pg_rewind/sql/databases.sql b/contrib/pg_rewind/sql/databases.sql
new file mode 100644
index 0000000..60520d2
--- /dev/null
+++ b/contrib/pg_rewind/sql/databases.sql
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+TESTNAME=databases
+
+. sql/config_test.sh
+
+# Create a database in master.
+function before_standby
+{
+$MASTER_PSQL <<EOF
+CREATE DATABASE inmaster;
+EOF
+}
+
+function standby_following_master
+{
+# Create another database after promotion
+$MASTER_PSQL -c "CREATE DATABASE beforepromotion"
+}
+
+# This script runs after the standby has been promoted. Old Master is still
+# running.
+function after_promotion
+{
+$MASTER_PSQL -c "CREATE DATABASE master_afterpromotion"
+
+$STANDBY_PSQL -c "CREATE DATABASE standby_afterpromotion"
+}
+
+# Compare results generated by querying new master after rewind
+function after_rewind
+{
+$MASTER_PSQL -c "SELECT datname from pg_database"
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/extrafiles.sql b/contrib/pg_rewind/sql/extrafiles.sql
new file mode 100644
index 0000000..8512369
--- /dev/null
+++ b/contrib/pg_rewind/sql/extrafiles.sql
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+# This file has the .sql extension, but it is actually launched as a shell
+# script. This contortion is necessary because pg_regress normally uses
+# psql to run the input scripts, and requires them to have the .sql
+# extension, but we use a custom launcher script that runs the scripts using
+# a shell instead.
+
+# Test how pg_rewind reacts to extra files and directories in the data dirs.
+
+TESTNAME=extrafiles
+
+. sql/config_test.sh
+
+# Create a subdir that will be present in both
+function before_standby
+{
+  mkdir $TEST_MASTER/tst_both_dir
+  echo "in both1" > $TEST_MASTER/tst_both_dir/both_file1
+  echo "in both2" > $TEST_MASTER/tst_both_dir/both_file2
+  mkdir $TEST_MASTER/tst_both_dir/both_subdir/
+  echo "in both3" > $TEST_MASTER/tst_both_dir/both_subdir/both_file3
+}
+
+# Create subdirs that will be present only in one data dir.
+function standby_following_master
+{
+  mkdir $TEST_STANDBY/tst_standby_dir
+  echo "in standby1" > $TEST_STANDBY/tst_standby_dir/standby_file1
+  echo "in standby2" > $TEST_STANDBY/tst_standby_dir/standby_file2
+  mkdir $TEST_STANDBY/tst_standby_dir/standby_subdir/
+  echo "in standby3" > $TEST_STANDBY/tst_standby_dir/standby_subdir/standby_file3
+  mkdir $TEST_MASTER/tst_master_dir
+  echo "in master1" > $TEST_MASTER/tst_master_dir/master_file1
+  echo "in master2" > $TEST_MASTER/tst_master_dir/master_file2
+  mkdir $TEST_MASTER/tst_master_dir/master_subdir/
+  echo "in master3" > $TEST_MASTER/tst_master_dir/master_subdir/master_file3
+}
+
+function after_promotion
+{
+  :
+}
+
+# See what files and directories are present after rewind.
+function after_rewind
+{
+    (cd $TEST_MASTER; find tst_* | sort)
+}
+
+# Run the test
+. sql/run_test.sh
diff --git a/contrib/pg_rewind/sql/run_test.sh b/contrib/pg_rewind/sql/run_test.sh
new file mode 100755
index 0000000..a6b934c
--- /dev/null
+++ b/contrib/pg_rewind/sql/run_test.sh
@@ -0,0 +1,149 @@
+#!/bin/bash
+#
+# pg_rewind.sh
+#
+# Test driver for pg_rewind. This test script initdb's and configures a
+# cluster and creates a table with some data in it. Then, it makes a
+# standby of it with pg_basebackup, and promotes the standby.
+#
+# The result is two clusters, so that the old "master" cluster can be
+# resynchronized with pg_rewind to catch up with the new "standby" cluster.
+# This test can be run with either a local data folder or a remote
+# connection as source.
+#
+# Before running this script, the calling script should've included
+# config_test.sh, and defined four functions to define the test case:
+#
+#  before_standby  - runs after initializing the master, before creating the
+#                    standby
+#  standby_following_master - runs after standby has been created and started
+#  after_promotion - runs after standby has been promoted, but old master is
+#                    still running
+#  after_rewind    - runs after pg_rewind and after restarting the rewound
+#                    old master
+#
+# In those functions, the test script can use $MASTER_PSQL and $STANDBY_PSQL
+# to run psql against the master and standby servers, to cause the servers
+# to diverge.
+
+PATH=$bindir:$PATH
+export PATH
+
+# Initialize master, data checksums are mandatory
+rm -rf $TEST_MASTER
+initdb -N -A trust -D $TEST_MASTER >>$log_path
+
+# Custom parameters for master's postgresql.conf
+cat >> $TEST_MASTER/postgresql.conf <<EOF
+wal_level = hot_standby
+max_wal_senders = 2
+wal_keep_segments = 20
+checkpoint_segments = 50
+shared_buffers = 1MB
+wal_log_hints = on
+log_line_prefix = 'M  %m %p '
+hot_standby = on
+autovacuum = off
+max_connections = 50
+listen_addresses = '$LISTEN_ADDRESSES'
+port = $PORT_MASTER
+EOF
+
+# Accept replication connections on master
+cat >> $TEST_MASTER/pg_hba.conf <<EOF
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+EOF
+
+pg_ctl -w -D $TEST_MASTER start >>$log_path 2>&1
+
+#### Now run the test-specific parts to initialize the master before setting
+# up standby
+echo "Master initialized and running."
+before_standby
+
+# Set up standby with necessary parameter
+rm -rf $TEST_STANDBY
+
+# Base backup is taken with xlog files included
+pg_basebackup -D $TEST_STANDBY -p $PORT_MASTER -x >>$log_path 2>&1
+echo "port = $PORT_STANDBY" >> $TEST_STANDBY/postgresql.conf
+
+cat > $TEST_STANDBY/recovery.conf <<EOF
+primary_conninfo='port=$PORT_MASTER'
+standby_mode=on
+recovery_target_timeline='latest'
+EOF
+
+# Start standby
+pg_ctl -w -D $TEST_STANDBY start >>$log_path 2>&1
+
+#### Now run the test-specific parts to run after standby has been started
+# up standby
+echo "Standby initialized and running."
+standby_following_master
+
+# sleep a bit to make sure the standby has caught up.
+sleep 1
+
+# Now promote slave and insert some new data on master, this will put
+# the master out-of-sync with the standby.
+pg_ctl -w -D $TEST_STANDBY promote >>$log_path 2>&1
+sleep 1
+
+#### Now run the test-specific parts to run after promotion
+echo "Standby promoted."
+after_promotion
+
+# Stop the master and be ready to perform the rewind
+pg_ctl -w -D $TEST_MASTER stop -m fast >>$log_path 2>&1
+
+# At this point, the rewind processing is ready to run.
+# We now have a very simple scenario with a few diverged WAL record.
+# The real testing begins really now with a bifurcation of the possible
+# scenarios that pg_rewind supports.
+
+# Keep a temporary postgresql.conf for master node or it would be
+# overwritten during the rewind.
+cp $TEST_MASTER/postgresql.conf $TESTROOT/master-postgresql.conf.tmp
+
+# Now run pg_rewind
+echo "Running pg_rewind..."
+echo "Running pg_rewind..." >> $log_path
+if [ $TEST_SUITE == "local" ]; then
+	# Do rewind using a local pgdata as source
+	pg_rewind \
+		--source-pgdata=$TEST_STANDBY \
+		--target-pgdata=$TEST_MASTER >>$log_path 2>&1
+elif [ $TEST_SUITE == "remote" ]; then
+	# Do rewind using a remote connection as source
+	pg_rewind \
+		--source-server="port=$PORT_STANDBY dbname=postgres" \
+		--target-pgdata=$TEST_MASTER >>$log_path 2>&1
+else
+	# Cannot come here normally
+	echo "Incorrect test suite specified"
+	exit 1
+fi
+
+# Now move back postgresql.conf with old settings
+mv $TESTROOT/master-postgresql.conf.tmp $TEST_MASTER/postgresql.conf
+
+# Plug-in rewound node to the now-promoted standby node
+cat > $TEST_MASTER/recovery.conf <<EOF
+primary_conninfo='port=$PORT_STANDBY'
+standby_mode=on
+recovery_target_timeline='latest'
+EOF
+
+# Restart the master to check that rewind went correctly
+pg_ctl -w -D $TEST_MASTER start >>$log_path 2>&1
+
+#### Now run the test-specific parts to check the result
+echo "Old master restarted after rewind."
+after_rewind
+
+# Stop remaining servers
+pg_ctl stop -D $TEST_MASTER -m fast -w >>$log_path 2>&1
+pg_ctl stop -D $TEST_STANDBY -m fast -w >>$log_path 2>&1
diff --git a/contrib/pg_rewind/timeline.c b/contrib/pg_rewind/timeline.c
new file mode 100644
index 0000000..b626eb2
--- /dev/null
+++ b/contrib/pg_rewind/timeline.c
@@ -0,0 +1,131 @@
+/*-------------------------------------------------------------------------
+ *
+ * timeline.c
+ *	  timeline-related functions.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "pg_rewind.h"
+
+#include "access/timeline.h"
+#include "access/xlog_internal.h"
+
+/*
+ * This is copy-pasted from the backend readTimeLineHistory, modified to
+ * return a malloc'd array and to work without backend functions.
+ */
+/*
+ * Try to read a timeline's history file.
+ *
+ * If successful, return the list of component TLIs (the given TLI followed by
+ * its ancestor TLIs).	If we can't find the history file, assume that the
+ * timeline has no parents, and return a list of just the specified timeline
+ * ID.
+ */
+TimeLineHistoryEntry *
+rewind_parseTimeLineHistory(char *buffer, TimeLineID targetTLI, int *nentries)
+{
+	char	   *fline;
+	TimeLineHistoryEntry *entry;
+	TimeLineHistoryEntry *entries = NULL;
+	int			nlines = 0;
+	TimeLineID	lasttli = 0;
+	XLogRecPtr	prevend;
+	char	   *bufptr;
+	bool		lastline = false;
+
+	/*
+	 * Parse the file...
+	 */
+	prevend = InvalidXLogRecPtr;
+	bufptr = buffer;
+	while (!lastline)
+	{
+		char	   *ptr;
+		TimeLineID	tli;
+		uint32		switchpoint_hi;
+		uint32		switchpoint_lo;
+		int			nfields;
+
+		fline = bufptr;
+		while (*bufptr && *bufptr != '\n')
+			bufptr++;
+		if (!(*bufptr))
+			lastline = true;
+		else
+			*bufptr++ = '\0';
+
+		/* skip leading whitespace and check for # comment */
+		for (ptr = fline; *ptr; ptr++)
+		{
+			if (!isspace((unsigned char) *ptr))
+				break;
+		}
+		if (*ptr == '\0' || *ptr == '#')
+			continue;
+
+		nfields = sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, &switchpoint_lo);
+
+		if (nfields < 1)
+		{
+			/* expect a numeric timeline ID as first field of line */
+			fprintf(stderr, "syntax error in history file: %s\n", fline);
+			fprintf(stderr, "Expected a numeric timeline ID.\n");
+		}
+		if (nfields != 3)
+		{
+			fprintf(stderr, "syntax error in history file: %s\n", fline);
+			fprintf(stderr, "Expected an XLOG switchpoint location.\n");
+		}
+		if (entries && tli <= lasttli)
+		{
+			fprintf(stderr, "invalid data in history file: %s\n", fline);
+			fprintf(stderr, "Timeline IDs must be in increasing sequence.\n");
+		}
+
+		lasttli = tli;
+
+		nlines++;
+		if (entries)
+			entries = pg_realloc(entries, nlines * sizeof(TimeLineHistoryEntry));
+		else
+			entries = pg_malloc(1 * sizeof(TimeLineHistoryEntry));
+
+		entry = &entries[nlines - 1];
+		entry->tli = tli;
+		entry->begin = prevend;
+		entry->end = ((uint64) (switchpoint_hi)) << 32 | (uint64) switchpoint_lo;
+		prevend = entry->end;
+
+		/* we ignore the remainder of each line */
+	}
+
+	if (entries && targetTLI <= lasttli)
+	{
+		fprintf(stderr, "invalid data in history file\n");
+		fprintf(stderr, "Timeline IDs must be less than child timeline's ID.\n");
+		exit(1);
+	}
+
+	/*
+	 * Create one more entry for the "tip" of the timeline, which has no
+	 * entry in the history file.
+	 */
+	nlines++;
+	if (entries)
+		entries = pg_realloc(entries, nlines * sizeof(TimeLineHistoryEntry));
+	else
+		entries = pg_malloc(1 * sizeof(TimeLineHistoryEntry));
+
+	entry = &entries[nlines - 1];
+	entry->tli = targetTLI;
+	entry->begin = prevend;
+	entry->end = InvalidXLogRecPtr;
+
+	*nentries = nlines;
+	return entries;
+}
diff --git a/contrib/pg_rewind/util.c b/contrib/pg_rewind/util.c
new file mode 100644
index 0000000..6e884bb
--- /dev/null
+++ b/contrib/pg_rewind/util.c
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * util.c
+ *		Misc utility functions
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/relpath.h"
+#include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+
+#include "pg_rewind.h"
+
+char *
+datasegpath(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+{
+	char *path = relpathperm(rnode, forknum);
+
+	if (segno > 0)
+	{
+		char *segpath = pg_malloc(strlen(path) + 13);
+		sprintf(segpath, "%s.%u", path, segno);
+		pg_free(path);
+		return segpath;
+	}
+	else
+		return path;
+}
diff --git a/contrib/pg_rewind/util.h b/contrib/pg_rewind/util.h
new file mode 100644
index 0000000..c722648
--- /dev/null
+++ b/contrib/pg_rewind/util.h
@@ -0,0 +1,15 @@
+/*-------------------------------------------------------------------------
+ *
+ * util.h
+ *		Prototypes for functions in util.c
+ *
+ * Copyright (c) 2013-2014, PostgreSQL Global Development Group
+ *-------------------------------------------------------------------------
+ */
+#ifndef UTIL_H
+#define UTIL_H
+
+extern char *datasegpath(RelFileNode rnode, ForkNumber forknum,
+	    BlockNumber segno);
+
+#endif   /* UTIL_H */
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index a698d0f..10e6e2b 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -203,6 +203,7 @@ pages.
   </para>
 
  &pgarchivecleanup;
+ &pgrewind;
  &pgstandby;
  &pgtestfsync;
  &pgtesttiming;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..d42de27 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -130,6 +130,7 @@
 <!ENTITY pgcrypto        SYSTEM "pgcrypto.sgml">
 <!ENTITY pgfreespacemap  SYSTEM "pgfreespacemap.sgml">
 <!ENTITY pgprewarm       SYSTEM "pgprewarm.sgml">
+<!ENTITY pgrewind        SYSTEM "pgrewind.sgml">
 <!ENTITY pgrowlocks      SYSTEM "pgrowlocks.sgml">
 <!ENTITY pgstandby       SYSTEM "pgstandby.sgml">
 <!ENTITY pgstatstatements SYSTEM "pgstatstatements.sgml">
diff --git a/doc/src/sgml/pgrewind.sgml b/doc/src/sgml/pgrewind.sgml
new file mode 100644
index 0000000..d6711da
--- /dev/null
+++ b/doc/src/sgml/pgrewind.sgml
@@ -0,0 +1,135 @@
+<!-- doc/src/sgml/pgrewind.sgml -->
+
+<refentry id="pgrewind">
+ <indexterm zone="pgrewind">
+  <primary>pg_rewind</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_rewind</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_rewind</refname>
+  <refpurpose>synchronize a <productname>PostgreSQL</productname> data directory with another data directory that was forked from the first one</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_rewind </command>
+   <arg choice="plain"><option>-D</option> --target-pgdata=<replaceable>DIRECTORY</replaceable></arg>
+   <arg choice="plain"><option>--source-pgdata=<replaceable>DIRECTORY</replaceable></option></arg>
+   <arg choice="plain"><option>--source-server=<replaceable>CONNSTR</replaceable></option></arg>
+   <arg choice="opt"><option>-v</option></arg>
+   <arg choice="opt"><option>-n</option> --dry-run</arg>
+   <arg choice="opt"><option>--help</option></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+    pg_rewind is a tool for synchronizing a PostgreSQL data directory with another
+PostgreSQL data directory that was forked from the first one. The result is
+equivalent to rsyncing the first data directory (referred to as the old cluster
+from now on) with the second one (the new cluster). The advantage of pg_rewind
+over rsync is that pg_rewind uses the WAL to determine changed data blocks,
+and does not require reading through all files in the cluster. That makes it
+a lot faster when the database is large and only a small portion of it differs
+between the clusters.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_rewind</application> accepts the following command-line arguments:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem><para>enable verbose progress information</para></listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-V</option></term>
+      <term><option>--version</option></term>
+      <listitem><para>display version information, then exit</para></listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+      <listitem><para>show help, then exit</para></listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Usage</title>
+
+  <para>
+<synopsis>
+    pg_rewind --target-pgdata=<replaceable>path</replaceable> --source-server=<replaceable>new server's conn string</replaceable>
+</synopsis>
+The contents of the old data directory will be overwritten with the new data
+so that after pg_rewind finishes, the old data directory is equal to the new
+one.
+  </para>
+
+  <para>
+   pg_rewind expects to find all the necessary WAL files in the pg_xlog
+   directories of the clusters. This includes all the WAL on both clusters
+   starting from the last common checkpoint preceding the fork. Fetching
+   missing files from a WAL archive is currently not supported. However, you
+   can copy any missing files manually from the WAL archive to the pg_xlog
+   directory.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Theory of operation</title>
+
+  <para>
+
+The basic idea is to copy everything from the new cluster to the old cluster,
+except for the blocks that we know to be the same.
+
+1. Scan the WAL log of the old cluster, starting from the last checkpoint before
+the point where the new cluster's timeline history forked off from the old cluster.
+For each WAL record, make a note of the data blocks that were touched. This yields
+a list of all the data blocks that were changed in the old cluster, after the new
+cluster forked off.
+
+2. Copy all those changed blocks from the new cluster to the old cluster.
+
+3. Copy all other files like clog, conf files etc. from the new cluster to old cluster.
+Everything except the relation files.
+
+4. Apply the WAL from the new cluster, starting from the checkpoint created at
+failover. (pg_rewind doesn't actually apply the WAL, it just creates a backup
+label file indicating that when PostgreSQL is started, it will start replay
+from that checkpoint and apply all the required WAL)
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Restrictions</title>
+
+  <para>
+   <application>pg_reind</> needs that cluster uses either data checksums that
+   can be enabled at server initialization with initdb or WAL logging of hint
+   bits that can be enabled by settings "wal_log_hints = on" in postgresql.conf.
+  </para>
+ </refsect1>
+
+</refentry>
#16Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#15)
Re: pg_rewind in contrib

On Tue, Jan 6, 2015 at 2:38 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Here's an updated version of pg_rewind. The code itself is the same as
before, and corresponds to the sources in the current pg_rewind github
repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based
mostly on Michael's comments, I have:

* replaced VMware copyright notices with PGDG ones.
* removed unnecessary cruft from .gitignore
* changed the --version line and "report bugs" notice in --help to match
other binaries in the PostgreSQL distribution
* moved documentation from README to the user manual.
* minor fixes to how the regression tests are launched so that they work
again

Some more work remains to be done on the regression tests. The way they're
launched now is quite weird. It was written that way to make it work outside
the PostgreSQL source tree, and also on Windows. Now that it lives in
contrib, it should be redesigned.

The documentation could also use some work; for now I just converted the
existing text from README to sgml format.

Some more comments:
- The binary pg_rewind needs to be ignored in contrib/pg_rewind/
- Be careful that ./launcher permission should be set to 755 when
doing a git add in it... Or regression tests will fail.
- It would be good to get make check working so as it includes both
check-local and check-remote
- installcheck should be a target ignored.
- Nitpicking: the header formats of filemap.c, datapagemap.c,
datapagemap.h and util.h are incorrect (I pushed a fix about that in
pg_rewind itself, feel free to pick it up).
- parsexlog.c has a copyright mention to Nippon Telegraph and
Telephone Corporation. Cannot we drop it safely?
- MSVC build is not supported yet. You need to do something similar to
pg_xlogdump, aka some magic with for example xlogreader.c.
- Error codes needs to be generated before building pg_rewind. If I do
for example a simple configure followed by make I get a failure:
$ ./configure
$ cd contrib/pg_rewind && make
In file included from parsexlog.c:16:
In file included from ../../src/include/postgres.h:48:
../../src/include/utils/elog.h:69:10: fatal error: 'utils/errcodes.h'
file not found
#include "utils/errcodes.h"
- Build fails with MinGW as there is visibly some unportable code:
copy_fetch.c: In function 'recurse_dir':
copy_fetch.c:112:3: warning: implicit declaration of function 'S_ISLNK' [-Wimpli
cit-function-declaration]
else if (S_ISLNK(fst.st_mode))
^
copy_fetch.c: In function 'check_samefile':
copy_fetch.c:298:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
if (fstat(fd1, &statbuf1) < 0)
^
In file included from ../../src/include/port.h:283:0,
from ../../src/include/c.h:1050,
from ../../src/include/postgres_fe.h:25,
from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
__CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
copy_fetch.c:304:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
if (fstat(fd2, &statbuf2) < 0)
^
In file included from ../../src/include/port.h:283:0,
from ../../src/include/c.h:1050,
from ../../src/include/postgres_fe.h:25,
from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
__CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
copy_fetch.c: In function 'truncate_target_file':
copy_fetch.c:436:2: warning: implicit declaration of function 'truncate' [-Wimpl
icit-function-declaration]
if (truncate(dstpath, newsize) != 0)
^
copy_fetch.c: In function 'slurpFile':
copy_fetch.c:561:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
if (fstat(fd, &statbuf) < 0)
^
In file included from ../../src/include/port.h:283:0,
from ../../src/include/c.h:1050,
from ../../src/include/postgres_fe.h:25,
from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
__CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We
ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f
wrapv -fexcess-precision=standard -g -DG -fno-omit-frame-pointer -I../../src/int
erfaces/libpq -I. -I. -I../../src/include -DFRONTEND "-I../../src/include/port/
win32" -c -o libpq_fetch.o libpq_fetch.c -MMD -MP -MF .deps/libpq_fetch.Po
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We
ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f
wrapv -fexcess-precision=standard -g -DG -fno-omit-frame-pointer -I../../src/int
erfaces/libpq -I. -I. -I../../src/include -DFRONTEND "-I../../src/include/port/
win32" -c -o filemap.o filemap.c -MMD -MP -MF .deps/filemap.Po
filemap.c:12:19: fatal error: regex.h: No such file or directory
#include <regex.h>
^
compilation terminated.
../../src/Makefile.global:732: recipe for target 'filemap.o' failed
make: *** [filemap.o] Error 1
Hm. I think that this is something we should try to fix first
upstream. I feel as well that regression tests are going to need some
tuning as well to work properly.

Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#15)
Re: pg_rewind in contrib

I applaud the ingenuity on all levels in this patch. But it seems to me
that there is way too much backend knowledge encoded and/or duplicated
in a front-end program.

If this ends up shipping, it's going to be a massively popular tool. I
see it as a companion to pg_basebackup. So it should sort of work the
same way. One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.
Maybe something as simple as "give me this file" would work.

That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode. In any case, the documentation
doesn't explain this distinction. The option documentation is a bit
short in any case, but it's not clear that you can choose between local
and remote mode.

The test suite should probably be reimplemented in Perl. (I might be
able to help.) Again, ingenious, but it's very hard to follow the
sequence of what is being tested. And some Windows person is going to
complain. ;-)

Also, since you have been maintaining this tool for a while, what is the
effort for maintaining it from version to version?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Michael Paquier
michael.paquier@gmail.com
In reply to: Peter Eisentraut (#17)
Re: pg_rewind in contrib

On Wed, Jan 7, 2015 at 5:39 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

Also, since you have been maintaining this tool for a while, what is the
effort for maintaining it from version to version?

From my own experience, the main source of maintenance across major
versions is the modification of the WAL record format to be able to
track the blocks that need to be copied from the newly promoted master
to the node rewound. That has been an ongoing effort on REL9_4_STABLE
and REL9_3_STABLE since this tool has been created when both versions
were in development to keep the tool compatible all the time. This
problem does not exist anymore on master thanks to the new WAL format
able to track easily the blocks modified, limiting the maintenance
necessary to actual bugs.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Eisentraut (#17)
Re: pg_rewind in contrib

On 01/06/2015 03:39 PM, Peter Eisentraut wrote:

I applaud the ingenuity on all levels in this patch. But it seems to me
that there is way too much backend knowledge encoded and/or duplicated
in a front-end program.

If this ends up shipping, it's going to be a massively popular tool. I
see it as a companion to pg_basebackup. So it should sort of work the
same way. One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.
Maybe something as simple as "give me this file" would work.

That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode. In any case, the documentation
doesn't explain this distinction. The option documentation is a bit
short in any case, but it's not clear that you can choose between local
and remote mode.

The test suite should probably be reimplemented in Perl. (I might be
able to help.) Again, ingenious, but it's very hard to follow the
sequence of what is being tested. And some Windows person is going to
complain. ;-)

Also, since you have been maintaining this tool for a while, what is the
effort for maintaining it from version to version?

I also think it's a great idea. But I think we should consider the name
carefully. pg_resync might be a better name. Strictly, you might not be
quite rewinding, AIUI.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@2ndquadrant.com
In reply to: Peter Eisentraut (#17)
Re: pg_rewind in contrib

On 2015-01-06 15:39:29 -0500, Peter Eisentraut wrote:

I applaud the ingenuity on all levels in this patch. But it seems to me
that there is way too much backend knowledge encoded and/or duplicated
in a front-end program.

Hm. There's really not that much in the current version anymore? Sure,
there's some xlog record specific knowledge, some about the whole data
directory layout and a bunch of timeline specific stuff. But that's not
that much.

Don't get me wrong - I personally think this shouldn't be in contrib but
in bin. The amount of prerequisite work to allow this to be
maintainable (2c03216d, 2076db2, ...) is a hint of how closely this is
linked and how much effort core community people have put into this. I
don't think contrib/ is the right place for that. Even though we haven't
found something we can agree on wrt moving other stuff (apprently at
least?) from contrib, we can still place new stuff in src/bin instead of
contrib.

It wouldn't hurt if we could share SimpleXLogPageRead() between
pg_xlogdump and pg_rewind as the differences are more or less
superficial, but that seems simple enough to achieve by putting a
frontend variant in xlogreader.c/h.

If this ends up shipping, it's going to be a massively popular tool. I
see it as a companion to pg_basebackup. So it should sort of work the
same way. One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.

I'm not particularly bothered by the requirement of also requiring a
normal, not replication, connection. In most cases that'll already be
allowed.

Maybe something as simple as "give me this file" would work.

Well, looking at libpq_fetch.c it seems there'd be a bit more required.

Not having to create a temporary table on the other side would be nice -
afaics there's otherwise not much stopping this from working against a
standby?

That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode.

But we have plain pg_start/stop_backup for that case. That alternative
doesn't really exist here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#20)
Re: pg_rewind in contrib

On 01/07/2015 02:17 AM, Andres Freund wrote:

On 2015-01-06 15:39:29 -0500, Peter Eisentraut wrote:
It wouldn't hurt if we could share SimpleXLogPageRead() between
pg_xlogdump and pg_rewind as the differences are more or less
superficial, but that seems simple enough to achieve by putting a
frontend variant in xlogreader.c/h.

It's not that much code. And although they're almost identical now, it's
not clear that pg_rewind and pg_xlogdump would want the same behaviour
in the future. pg_rewind might want to read files from a WAL archive,
for example. (Not that I have any plans for that right now)

If this ends up shipping, it's going to be a massively popular tool. I
see it as a companion to pg_basebackup. So it should sort of work the
same way. One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.

I'm not particularly bothered by the requirement of also requiring a
normal, not replication, connection. In most cases that'll already be
allowed.

Yeah, it's not a big problem in practice, but it sure would be nice for
usability if it wasn't required. One less thing to document and prepare for.

Maybe something as simple as "give me this file" would work.

Well, looking at libpq_fetch.c it seems there'd be a bit more required.

Not having to create a temporary table on the other side would be nice -
afaics there's otherwise not much stopping this from working against a
standby?

Hmm, that could be done. The temporary table is convenient, but it could
be refactored to work without it.

That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode.

But we have plain pg_start/stop_backup for that case. That alternative
doesn't really exist here.

Also, the local mode works when the server is shut down. The
pg_basebackup analogue of that is just "cp -a $PGDATA backup".
- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andrew Dunstan (#19)
Re: pg_rewind in contrib

On 01/07/2015 01:54 AM, Andrew Dunstan wrote:

I also think it's a great idea. But I think we should consider the name
carefully. pg_resync might be a better name. Strictly, you might not be
quite rewinding, AIUI.

pg_resync sounds too generic. It's true that if the source server has
changes of its own, then it's more of a sideways movement than
rewinding, but I think it's nevertheless a good name.

It does always rewind the control file, so that after startup, WAL
replay begins from the last common point in history between the servers.
WAL replay will catch up with the source server, which might be ahead of
last common point, but strictly speaking pg_rewind is not involved at
that point anymore.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Eisentraut (#17)
Re: pg_rewind in contrib

On 01/06/2015 10:39 PM, Peter Eisentraut wrote:

If this ends up shipping, it's going to be a massively popular tool. I
see it as a companion to pg_basebackup. So it should sort of work the
same way. One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.
Maybe something as simple as "give me this file" would work.

Yeah, that would be nice. But I think we can live with it as it is for
now, and add that later.

That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode. In any case, the documentation
doesn't explain this distinction. The option documentation is a bit
short in any case, but it's not clear that you can choose between local
and remote mode.

Changing the libpq mode to use additional replication protocol commands
would be a localized change to libpq_fetch.c. No need to touch the local
copy mode.

The test suite should probably be reimplemented in Perl. (I might be
able to help.) Again, ingenious, but it's very hard to follow the
sequence of what is being tested. And some Windows person is going to
complain. ;-)

Yeah, totally agreed.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#22)
Re: pg_rewind in contrib

On 01/07/2015 03:22 AM, Heikki Linnakangas wrote:

On 01/07/2015 01:54 AM, Andrew Dunstan wrote:

I also think it's a great idea. But I think we should consider the name
carefully. pg_resync might be a better name. Strictly, you might not be
quite rewinding, AIUI.

pg_resync sounds too generic. It's true that if the source server has
changes of its own, then it's more of a sideways movement than
rewinding, but I think it's nevertheless a good name.

It does always rewind the control file, so that after startup, WAL
replay begins from the last common point in history between the
servers. WAL replay will catch up with the source server, which might
be ahead of last common point, but strictly speaking pg_rewind is not
involved at that point anymore.

I understand, but I think "pg_rewind" is likely to be misleading to many
users who will say "but I don't want just to rewind".

I'm not wedded to the name I suggested, but I think we should look at
possible alternative names. We do have experience of misleading names
causing confusion (e.g. "initdb").

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Eisentraut (#17)
1 attachment(s)
Re: pg_rewind in contrib

On 01/06/2015 10:39 PM, Peter Eisentraut wrote:

The test suite should probably be reimplemented in Perl. (I might be
able to help.) Again, ingenious, but it's very hard to follow the
sequence of what is being tested. And some Windows person is going to
complain. ;-)

I took a shot at rewriting the tests in perl. It works, but I would
appreciate help in reviewing and fixing any bad perl I may have written.
I have not added new tests or changed the things they test, this is just
a straightforward translation from the mix of bash scripts and
pg_regress to perl.

- Heikki

Attachments:

pg_rewind-contrib-3.patch.gzapplication/gzip; name=pg_rewind-contrib-3.patch.gzDownload
��9�Tpg_rewind-contrib-3.patch�;m{�������R'@@`�4I��[b�:�/8��m���
t-$E��4�����J+!����n�������.����0f^���i�M{��k�z>g���/p�-��������nw���^�]���{����apv��v�_~a����
�/�N��9>O��}�a�Z43�K���m�s^Smn�yY6_XQ60����0�M��
�fs��
��ZO��/x�&��'��&����Nw2��.tx� �uNUu�|�������Y��{���}�a��w��=�������XW	����c�~�'`�N�{��J��<
�,c/�������v��g1�w��6N8��N�y��������Ed������%�]�MRl
�)5$K?M��&,�{��B���#��������ddA%��I>fH�a��9���Ga����<eM�����?����.�$�'�}�N�pj����p?�P��I.#���<y{z6:M�@����`1��q&�Y}�-�$�1Kx|_V��D������������	���d<|M�o�9�i-��EV�px��"d������R+�f�k��z�{�GT4����m`��^|o}���}4�m<:�&��S+�l�4~���K��AO������d`����KRd���eb�=�(j]^�=����nS`�����-pv�;UW4��8m��X{ X�r�v|q~5:?�)����>\�����hxN���N������������zf����;m�u���OF�������������7W��a��e������w6���f�	`���������T8YN�v�!�;�4�L�
bD���vu@z�\���;#�.���M~btwq
X`
�x�=�	;(�����}
z�m�%������=���p��/��S�xvn�wf]M��Y����f��
3�z����
$�-*MFa�V&i������x��I*�������0����d������#sbMD�du�\����`z���e��6���S���N45�K����>�wXy��y��:�X&�$)�-P+�f��m���uB	OY<��9}?\�t��E���9��2_�*�5�/���`?!�#pp�6z�S� �^8��lA��@7b�N�Jk����]���+.q�YQ�!I�m��lf�Hk����D<�/����
�_
4�Cp�o�c/������4��-f)	N��PD��\�r�\�
�(X���s+E�hiX�6<�9#�J�`x��;H4������H
��I\�O=�)vi�@<���-y�qW�R����O�^�l
�y�����
�9 >��Q����1<8n
QA��J.RS�=��
��D�������(#��r�h��P�F��������Z,�F	VI�l@�	8��_m��iF�g_����
R+QTD�������6EF��;�]
��_��F���Z�`vrlj`�6_�����{�b��&L�I�MD�p[Z�m�w�Y*�������4����w���|��l!��J�	��Ar'��$r�C�W��aF�i>�������w�$����0�|�7\�������00��%�^�a��{�oeG���
o�O
���~}��_���y;mV��BS��������lP�(�h�BS��LF|-�"��{��������L�M�n�I������;��Z��a�au7�a����A=���&��/Q�1^����%"!}�Y8c"���i��~W��k�R]�:&+��000�J���)��� �IW��]�����__.Op�����%�}0k�>-Z�A�_JRV`�D�xx5|3���R�#R�D�X�/�!�������sk�F-�06����:-��p.
|�mf8�����:B�T��D}U������
���fY)���]|���]@��b��C��1w�a���:Q����qV��%��d��SH��B>�"0�)����1���
T��DP�H]�=Q�C��������]��8���/
�W�L`�
��iV/���;�5c��p�Y���t���n�S ~�NU���+��@$����l��Y�f�����$3��C��h���j�qx��Vo)4�`F���x�������)��w�.���X5�UK�J�@�>��|����<Ls���nM�L4�0`�b�5�������A��L���L��z�9����<l��{#W���;m|���\�Z�{�`O"��ik��	�2
o,{�\`��
���@��C���r�D������	$����K
�	���v`)'D2zV���W�/��=P:�
|�CP��
��(-Hk�n���_v���~���{�{��Wj�"k���L;[��S:������6t�e��O����;�D���\�vB4/WF-���$�)����(O�����yB�2 �U~73P_����f���Q����\�i�+^�JX�HU�������\�GS��2���d����,�I���Vc��a�m���!�%p=���
�.�)8������p�(9X�
�SA2APf<5U�p��-\��k��|���2#�@�U�q�VA�<�:s�r�j1�����r�K�]y�&����Yl������%��j%sJ��X�^F]�*��+ H�L{�u�	/��8�E����9f�� �+L}#��Ms�ej����'B�5|�m_��R]$����U6�K�*��TX��HF��y�����qaY!R�>�`��INpc�\h�[��0�E.��-���m��	�3��/]��f+ �	�Z�H"!�Ll�'c/�k1%��3���g�>v�+Z<AP|�Em����`������
]BB�����].+�[y9%YF�u%��
��B3��b��EQ�5a�:���y*C�����ur��)����,-Uc���.��B�s��J��P7/������������(�����w|�UBE��b"��f��x�DUO���8�c|�F���<tot�����!���R���p�
���<��!�7�<����w�XU��h�%t�Ad�1�����T��x�����W���I��4�T�+3g�Y�@	d���\�J�1<7b�1GV�9����	0`���
�g�~��c�L;8f��r���"�o��(0��M2X���_��$�nn�CJ����@^�����,�
� 
�h��2���
����6%�����(,�����L�O��+<�{�oumx���+K�5�R��E��l�E^��Et���6���Ew�c��[YWY�x����+E���A���`m���*��G>��$�V\�
��(��=r���T���������hF�X�P&R��4��i��N%?�/�+�\K������`i���Pv:_����(M�|}�\X��`W���:��~����[�
�k��?�zA���g�����f��+�	
������8p���]j�^
*��%�X�8������8#�����@��5��4
tw�u���"Tk�����3�h/]G<����x����<
����~�mxI�����5��D�.o�Xj��s�����b�	D#9`�oB�{'�z�x�������L�F
J(T�Y9QSYj3���������R�M�]S��=����mBRe�N��WM���#5�Lk���"��(������7f	�!��|���P���Bw�%P������J
^����~CmW���U�x��.	I�B�v��\AY�jYyP*���=c���6�q�1q]J"b����_�%4[����Y� m_5~�)��R@BYi	I��w
��
���PA���}{v��zc
L��oaO�����R���U��~7�pyr9���"B��w���tT�6�S;,������$����q9��"����C���5}������-���a��,Xp��`5y��O�29���	�Z>�q�1�+Nq�M���H����6�Zv����o"F�eRU�&[-�]	E�g(=��\�����J�Vn"� ����7S�����g���'Qw�(5�����,��^���D��7}�����Dk���-��\�����++�����[)x�\�wQ�a-_�}5Ts��������k+S�3��&���2-���.&��t��K%+QF�S�
���4A�o�3AT�T��@Y�
��`�	��6�F�`P�N����"Yx�g�����#2z/�/��%e����j���A�RNA��;9���*�����br���2�q?����51O'��I��+�>-)���J�x����x92a����������7�!��)�`~7a��3:����=0�1�20����a�5������}�4F�SB����;Q&������V�� 7��@�����"�5
(L����b�K�� �3�b��H�[��l����MHTY���^���U:&w���{F)9�%����<r�6���k�! P��R,���!��`VE�]X�/n����N�%a�IQ���>�0
4U2H�"�G�t�'����P���
�0���������L^����$��2nK'�[�z�b�=N��]j�=ov�E�ix�IRo��<l��:J�/+�sd�a�po��a�0��\�x
�:P������&!c�.�@�#��C��n�w�d� G~S�� ��=���Y���r�k�����!����i
��@��G���D!Sv����3���5Q�g�o������Ox<��W�tyb�9=�?�PO�T�����������B��S�(GJ\Q�{����b�6���JN���\es><,~��i������r4@ M1�=h��'�������L��v�~)'r,�(<<�"���`JB�1]���

��7�*�
�9]�
��h;�����:j��h��9]�ro���#�?��cF�n�������<��Gj�N�]��W�>*����T�A��"��4�n� ������X{�4*�=gF�G�a�;gy�����
���������_e��}M��3�2|�C2�37�t�N�������4H��4 m�'���7��l�����(h�x��C�g ��9;�u�?�y}�Y�P �s�����3K-��,�`D�RDe_3P��X��\�������c+��B�XN�VJ������r9V[!�7J�V����~�I�n�=�Ra�!�Hnq��g���5.�������2��������K���������e���;���qX��6ct��	��d��F:6w��
�G��'Q���L��\^��'���!O�X��U�#��Le����.:<�
O��C�H<���T��J&�+;,�(H!��K�����y/
C��$h�x��rq���y7����n��L�MD�	t�~���}i8��~��������y����By���7[�(>Q6�H[��������]Q��E��P�+VM*��zAXMw���7�Gf�B��n'���w��������WVKK��.:+�r�A��IW��k��	��w�%L<6bx�U3t���	����������KJi=��x;�=���{B��T����{�<�f�mV[khh�3��tG��w������Sjf�^�{��k	$���u����U�;3,��N/���������r`A�	�Pc����jS0^���������R��4�8!�A\D�
�������P�7�R��:L��V+.��"H�y5<=���* h)��Z���\S`EZ�LP,����������`5������b�������J�fH�&x��(�hh6�W�gQ���=����b�UP�y[R�K�di�
�:V}<Y���W���r�� ��j!o�����������*�+8��`����ook����>E��F!n��&C;<c$�Y'��FjA�%���L�I���unU���u�v&������S�S���wZ1#���'��O��ki�C�M�c�x���=Ln�]s�������-�``2����/�E%�t���P�a�>��|6E#�}t�����R�_u����Li�#$M�y\��Y�����Q�W����<���84Z��5�������1�!�HkG��yr���~�}���S��^���_{8�=5�ss��9���������S��c^����\ ?���t6���G�ae����`@��HC���#������
�~j��q�Bm����=x����_��c�YM�����F��� �;�t�������I�-"Zp��gvS����7�=�pg�9$���qe��
Rs���@k$��M�{p�DI�@���$����3k`H�7��<�?������_��^�7��R
�[��#>�m�H���d��vDA��l��f�#��bm\7��@�L��D�����<����]|�G���5������.�e ����HK}��1p�6,�01����= ����~����P��$�:QP]f�M����H)yt�t�u@y������0��	����|}�r�_��Y�a\#:��8���+����0"�%��L����>����Z!�(�<C���L)�F�K�]������W��Z�2-�R;����=�TJ��J���d��!�����?���K���*����������M��4k�x',���� 3���s2k��P�:M�dE���6��P��k����,bHcA��U���
�9l�A�6
8���:Wa�yP�H�d��-4���<>������H���4�t��XV}�I���A���"�O����������L@������_���W@�[������9��`{�%�st�B����V��!��=j��9����v�a���_�#Q�[m���p�i���S�!�D*�H��$�jD:�c�d��2��A� {�2mH$��Z
��1�V��v(���W��<�O��c�����k=z���}��vK�]��!p���d0�o��8�5�����K����:bU��3\�])���-��������n�Y�m�����E,4�2��C'i�pkX�0�z��d����}3e��e �Y��u7OL���V��e�i�����JY��l6l#E�
�
��m��xQ$( ��<�K��]�b��a%�����U�@,,x/�1��?���PLq��)��l���29]4�e�;4��R����?�'��;^��%Hz�&�F�	��K�h�o���Q�^E8����`m3���p��>a�� �w���"��B[i�cn����T���D� ZH����}���e�
�8�
��H���o�rRGpM�����c��	�/s<�9U���7I	/��z��5����'!~����cL�,�E��;�����R~��":�}>;}��i���j��]g����L.���}v��<2��8�0cz�X�c�������]�
A����l���%z�me~*o�%����8	��f���=��Z�I�FN'�D�!	���,����o�i��N��52L��^�
�^�e�A���������v����Pa�a�z)���?�-&bq���CJ��*E��$�nw6��_l>@J��R���gS���T2�'m'��w���@������7�7�	�;|����O��u#= �'`Z�������2p��t��&��I�NB#�����"-o��Z]�����MBY5�%��`���$�������u1>�����f��wK������#�~��M�^%K����2�%a���a�N�9�01���H�_u��jF>S����g�0�00���a�o��k��q�D�($9������:������
����Hrx[��e�+]��Rf�R����*,w"#8z�H�:I-��>�1c>QQ�a�(&C5��B",����'�(#�i�����#y��s��k<���'�"���(4�O�!���)��SI���\��Xuc���6zP���'��9����W��JP��bN���szo'8��4�`I�"���&��T0
)�^��Q�jL����1�(�NX;�n9p�*3D��������p�-A=��=���d�$�?�*�>�p��&_o?���|��'	����6>����V�|0�
f��kPr��G2+b��({z�;6O����(��T��M��l�6��o	�(�+�f��:t�(����B
���>��s�^_~��
��NUP��\�:��.�J��S�M��I-#s9G���g����g@��8��@��Yy�89�����&q�88+�#�cQ�����C3<+,�[����Te��I���*"��L0Gf���u������3�QPc^>�#�]�<�2�^8�3DL��~3_<}���4*�uC�������4V^{Z������
U���W��M�D�"��L>�7��	>���L�5(�4�K}f&;�
��l�X��$]<{qw�������!��oV�t�)��3�~�}�=&7�o�;R��X��%&���,Q�u�������,t����[�_$]���`�j�GM��r��������u�
fHlJ�����q�R��1��)69�e��L��j�6����!s?���E�yf����� 3�?(�%[R�1���A��A�7�l^��<eC,��9~�����[_h���J���s���[sa�S�������"z nA��eX�	!��<T�A�����otb�q]bh�'��;Me�Q���4Mp>��(��F#c�W���/�1�s�J��--T��4�0Q����r�J�W�_�
�;��e
�y�$=�) ![�@M$��\��b�R(�����W�[�h�3��[��������ZbB��.(_�@���RHD�E<�"d���u��}p������/���&_�O*�6�5����m�z�u���=m<u=��������&�����	���:D���b ?
Q�N,2`8(�����K��xu�����|~q���'4��y����O+(�Q�1�;���y����,{����>�`��By����� 8�����
�<@���=<����]�n��O%:�9R]�a!����=�������U`.��<!-jFweR�`{�U|����/U��~��6�%��%�`W�?J���������6&�e�����x �.>$�F��e��EZ:GL)��(�0*�W�HC#��vN]�T@}�#����^ U�{a�Z�o����r�>��#����rw�/�q'���r���CmK�#gb?���G���P{�%C
P��z�i���2�o����9��fa����)�U�����P �-�RK��g�7/������D?#	6$%��(kK��o����i�������\a��m#
2.55&��	H�'��5"�36�5Kc[�Z�Z"I��`W��������S��aA`J���HP+��7<Z��8�'�@�����R�'A�D��v����bP��x�t�`�5�JD#g9�E�oUL�d��["�}=��r{%��S��m�B8e�	K�<�A��1H�������u�P��%�X��$��v%�����C�
>0}�Mn�}�W�(�hGA��^����@)�/��8�:71���j(��A�s����D�f>�lZy���I#��v�������������YP%��[o���eS���uu�NbL`�����&��^�T
{��Pl7q7`��%2m�X4-��:���A�^����["\��,9B�P�E6<��w�j��	U�y����x��m^%5���2����V
L1Z�a��uG��i<x���SX
�����!'pBf&2�$��������~������V�TqH��Tw%ww��	}`$������^�^�jC=���?�7%��z�B��}q"���oU�<��C���
8��ac&��'Hsv�_��<�T\Vh�!��m���Z�:C�T_v*5���S1�kvEZ��R�H��"�>z�X�X� �����LY�%O,?�������{��r��E������L��t���(����z8��b�U��>�&����'�E��y��������n�J�0���i	^zZ��K�]��A����#���ag�R�������x����hm���AwJa�t�_��[����a��ZUn-�"����|�k @U�����5�(�<z��?Qk�J��)���`�w��#��	�3���$�nkO�L=�w�x����U�]|?8���dfOD�G���l��FQ)pa�q�O?�+���y^���J�������^q�O=����s���^)��;��8�yD��T�2�/�����Ll����V	����t�1	��Qb�ZQ��*��������RP��#�@�c�7���9�9~�4�i�i'���8B�_TL!��$����!����!�1)H�i��^eC	}0�������]X��[1�c����i�R�Y��B@	�����i?On��jV��A��������\�W7�P1��o.�_7/O�7��W�x��L1�= g�����#fG+ZyS������d..�������ExX�an��o�s�<6�.�|&u���<H��G`�N���z�=m�.��A �]�Z�~�)�<MfM���M�������S4bES����O�u��4��I�����JE��X���Q���TZ^(m�����SM!?%]�Ss:�k��T��L�u���u��	2VD�*LUz�|lv��c����X��=`�g
8�Zq���qG	����P:���t��^XPSBKM�U���~x\�����	�"���xHi2�	�KH�����;��� 86�W�I�+%�WtAyZ��V��2b���S|s!�h�
�ueB���us
>�!�8x���L���K��H���9����wDyu�O��"	d��?����vB���:��g����a~���~e
�V�!���Q/�g+�2��1���/EQ���F I�3�eY\����Z�/��X���y8�<�[�'K����1 ��IEa�<�����!����U����~��*>���|@����j'�x��E�}0Y��vC�c<�d��3o�pW�f!�)g"�"���>*��%�l��n�����<�=�W����}�����D�M�P�/�Pdt ��Gy���� Fd�Q��@)�J��T1H�[�l"o�)�����~��^���	�[S��)5��"�������gnI[V)rH�XF���QnX����N�6Ai�X�(���P.�H�[��Nf��;Ps�Nw�
��x�u�e�m38-D5�^Y0�����#�3��'�a��G��T�J��MD��G�����o��������|���3���@;��=�}�o���(P�(�#����m�'.�}h���$g�����b��sz���u�����A�����!�"@�D������nW�k#&j�0��[��y���	�jK�/�1������|����/��7=�z����W������V��zZ�����������-�muK�����zq/A7��^�������~����:>=:>rRC�����q1��^~�nsc�s�QT�[��������S��P�y'4H��'i�e� ��t�lZt8������*�Y��8"�}*r�J������{'z�
�G�����H�	C�A��PK��\��9�g
�(;�;1�>j�0�t%����
_W�(�����a��v�_}�������9K���.�06m���|��<��j8V�F���xJ��1�q�u�Gf���4��;���v�O2��e�*_��N_ ��������:����m������h�L�u�u�����H��;��i;A�/H�g���O�mfy���r�������)���p]��:��]eFF�M��������a��8�j��&/��a�"�'p^���P_u*@����-��:5"J=K�,[���Kz���k�F�;���<n�9#�>��E@l%�}�J	�;hs�Xv��,f�-/O�����MCH�){z���
8y;S��7���:�9C=H�o^|>H[�_XVG*�}<����X�Nl'�-��}�8��/)n$$��y�Hr��p\n2tYA^���
a�V� �D����_�����X+V��
�����t=(��X�9R�	���1������b�-��r��#�LB��$1����N�����Fu�je�P��$�I��6�r�ws��Go)T��-���&�e��2���D@@��g6y����G�j���='D���[]����S�k���:�**����H��*5�~(
�j�#Nu���������Z���6�����r��U ���d�TnW�*Q�Y�@�aU\��Xv��)wA����]�QYR�"����X���Mz���K{����86�(�K�
���G8��4gq�9eo��9D�j�{f�F$���Q�0�%���t.~0QP�^x��S���Z��j!<a�l(�z�WX�������p��umXd�)���R�|��R�Co�Al��*~���3 �t��B,�(��"��YS�-�[������7�g�g�����yj
	���r��_��kW�l��V*�xm�c0���(Q'�2��p8�����\�r!��'������<MZ���B	����v�$"��I����^�����7���|6k�
�����m|���u����:=$��:(�,Y2��w��u�7���-�R���X�����N�?���\������D����/V�p�@��K�������N�S���[�j%��:��������0*����L�S�X��N���1T��5i����@����>:���������L)��h���U���@��[�>����������������(�9����]���u2�{f(���F����P`�1r��k�SOk��j��Ec��E�c��x��^�Kh�����.����K��A���5�fD��|���5�<2����Y^���C�>��wf�usL>��XF��5����7�?�����ox�A&t6`�~#v�KJQr6��w�i
Z�����Q�h��p6y�{�J"�g}j���\�'��b(����G��N�d��w���/i�^F��e)�"������kA��c�+O$^���\5��a��k��%/8Xk+.Y���X��$2��������E��lD��M��%���D���7I?B4w��.����a;���}��#��
Ow��[�}���O:e�� �����o�0�t�n+Y*e�=nw����S!�V�<�&�d�0��gp���"Q��;��3�.vS���d���3"�+�&}���V���
�����yh���t��q�����y��9 �����Rn�������p��G?��^q���8|;�
!��s�0=�w���������,�b��\Ul�QY�0C)1{�n��a���w����4v4F(4�w���F8�f���
�e,v]I<���z�(Kk%�7yxEDL"�J>��`����z��CTJ>�f�m���"��w��G��4n4�by����������c�����L�5]Wex�Qp9
�j/�}���:N�y�Z]�G�e*q(���Q p
�LJ����c�3���P
 ��V^$�6�`r��A�"��4��P�8�q|�11�i���a��6+�v����{ds}sk��",��g�9'Wc�9	�D��
�P� ������/U��������x�r�3�7E����4�����M8cqmX����I�
��V2}3��r�����\��12�J
aV��;�^T����������Dm^��j��8�<da�9�DQ�1Ky�9H�!��Y���'�����9);F��:R/ON_����C#������`���q�(R�b�m^�K�y�>���j��9i��]�;��]yzo>�N��M=�3�x:N���hbi_�:�6�%W�?]��Q3��M|{rz�������_W�����AK����Q%�38��cn���6�����
�zH6����f���M(Q���|�\$�A��>$X����W[�
k��7QZx���%{���m���	?���|���KeT�����?���
Z(~g�G�7�1�n�5y�^��Z6H[�?�tE�
']��=����9�����
��
�������z����~�H��K��Ws���C%yj!�=z�����_��]���������+�}�r%X��R%���K�����Im���Z-(7z���|��j\#����#�6�+U?���/�,-7hO&\r�9��n^�yxO#�������M�U]��'^�5���.^���2�-}�{zx�W��������\���]��q*�\�)���;Q�J�$<���+0�*�=�C���	5�9���q%�.������������=�?M�2JU��I�8ln��M��l����4)���'����2�����uL�tS
k�)���#��i�K����j	N�����z,����R'�N��>�5�R}/�	�ND�s�D����~In��'f�!s��d0�����k9�9/����J�;*^_a?�SJ�����.[ i�V�J��*���YAJ���$H����d���Wbv*9�<K�x��"�$1E9�r��R������w%w��>S|����(>.h�27s�  O�i��*+�
�X��H���Vv?������������-#r������S�h`�2�wZ0���P��;������l�O�U�SCH������+8���@6A���T���BR�+
h98���\���%ff�N�2��Vo
Fx^y7�!��,�BJS4y�<�� �zrG|p��U���,P��2[�5P��x��J�Yg���~��3���3�����u�p�=�ibn��uvU�Lg�.�<��l���H���9/~���NQ!I�@k�V���m����
}�t24J��V�q���XAF.
����d=��"�(�I��O�}�[I�r����������qA�k����2�?!oH�0)7[g�J��D�p�U.G�l�v�����������Q���\���z?[������9kN7M]�Mn?�|1N��2�IX����	J���[�s�Mv��spO����0�t0������7�xp��,����a6��L!��uoL���?�S������������s�r�8���S��6�]�St)�-�V:N*'�R��-0N�{7����5��T��8L�����IB�	7A�������5�������k����4����qlB�Af~���5�|C��C@�C� ���=:�(�8��!�7m������������/j�}���a����$o�)�D��nx�p4�[�$��+���V]�@��xw���������o@��������|/TH�8���'�NV������X����2���U~	.�����B]
�:Ed�B��B������������q���_�������##�v�>�G��W�)�����q��RS���h�~����J�a�������������w���3t~_�9��Vv�m+� ��s�0�;
]�"\m(�a������������h��r#zK_�;��(�1*���m����b0�r�m��\ ��Q�4#�Ih��������6��eG����}Z:������F��7�A�f P������F���P^�fp/��W�y�t�Ew������h��V�D=s�SWkjmzC�D�� 8�Bp��n���0�v����O��Q�6&#�rI���~dM(���Ql+�o/y})I�T���9Hp=7n�a�e�O��L�v�2��E��TP�"��z���dy��zz"��/l�b���';?�|�^� �|��#T���
��|�(���$Ef#T����lL��m������������pu�c�o��}�Oxv��
����O��Y��k����K��Aa],�O��3�������1��bm������S���5n����'RC�i6���I�w��*7�&�������Ffm������x�^����T*a�q��'-��f�o�����4�g�}��w1�lCB������I���
D�f`�O��=���m"%aAt4��K�3�Nf��%
��L:U'B������������KBj��*��R��R���+�G��a`K������k�����
�����
��%��2d*[�n$�U��O���*zlV���&�N���z��_��h�����8���7Y�����hG=����#��?�{�Ro��F���������<A;���%^!����l/8{v�L`�Z���������-2����^�����^�I1��VT���>g��6�(A�i�3��L5${���2��`�S���c��!�0��lJv���I������,�BZ��Q(�FO�����E����Y���������lO���:r����3+��m7-��b�5�%m�����~~vrz���������B`TU���7'���;'�Fh�e�����7u�}t>P��N
���?����t�H���#���H��V�������JyF%Z���30h1[jFBo���6���!����<3H�oDb���:�H���� ���*�����]���)sxg*}��	�\�	�
������3��:5~E�����.�����.����W��i�%��w���*3ek��D���Wr����z�e�P�	���y��S�@y^'�"F�N��8��N�\4z*���Z������n����V�>���=��I����*r��r��!�+��VaEKgc#�"��O����^��=~s|��c8��b��l���k� /���<����Y
+��[Bv{tr-Q�9K����N!��L����8�������������,���.s�����:��H�9�4��o>�Uw0"|"��	..S%�
qK�s�X�lR�����9N��$�s�Es�J���zty|��������obXT�^2fL��A���98@;����:z
����l���j2O�f�/_��a���6���y������jD��C,DP+�>�<5].k�������M���X�~��4�H`E)�y��t�6#g9OPD��}���(P�zT�)2i���fS�]�c��x��n����Hp���(n}wpyL�f$���W��S��l6Q��`�^��6f/����l���.f��3r���0K~zJn���R����o���@6���A189�8%$Ga�j�,�	����K�MK7��t>0�w���?��9���1`r��]����g�>I����4�^�+�tr8>�d������3

J)F'B.K7��?><��������Z�4V��@ZY��C��A������NL�&`�s'�M����p���$�3�L$.s��s�K�5|�p\�}}Q>��j�P�� ��ij�P,������p�u�e�B	SS��BftK	�1�
������`u�y���\��0W�C�N-�<�a�Cq�l�,�h�Y�� �B�u���z�
����mN������l��x�����k.tuvt�}7y�}� �����a��kx��	����8".��O�4/��48?*;NE��	���r��������=:���M�fO���3�{�1FHI��x�[������
�C��� �(uYI��Y�k|�
l[]V�t7���3G�9�At���k������Z:���	X{x^jo)��sA�-�U6�C\$r�����������-��@�X]rWx)R���a��U|Su��[�����"9*�D�J� �w�4f�$�\��V���{?���W��6�ufu9T����W�6?�����\/����a����������������s���:,q����gO�Fr"�y��,p����B��,����S
;Y/Y�$	]�.�]���B������ZM�.Kr�w���o���[������5S}����W�k�7I>:����d�7������h-�o�+K�����\��	�N��C�t��=�(�����dy�7��,�+^�z �~����o���������,�R.[�Bw��K1*����F����Yx�Z���cw�W{x�4�^�W'�TRB�,y��x32��%
�1�:�2�@6s���}
�T���Z��s?22E���Be~i4&~s6@��#�<�����9�h8�70�/-NV�x�?��`4�AD���
�h����4�5�f�qL'�F��K@P�����.w�M����)�$�c�ag0�t��)����%�f�\�E1O�(j������0� �
p�7���BV���<XB�[�r�t4(R�t�&|��H�y�z|����M�kw
~�M��_��lx�����lj��ipC"� [�L���T���������j���;��^X��4�l�>�k��PqC�
��.�Q�B��_��z���v��#����'}�|Aj�[�O`E��m�uSQ������(�A�qb8�!��VaI��DY�=m�>;m���1�L|�<������s�Gb�
>�nyi2�Jf���~@F���kA���MM�J��X������YU�)?k�&���O/�y$�%�#V��h�
�3 )�����D�q�#�XC�o����*`{D�*���[�i��K������
��������#|��PA����T���:
��kN��j�2�-������Ks�U4���l�d�[#f-0��|���y��I�uM���.���0����
0�fm���N`9a�����z�|�1e����x�p];RW������M���l_���U���w`a�AN�@R�1�������f�<����A��@,(P�B�4�Bw+��G��-���z������/�mt	&�@��4l<���������8�,fK@��!��p�|��X
�����������2�G/Q���'�!������&�����Q{d��yW��@��I(��s��^��%����a��
�c�a7������A���1���$w������?��x8��������N;%���'��xiR����*�����!�q���G�6�<H�����l�{�`�N:�RB&6&rB64����-0�� m���<��j8?9���u�������To7ft�������������5�#�&�=�8q���'�zW���6�#�=m�?i��q��x�}�I�2��Z�����:�?���,����Fk�,~D��-�'�k��F�h��'�����(#7����lRz����\~Kn�$����+|�� ������d��r�~�\�~�n���3P��X��t#C2���8������~�:�)`X���[X������&��E2�����$�u�.���#D�DTDC~;���>��2<�.`S������[�^mv�\�h��������t��cB4��[5tz"�uS\4��[!�MXW�ZyE���pj�*7��m8��b`Ox�Un���<��:�i;�2���}��V�K�@��J��u�.�.�.�F����i�����!h��.�eBC��ZxL!�����m�6N��c���R�����vc2�&a�3���|�W8��SdP!���tJ��Y�0{�Ya�r����6����\$21�O�15��
B��~��Tv�r�9��O?��CZBp���b��L����t��7D�P����]�)T6QqN��7G��t���cZE���+��� �`�@sM��#�y9p�".E:�m���D����$�gF?�a"�rn�$v��-�J�5�+��S=l���WK'�f��
�/��'RH5�'��w��������'?B�����)+G�(����rWDEGL�(�'�\Z���-�-��l�!�����2�}��.,X�IJ�Kq�����������(���?�Z�J%�B�H�tA@e}��y������SbQ��w!jJq�,k��^]��5^���PUI7?��,I2���M3y0�W���7O���0+�	��z�����k��Z\��iJ�L$��I����\�"�:�X��|��AYw������5���`(���KC��#^p�B��1YUH�G����m��i������@�)�
����(��]���N,E�=*�e:��y������5��)z|x�RO�X�����h���K���q����]��c��XEq����y�`D�8r@gR	�V�`8rl�M#:��Y�x��:)
�c������c@�$�aI���.��`����Q���1&�
�������G���C/]��u@��2�5<���'�g���vVM��?KLdYh/e%��V��+��5q��]F��D_��0	�4��"]f�������������\�j��������yq���H��*ELG<����������E���60��N��s7�p>�,���e���zB'�2
RK�6(���+�1���#?s�>gwO<�6}H�W]U�,�h~5��e��Md&8�������I9 �Z�G�����)�)�,��S��gYA���b�u������AE����d2��*�(���j�r���u��w��
Wstn>dK�r��|�����*�,k�1������P4��	�"�D-M�TWv�
���F��cld�.�RO�h�H�	��:�qn\��z�#�~��
\I_s���W\s-��]kfED)�����j/_�����2��&p��]��K�I����8��dk�����lV3����pT�6�������� "�D}���-�'@
W�{/p��z������p[�4�I������8	��Y~PG&���I��Q���~�.�����I��$�R�C�j�'�*�L�-e�M���Y��*6�^���bh%'�C���i�3�z�]g�%C>.B�>��h=4�s8���f�l��������!�h��^�$���rvV������.��`3�s">��m��G���%S��qGa��^�K�H������p�|s����&���\�PFY�"��� -����2�]����k�`�u*i�OM��3��j�����+��k��/x�Y=���!�Z{�[{�����������-��E�����t������J�o��E���8���D������"��S��]�n�@V^R���������_�z�����������������;��~���� 
����c�x��a�$��gN\k;��EG�v�3@�����fM�}�B-�C?�[��J�aiT7[e�,R|^b��_�2q(<���8V�4[����������nk�>X3����������uX3EIz��\=�}�j�-����T^T��# �B��4���|.��{��$W��������#��k����T���7��a�������6�:�;��0��
W�J����`	�JGJ���]}���R�o|�����u�<p9^�,�aUe:Q��{}���P���h���wYr��R\%�6��xR�2�d�$5Rd~��#�6gJz�2o���\�P���.W+/EV^�"�����m���@ag�� ,�9XCY#�����(2CA���0N-VR��;��m��f���w��*"�a�`��b3�It|��h`�+x���6E����M����}i�|���R�^��M����8q���JD�J2gfd�����V��,.�p7��s3kmr������v�#U��X������	?��41CXv�:�9�ho�����E�z���f��;��VG�>���ho������?�>�����1 ���?ON�������bR
�DA����Q��3����Blx�;�
�>bS_��]'�S���L~�X��,Fq����y���.������@El$\�Ur��0��G�����W�Q�\�b�tU��a��I�����r��$�05��E���*�Uv�����������
��:��-r�
�4�Q��766��q���n)7���z�$[�y�6�*������KF�]�q�6���+sy�^�_;;o����G/��8L��\��>��G�y^�pS03��P3���S|"�,e��S��3���`%�	��e2�5YW�
����H�����nm��~������<��n��t�����>9�������
�x�������l�_.��iV����F��c�-�i����Q���K��u��.0��^������]��I�����
�)A���T1��l��]��m��/�7����� �{\���5\���t���-����W�y�l��6�:3r�[ty��Fp.v$0��N�v�b�e�I"L^��^�v�e�F�u4,.������g�g�����l�E����q<�9�@z��s���ks�?k�}6�vJ�������i�����������p����>_�=��@�������S�<�Q�<2�&���>���!�<Bd�� ��]�����6���nZ�\�����#��Z�\*��ak�4���"�PU*u��Zk92�\�9>���go�����R��Z����{_V���*�)�q�6��6Y��,7�l��wv��i*^[����Y���?�y<P��y)O"6�Y���4'��~�����7�o6f*�D����B���`|�!���,x���!#g��<'6�����_�X��b~�����c�Ky[��_��fg�f������Z�d|�fNP�m.������<QXg�*P�[2t!��j��6��V���[,:�M��A�������������Y��(n�����@� ���0���������-:m <��O�}�Y����HY�	��\�����N�S�{���%V��E�&��#��4?���`:���Ff�l����F�]�Fm~4�-Em����"BC��Q
a�Q��Q�Keg}���X��bd'����������/t����s9��u�5��l���Cw��[6����a��$���7t���O]��h�m�>\�?�<��y7W9�-���[�H�a����j���@o�U�V�����'@��������$��p����=�1����������/���!��N������OaO��`V�(��G��w�\�U3���N�A���37A���f��N�;'�w[��V��?�8�44���|�/?�����s}�����RC���k�}�,��%_gP(E�������[����Y�\G\�����|X�H���`M���7>�Q���1�	��������?�� �86����n%��Uw������qDa ���4����<�� u&'����Zw�"�p�=&�sH����t�Q�]���u��7<��� �+�j.�(~�������r��4N0}V+=�M����P�
c����bLXa��^�)�c�L����F��<����H��di�g4�d-t�\��
�9!Nb�k��a�]�����?`!)bC���"T���>�#��#PG�G�A{,:��s �3�h8?&�]WD���D�I�~�0h�$%�Qj�$��(�
N���W]JD�#gVH���<��������������e��s��P���*�WT��Q��X
�CIy*�G@scy�����u���b�-�p�Z��E_`��MN��:�n�_�u��lh�#�Zm����>8x�A_�O��X~��S{����b��7~�H����u(�s�����T����?����M���{q�b���R�����#f�NB��5�
��P������\$4�����&*P8�����`�?�~�U�=)��le���`c �E��[(��l	`1{/#�6��+Px
{Ps�����Z�e����sA":s�x��;ta�;��u��T�8.
���������M��=���9	$"atL�7`-��H:E'	����*�/��Y�0���3�/�6��J��moE�G�H	Gul���cNf�~��b��1d.�[�j:�J����
Z����eV�]�tiu����[���D)�q�%I#�>��9d��Q:X���6��L�$������2�z%��S�cY����-b
���7%�[�eB�)�T�3S��%�.l}��-zF����l�����g��3����#no}�+"W����i����j�
�@��Up���}�c�g��`���D0�3����']W�%��
�Z+$z�G�,O�H�x�HQ �(a�;�d�{5����>~��y��� o;��a���W��hs�m��5���m<`/����W�$�	�6����e*��_�������l��_om=�b�����o~�y7�-m��a6� ]e��[��a?����Q��?\���~�����{�
'��_q�����Te�K���������^W&�?��|�)~����F����H�'[���L�'Q�V�����v�y�
�|mVA6M_��!��bd~<�0�b��!��������_�#�@���HFx��02��@�R:4�K������[q��~��N���06��%����������`t��G��W'W�2L�5��2Is���������9�X���N�����y�wF
����BM�\>YU��y��	�%��;L�Y����}#~K�7�����IQV��B>�^)^_���^��i�����v\��'��T��������9���I�%��K;30���
����1B}�6�3�c/���3�lU~_�����j�%��>�jG�������oJpMO?�[��A2���!�������~H=#�v���#�t�l�k��������u������FK�U��(����zr��#	��=��^f�0S:Y�x0���wR��I��Ik�o�7�=��l����b���g��<!�6�iCJ�Q	s_1�X��`et�wi�"��������Bj�=�5�J@����������������f#��[�����\`s�`^+��,��C���f�sG��f&��~\4���c�&}D��(�[�t@�J�%q<�W1���!L?� ��vn� ���&l����r�7e�&_Pte�ms���H�&�O���*j���C�7�saQ��NH)��W�P����F�HM��'+����!�� ���\\��lf����a��9,P!e�e���6�#z
����������-�G{�$�T�(#@q��)E]�� o�x"�����Q���`q���+��zj�T���Kut�$^�h��B��M^�U����+�[��8XD��a��U�"��W�}��)0��jzO��F@{�x,�|p��[�@���PSjd�}�s���2A��	UZ��u��4|=h�<�q�����:�Ns�p���(�<}�v�>�vb�0Yb�'q���c����iu����A�Y��/"(.4�����Brjz��3����\��r���a����������r�!U���~`�;
C���<#>�����2����G��o8���qW�\
I��W����}���K7�
������5���c	i�o?l�q{���-e��(�H���#T���Gs�k%m{d�b�
�,X�P2�6����H.�z�<J�����8  ����.������<xBq�> m��zC�6Y��3R6^�9c���7��
�a\-���ds�Jb<�3������F+�@���:���X��:�%�C�DDH�8������.a�d\� ��"�+'>LD���2Q8�a�x���8$�e�6m���MC_�^�����)%Bk#�6+�Fe�ec��6��{4,���>3�����IE��aqQR��e�\�p�����z�����f�e$����u��y7};�����S����}��n��Z"Ce�`)(Z��HH�KL�/[2j�H#d�2�q�v)�F�1=5�[#����������&tqU��Aj��
�#F�=0�)	�9@9@�P�g���U470al����m[y��H(��(���(��p������<�x(G��r�i^�A�NL8��,�y6J!%������������b?��~n�O����G�L�	�E\
gX��Y���:8Pj�*w����:�+5�����U�m�����F�������|Mm�����|M:lRa��8���:?��2�����l��c������O�iXF`��������rq�����$o������b���m�7�C����#-�?�/�Q;����m��'�W
���Nr���0��j�p�;[�\��l��e��3&k�Jy[�<V���|O	q�Q]�_�G?�/���o�!|����w����P�6
m'�91�y�Kq�X^�� ?*>�J�94���c]���/�_�e���,M�&��
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#25)
Re: pg_rewind in contrib

What is this define good for? I couldn't understand where it fits; is
it just a leftover?

+#define pageinfo_set_truncation(forkno, rnode, blkno) datapagemap_set_truncation(pagemap, forkno, rnode, blkno)

Please don't name files with generic names such as util.c/h. It's
annoying later; for instance when I want to open analyze.c I have to
choose the one in parser or commands. (pg_upgrade in particular is
already a mess; let's not copy that.)

Is the final destiny of this still contrib? There are several votes for
putting it in src/bin/ right from the start, which sounds reasonable to
me as well. We didn't put pg_basebackup in contrib initially for instance.
If we do that, we will need more attention to translatability markers of
user-visible error messages; if you want to continue using fprintf()
then you will need _() around the literals, but it might be wise to use
another function instead so that you can avoid them (say, have something
like pg_upgrade's pg_fatal() which you can then list in nls.mk).
...
Uh, I notice that you have _() in some places such as slurpFile().

I agree with Andres that it would be better to avoid a second copy of
the xlogreader helper stuff. A #FRONTEND define in xlogreader.c seems
acceptable, and pg_rewind can have a wrapper function to add extra
functionality later, if necessary -- or we can add a flags argument,
currently unused, which could later be extended to indicate to read from
archive, or something.

Copyright lines need updated to 2015 (meh)

Our policy on header inclusion says that postgres_fe.h comes first, then
system includes, then other postgres headers. So at least this one is
wrong:

+++ b/contrib/pg_rewind/copy_fetch.c
@@ -0,0 +1,586 @@
[...]
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+#include "util.h"

The reason for this IIRC is that system includes could modify behavior
of some calls in obscure platforms under obscure circumstances.

+	while ((xlde = readdir(xldir)) != NULL)
+	{

You need to reset errno here, see commit 6f03927fce038096f53ca67eeab9a.

+static int dstfd = -1;
+static char dstpath[MAXPGPATH] = "";
+
+void
+open_target_file(const char *path, bool trunc)
+{

I think these file-level static vars ought to be declared at top of
file, probably with more distinctive names.

+void
+close_target_file(void)
+{
+	if (close(dstfd) != 0)
+	{
+		fprintf(stderr, "error closing destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	dstfd = -1;
+	/* fsync? */
+}

Did you find an answer for the above question?

+static bool
+endswith(const char *haystack, const char *needle)
+{
+	int needlelen = strlen(needle);
+	int haystacklen = strlen(haystack);
+
+	if (haystacklen < needlelen)
+		return false;
+
+	return strcmp(&haystack[haystacklen - needlelen], needle) == 0;
+}

We already have this function elsewhere.

+	if (type != FILE_TYPE_REGULAR && isRelDataFile(path))
+	{
+		fprintf(stderr, "data file in source \"%s\" is a directory.\n", path);
+		exit(1);
+	}

Here you have error messages ending with periods; but not in most other
places.

+/*
+ * Does it look like a relation data file?
+ */
+static bool
+isRelDataFile(const char *path)

This doesn't consider forks ... why not? If correct, probably it deserves a comment.

+struct filemap_t
+{
+	/*
+	 * New entries are accumulated to a linked list, in process_remote_file
+	 * and process_local_file.
+	 */
+	file_entry_t *first;
+	file_entry_t *last;
+	int			nlist;

Uh, this is like the seventh open-coded list implementation in frontend
code. Can't we have this using ilist for a change?

Existing practice is that SQL keywords are uppercase:

+	sql =
+		"-- Create a recursive directory listing of the whole data directory\n"
+		"with recursive files (path, filename, size, isdir) as (\n"
+		"  select '' as path, filename, size, isdir from\n"
+		"  (select pg_ls_dir('.') as filename) as fn,\n"
+		"        pg_stat_file(fn.filename) as this\n"
+		"  union all\n"
+		"  select parent.path || parent.filename || '/' as path,\n"
+		"         fn, this.size, this.isdir\n"
+		"  from files as parent,\n"
+		"       pg_ls_dir(parent.path || parent.filename) as fn,\n"
+		"       pg_stat_file(parent.path || parent.filename || '/' || fn) as this\n"
+		"       where parent.isdir = 't'\n"
+		")\n"
+		"-- Using the cte, fetch a listing of the all the files.\n"
+		"--\n"
+		"-- For tablespaces, use pg_tablespace_location() function to fetch\n"
+		"-- the link target (there is no backend function to get a symbolic\n"
+		"-- link's target in general, so if the admin has put any custom\n"
+		"-- symbolic links in the data directory, they won't be copied\n"
+		"-- correctly)\n"
+		"select path || filename, size, isdir,\n"
+		"       pg_tablespace_location(pg_tablespace.oid) as link_target\n"
+		"from files\n"
+		"left outer join pg_tablespace on files.path = 'pg_tblspc/'\n"
+		"                             and oid::text = files.filename\n";

This FRONTEND game is pretty nasty -- why do you need it? Can we fix
the headers to avoid needing it?

+/*-------------------------------------------------------------------------
+ *
+ * parsexlog.c
+ *	  Functions for reading Write-Ahead-Log
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2008, Nippon Telegraph and Telephone Corporation
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define FRONTEND 1
+#include "c.h"
+#undef FRONTEND
+#include "postgres.h"
+
+#include "pg_rewind.h"
+#include "filemap.h"
+
+#include <unistd.h>
+
+#include "access/rmgr.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"

(I think the answer is in dbcommands.h; we could split it to a new
dbcommands_xlog.h)

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#16)
1 attachment(s)
Re: pg_rewind in contrib

On 01/06/2015 09:13 AM, Michael Paquier wrote:

Some more comments:
- Nitpicking: the header formats of filemap.c, datapagemap.c,
datapagemap.h and util.h are incorrect (I pushed a fix about that in
pg_rewind itself, feel free to pick it up).

Ah, fixed.

- parsexlog.c has a copyright mention to Nippon Telegraph and
Telephone Corporation. Cannot we drop it safely?

Removed. The file used to contain code for handling different WAL record
types that was originally copied from pg_lesslog, hence the NTT
copyright. However, that code is no longer there.

- Error codes needs to be generated before building pg_rewind. If I do
for example a simple configure followed by make I get a failure:
$ ./configure
$ cd contrib/pg_rewind && make
In file included from parsexlog.c:16:
In file included from ../../src/include/postgres.h:48:
../../src/include/utils/elog.h:69:10: fatal error: 'utils/errcodes.h'
file not found
#include "utils/errcodes.h"

Many other contrib modules have the same problem. And other
subdirectories too. It's not something that this patch needs to fix.

- MSVC build is not supported yet. You need to do something similar to
pg_xlogdump, aka some magic with for example xlogreader.c.
- Build fails with MinGW as there is visibly some unportable code:

Fixed all the errors I got on MSVC. The biggest change was rewriting the
code that determines if a file is a relation file, based on its
filename. It used a regular expression, which I replaced with a bunch of
sscanf calls, and a cross-check that GetRelationPath() returns the same
filename.

copy_fetch.c: In function 'check_samefile':
copy_fetch.c:298:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
if (fstat(fd1, &statbuf1) < 0)
^
In file included from ../../src/include/port.h:283:0,
from ../../src/include/c.h:1050,
from ../../src/include/postgres_fe.h:25,
from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
__CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{

Strange. There isn't anything special about the fstat() calls in
pg_rewind. Do you get these from other modules that call fstat, e.g.
pg_stat_statements?

I did not see these warnings when building with MSVC, and don't have
MinGW installed currently.

Hm. I think that this is something we should try to fix first
upstream.

Yeah, possibly. But here's a new patch anyway.

- Heikki

Attachments:

pg_rewind-contrib-4.patch.gzapplication/gzip; name=pg_rewind-contrib-4.patch.gzDownload
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#20)
Re: pg_rewind in contrib

On 1/6/15 7:17 PM, Andres Freund wrote:

One problem is that it doesn't use the replication protocol,

so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.

I'm not particularly bothered by the requirement of also requiring a
normal, not replication, connection. In most cases that'll already be
allowed.

I don't agree. We have separated out replication access, especially in
pg_hba.conf and with the replication role attribute, for a reason. (I
hope there was a reason; I don't remember.) It is not unreasonable to
set things up so that non-replication access is only from the
application tier, and replication access is only from within the
database tier.

Now we're saying, well, we didn't really mean that, in order to use the
latest replication management tools, you also need to open up
non-replication access, but we assume you already do that anyway.

Now I understand that making pg_rewind work over a replication
connection is a lot more work, and maybe we don't want to spend it, at
least right now. But then we either need to document this as an
explicit deficiency and think about fixing it later, or we should
revisit how replication access control is handled.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#27)
Re: pg_rewind in contrib

On Fri, Jan 9, 2015 at 1:02 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Fixed all the errors I got on MSVC. The biggest change was rewriting the
code that determines if a file is a relation file, based on its filename. It
used a regular expression, which I replaced with a bunch of sscanf calls,
and a cross-check that GetRelationPath() returns the same filename.

It looks definitely better like that. Thanks.

copy_fetch.c: In function 'check_samefile':
copy_fetch.c:298:2: warning: passing argument 2 of '_fstat64i32' from
incompatib
le pointer type [enabled by default]
if (fstat(fd1, &statbuf1) < 0)
^
In file included from ../../src/include/port.h:283:0,
from ../../src/include/c.h:1050,
from ../../src/include/postgres_fe.h:25,
from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *'
but arg
ument is of type 'struct stat *'
__CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32
*_stat)
{

Strange. There isn't anything special about the fstat() calls in pg_rewind.
Do you get these from other modules that call fstat, e.g.
pg_stat_statements?

I did not see these warnings when building with MSVC, and don't have MinGW
installed currently.

Don't worry about those ones, it is discussed here already:
/messages/by-id/CAB7nPqTrmmZo2y92DfZEd-mWo1cenEoaUhCZppv=OB84-4C9vA@mail.gmail.com

MSVC build still has a warning:
"C:\Users\mpaquier\git\postgres\pg_rewind.vcxproj" (default target) (60) ->
(Link target) ->
xlogreader.obj : warning LNK4049: locally defined symbol
pg_crc32c_table imported
[C:\Users\ioltas\git\postgres\pg_rewind.vcxproj]

The documentation needs some more polishing:
1) s/pg_reind/pg_rewind
2) by settings "wal_log_hints = on" => by setting
<varname>wal_log_hints</> to <literal>on</>
3) You should avoid using an hardcoded list of items in a block <para>
to list how pg_rewind works. Each item in the list should be changed
to use <orderedlist>:
<para>
The basic idea is to copy everything from the blah...

<orderedlist>
<listitem>
<para>
Scan the WAL log of blah..
</para>
</listitem>
<listitem>
<para>
paragraph2
</para>
<listitem>
[blah.]
</orderedlist>
</para>
4) --source-server and --target-pgdata are not listed in the list of options.
5) The synopsis should be written like that IMO: pg_rewind [option...]
6) pg_rewind is listed several times, doesn't it need <application>?
7) pg_xlog should use <filename>
8) Perhaps a section "See also" at the bottom could be added to
mention pg_basebackup?

Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Eisentraut (#28)
Re: pg_rewind in contrib

On 01/08/2015 10:44 PM, Peter Eisentraut wrote:

On 1/6/15 7:17 PM, Andres Freund wrote:

One problem is that it doesn't use the replication protocol,

so the setup is going to be inconsistent with pg_basebackup. Maybe the
replication protocol could be extended to provide the required data.

I'm not particularly bothered by the requirement of also requiring a
normal, not replication, connection. In most cases that'll already be
allowed.

I don't agree. We have separated out replication access, especially in
pg_hba.conf and with the replication role attribute, for a reason. (I
hope there was a reason; I don't remember.) It is not unreasonable to
set things up so that non-replication access is only from the
application tier, and replication access is only from within the
database tier.

I agree. A big difference between a replication connection and a normal
database connection is that a replication connection is not bound to any
particular database. It also cannot run transactions, and many other things.

It would nevertheless be handy to be able to do more stuff in a
replication connection. For example, you might want to run functions
like pg_create_restore_point(), pg_xlog_replay_pause/resume etc. in a
replication connection. We should extend the replication protocol to
allow such things. I'm not sure what it would look like; we can't run
arbitrary queries when not being connected to a database, or arbitrary
functions, but there are a lot of functions that would make sense.

Now I understand that making pg_rewind work over a replication
connection is a lot more work, and maybe we don't want to spend it, at
least right now. But then we either need to document this as an
explicit deficiency and think about fixing it later, or we should
revisit how replication access control is handled.

Right, that's how I've been thinking about this. I don't want to start
working on the replication protocol right now, I think we can live with
it as it is, but it sure would be nicer if it worked over a replication
connection.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#30)
Re: pg_rewind in contrib

There are two ways in which access control for replication connections
is separate:

- replication pseudo-database in pg_hba.conf

- replication role attribute

If someone has a restrictive setup for replication and pg_basebackup,
and then pg_rewind enters, they will have to

- add a line to pg_hba.conf to allow non-replication access

- connect as superuser

The first is an inconvenience, although it might be useful for this and
other reasons to have a variant of "all" includes "replication".

The second, however, is a problem, because you have to change from a
restricted, quasi-ready-only user to full superuser access.

One way to address that might be if we added backend functions that are
variants of pg_ls_dir() and pg_stat_file() that are accessible only with
the replication attribute. Or we allow calling pg_ls_dir() and
pg_stat_file() with relative paths with the replication attribute.
(While we're at it, add a recursive variant of pg_ls_dir().) Or some
variant of that.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Alvaro Herrera (#26)
1 attachment(s)
Re: pg_rewind in contrib

On 01/07/2015 06:19 PM, Alvaro Herrera wrote:

Fixed most of the issues you listed. More on a few remaining ones below.

Please don't name files with generic names such as util.c/h. It's
annoying later; for instance when I want to open analyze.c I have to
choose the one in parser or commands. (pg_upgrade in particular is
already a mess; let's not copy that.)

There was only one function in util.c, with only one caller, so I just
moved it to the calling file and removed util.c and util.h altogether.

There are few other files with fairly generic names: fetch.c, timeline.c
and (a new file I just added) logging.c. I agree it's annoying when
there are multiple files with same name, although I don't think it's
reasonable to totally avoid it either. We could call e.g. logging.c
pg_rewind_logging.c instead, but I'm not convinced that that is better.

Is the final destiny of this still contrib? There are several votes for
putting it in src/bin/ right from the start, which sounds reasonable to
me as well.

Seems that the majority opinion is to put this straight into src/bin.
Works for me, moved.

If we do that, we will need more attention to translatability markers of
user-visible error messages; if you want to continue using fprintf()
then you will need _() around the literals, but it might be wise to use
another function instead so that you can avoid them (say, have something
like pg_upgrade's pg_fatal() which you can then list in nls.mk).
...
Uh, I notice that you have _() in some places such as slurpFile().

I copy-pasted pg_fatal and pg_log from pg_upgrade, replaced all the
fprintf(stderr) calls with them, and created an nls.mk file. It's now
ready for translation.

I also removed the --verbose option and replaced it with --progress,
which shows a progress indicator similar to pg_basebackup's, and
--debug, which spews out the list of actions on each file and other
debugging output.

+void
+close_target_file(void)
+{
+	if (close(dstfd) != 0)
+	{
+		fprintf(stderr, "error closing destination file \"%s\": %s\n",
+				dstpath, strerror(errno));
+		exit(1);
+	}
+
+	dstfd = -1;
+	/* fsync? */
+}

Did you find an answer for the above question?

No. The question is, should pg_rewind fsync() every file that it
modifies? It would be a reasonable thing to do, to make sure that if you
crash immediately after running pg_rewind, the cluster is in a
consistent state. However, pg_basebackup for example doesn't do that.
initdb does, but that was added fairly recently.

I'm not thrilled about sprinkling fsync() calls everywhere that we touch
files. But I guess that would be the right thing to do. I'm planning to
do that as an add-on patch later, fixing also pg_basebackup and any
other utilities that need it.

+/*
+ * Does it look like a relation data file?
+ */
+static bool
+isRelDataFile(const char *path)

This doesn't consider forks ... why not? If correct, probably it deserves a comment.

Added a comment. (Non-main forks are always copied in toto, so they are
not considered as relation data files for pg_rewind's purposes.)

+struct filemap_t
+{
+	/*
+	 * New entries are accumulated to a linked list, in process_remote_file
+	 * and process_local_file.
+	 */
+	file_entry_t *first;
+	file_entry_t *last;
+	int			nlist;

Uh, this is like the seventh open-coded list implementation in frontend
code. Can't we have this using ilist for a change?

ilist is backend code. I'm not eager to move it to src/common. A linked
list is a trivial data structure, I don't think it's too bad to
re-invent that wheel.

In this case, it might make sense to get rid of the linked list
altogether. pg_rewind currently accumulates new entries in a list, and
turns that into a sorted array after all remote files have been
accumulated. But it could be rewritten to use an array to begin with.

This FRONTEND game is pretty nasty -- why do you need it? Can we fix
the headers to avoid needing it?

+/*-------------------------------------------------------------------------
+ *
+ * parsexlog.c
+ *	  Functions for reading Write-Ahead-Log
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2008, Nippon Telegraph and Telephone Corporation
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define FRONTEND 1
+#include "c.h"
+#undef FRONTEND
+#include "postgres.h"
+
+#include "pg_rewind.h"
+#include "filemap.h"
+
+#include <unistd.h>
+
+#include "access/rmgr.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"

(I think the answer is in dbcommands.h; we could split it to a new
dbcommands_xlog.h)

Ah, nice, that was the only thing left that needed the FRONTEND hack. Done.

Here is a new version. There are now a fair amount of code changes
compared to the version on github, like the logging and progress
information, and a lot of comment changes. I also improved the
documentation.

This patch is also available at my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"pg_rewind". The commit history of that isn't very clean, so don't pay
too much attention to that.

- Heikki

Attachments:

pg_rewind-bin-5.patch.gzapplication/gzip; name=pg_rewind-bin-5.patch.gzDownload
#33Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#32)
Re: pg_rewind in contrib

On 2015-01-14 11:43:00 +0200, Heikki Linnakangas wrote:

No. The question is, should pg_rewind fsync() every file that it
modifies?

Not immediately, but before the end, yes.

It would be a reasonable thing to do, to make sure that if you crash
immediately after running pg_rewind, the cluster is in a consistent state.
However, pg_basebackup for example doesn't do that. initdb does, but that
was added fairly recently.

initdb -S can be used for this if you want to - that's why Bruce added
it. It only works correctly for tablespaces since a couple weeks
however.

I'm not thrilled about sprinkling fsync() calls everywhere that we touch
files. But I guess that would be the right thing to do. I'm planning to do
that as an add-on patch later, fixing also pg_basebackup and any other
utilities that need it.

Yea, we really need to do this. We also need it in the server, right now
there's a bunch of rather ugly corner cases where we rely on not fsynced
files being present after e.g. two consecutive crashes. Abhijit has sent
a patch.

+struct filemap_t
+{
+	/*
+	 * New entries are accumulated to a linked list, in process_remote_file
+	 * and process_local_file.
+	 */
+	file_entry_t *first;
+	file_entry_t *last;
+	int			nlist;

Uh, this is like the seventh open-coded list implementation in frontend
code. Can't we have this using ilist for a change?

ilist is backend code. I'm not eager to move it to src/common. A linked list
is a trivial data structure, I don't think it's too bad to re-invent that
wheel.

Not a fan. The amount of bugs in open coded list manipulations tends to
be high.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#30)
Re: pg_rewind in contrib

On 2015-01-09 10:48:39 +0200, Heikki Linnakangas wrote:

It would nevertheless be handy to be able to do more stuff in a replication
connection. For example, you might want to run functions like
pg_create_restore_point(), pg_xlog_replay_pause/resume etc. in a replication
connection. We should extend the replication protocol to allow such things.
I'm not sure what it would look like; we can't run arbitrary queries when
not being connected to a database, or arbitrary functions, but there are a
lot of functions that would make sense.

Well, it's possible now to have replication connections connect to a
database if you choose now (replication=database dbname=...). We could
just add e.g. the fastpath function interface and add allow a couple of
builtin functions to be called. That'd probably be fairly simple.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#33)
Re: pg_rewind in contrib

On 2015-01-14 10:53:40 +0100, Andres Freund wrote:

On 2015-01-14 11:43:00 +0200, Heikki Linnakangas wrote:

I'm not thrilled about sprinkling fsync() calls everywhere that we touch
files. But I guess that would be the right thing to do. I'm planning to do
that as an add-on patch later, fixing also pg_basebackup and any other
utilities that need it.

Yea, we really need to do this. We also need it in the server, right now
there's a bunch of rather ugly corner cases where we rely on not fsynced
files being present after e.g. two consecutive crashes. Abhijit has sent
a patch.

explanation
http://archives.postgresql.org/message-id/20140918083148.GA17265%40alap3.anarazel.de
Abhijit's patch
/messages/by-id/20141106122653.GA18963@toroid.org

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#24)
Re: pg_rewind in contrib

On Wed, Jan 7, 2015 at 8:44 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

I understand, but I think "pg_rewind" is likely to be misleading to many
users who will say "but I don't want just to rewind".

I'm not wedded to the name I suggested, but I think we should look at
possible alternative names. We do have experience of misleading names
causing confusion (e.g. "initdb").

FWIW, pg_rewind sounds pretty good to me. But maybe I'm just not
understanding what the use case for this, apart from rewinding?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#32)
1 attachment(s)
Re: pg_rewind in contrib

On Wed, Jan 14, 2015 at 6:43 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Here is a new version. There are now a fair amount of code changes compared
to the version on github, like the logging and progress information, and a
lot of comment changes. I also improved the documentation.

Perhaps this could a direct link to the page of pg_rewind with  <xref
linkend="app-pgrewind">?
-    possibly new, system. Once complete, the primary and standby can be
+    possibly new, system. The <application>pg_rewind</> utility can be
+    used to speed up this process on large clusters.
+    Once complete, the primary and standby can be
Missing a dot a the end of the phrase of this paragraph:
+        <para>
+          This option specifies the target data directory that is synchronized
+          with the source. The target server must shut down cleanly before
+          running <application>pg_rewind</application>
+        </para>

Compilation of pg_rewind on MSVC is not done. It is still listed as a
contrib/ in Mkvcbuild.pm.

The compilation of pg_xlogdump fails because inclusion of
dbcommands_xlog.h is missing in rmgrdesc.c (patch attached).

src/bin/Makefile has not been updated to compile pg_rewind.
--
Michael

Attachments:

pg-rewind-bin-5-fix.patchtext/x-patch; charset=US-ASCII; name=pg-rewind-bin-5-fix.patchDownload
diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c
index 180818d..deb1b8f 100644
--- a/contrib/pg_xlogdump/rmgrdesc.c
+++ b/contrib/pg_xlogdump/rmgrdesc.c
@@ -23,6 +23,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/storage_xlog.h"
 #include "commands/dbcommands.h"
+#include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
 #include "rmgrdesc.h"
#38Greg Stark
stark@mit.edu
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind in contrib

I must have missed this, how did you some the hint bit problem with
pg_rewind? Last I understood you ran the risk that the server has unlogged
hint bit updates that you wouldn't know to rewind.

#39Andres Freund
andres@2ndquadrant.com
In reply to: Greg Stark (#38)
Re: pg_rewind in contrib

On 2015-01-15 13:21:56 +0000, Greg Stark wrote:

I must have missed this, how did you some the hint bit problem with
pg_rewind? Last I understood you ran the risk that the server has unlogged
hint bit updates that you wouldn't know to rewind.

wal_log_hints = on

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Greg Stark (#38)
Re: pg_rewind in contrib

On 01/15/2015 03:21 PM, Greg Stark wrote:

I must have missed this, how did you some the hint bit problem with
pg_rewind? Last I understood you ran the risk that the server has unlogged
hint bit updates that you wouldn't know to rewind.

There's a new GUC in 9.4, wal_log_hints, for that. It has to be turned
on for pg_rewind to work. Or data checksums must be enabled, which also
causes hint bit updates to be logged.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#32)
Re: pg_rewind in contrib

Here is a random bag of comments for the v5 patch:

pg_xlogdump fails to build:

CC xlogreader.o
CC rmgrdesc.o
../../src/include/access/rmgrlist.h:32:46: error: 'dbase_desc' undeclared here (not in a function)
PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
^
rmgrdesc.c:33:10: note: in definition of macro 'PG_RMGR'
{ name, desc, identify},
^
../../src/include/access/rmgrlist.h:32:58: error: 'dbase_identify' undeclared here (not in a function)
PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
^
rmgrdesc.c:33:16: note: in definition of macro 'PG_RMGR'
{ name, desc, identify},
^
../../src/Makefile.global:732: recipe for target 'rmgrdesc.o' failed
make[2]: *** [rmgrdesc.o] Error 1

SGML files should use 1-space indentation, not 2.

In high-availability.sgml, pg_rewind could be a link.

In ref/pg_rewind.sgml,

-P option listed twice.

The option --source-server had be confused at first, because the entry
above under --source-pgdata also talks about a "source server". Maybe
--source-connection would be clearer?

Reference pages have standardized top-level headers, so "Theory of
operation" should be under something like "Notes".

Similarly for "Restrictions", but that seems important enough to go
into the description.

src/bin/pg_rewind/.gitignore lists files for pg_regress use, which is not used anymore.

src/bin/pg_rewind/Makefile says "Makefile for src/bin/pg_basebackup".

There should be an installcheck target.

RewindTest.pm should be in the t/ directory.

Code like this:

+       if (map->bitmap == NULL)
+           map->bitmap = pg_malloc(newsize);
+       else
+           map->bitmap = pg_realloc(map->bitmap, newsize);

is unnecessary. You can just write

map->bitmap = pg_realloc(map->bitmap, newsize);

because realloc handles NULL.

Instead of FILE_TYPE_DIRECTORY etc., why not use S_IFDIR etc.?

About this code

+           if (!exists)
+               action = FILE_ACTION_CREATE;
+           else
+               action = FILE_ACTION_NONE;

action is earlier initialized as FILE_ACTION_NONE, so the second
branch is redundant. Alternatively, remove the earlier
initialization, so that maybe the compiler can detect if action is not
assigned in some paths.

Error messages should not end with a period.

Some calls to pg_fatal() don't have the string end with a newline.

In libpqProcessFileList(), I would tend to put the comment outside of
the SQL command string.

Mkvcbuild.pm changes still refer to pg_rewind in contrib.

TestLib.pm addition command_is sounds a bit wrong. It's evidently
modelled after command_like, but that now sounds wrong too. How about
command_stdout_is?

The test suite needs to silence all non-TAP output. So psql needs to
be run with -q pg_ctl with -s etc. Any important output needs to be
through diag() or note().

Test cases like

ok 6 - psql -A -t --no-psqlrc -c port=10072 -c SELECT datname FROM pg_database exit code 0

should probably get a real name.

The whole structure of the test suite still looks too much like the
old hack. I'll try to think of other ways to structure it.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Eisentraut (#41)
1 attachment(s)
Re: pg_rewind in contrib

On 01/16/2015 03:30 AM, Peter Eisentraut wrote:

Here is a random bag of comments for the v5 patch:

Addressed most of your comments, and Michael's. Another version
attached. More on a few of the comments below.

The option --source-server had be confused at first, because the entry
above under --source-pgdata also talks about a "source server". Maybe
--source-connection would be clearer?

Hmm. I would rather emphasize that the source is a running server, than
the connection to the server. I can see the confusion, though. What
about phrasing it as:

--source-pgdata

Specifies path to the data directory of the source server, to
synchronize the target with. When <option>--source-pgdata</> is used,
the source server must be cleanly shut down.

The point is that whether you use --source-pgdata or --source-server,
the source is a PostgreSQL server. Perhaps it should be mentioned here
that you only need one of --source-pgdata and --source-server, even
though the synopsis says that too.

Another idea is to rename --source-server to just --source. That would
make sense if we assume that it's more common to connect to a live server:

pg_rewind --target mypgdata --source "host=otherhost user=dba"

pg_rewind --target mypgdata --source-pgdata /other/pgdata

Reference pages have standardized top-level headers, so "Theory of
operation" should be under something like "Notes".

Similarly for "Restrictions", but that seems important enough to go
into the description.

Oh. That's rather limiting. I'll rename the "Restrictions" to "Notes" -
that seems to be where we have listed limitations like that in many
other pages. I also moved Theory of operation as a "How it works"
subsection under "Notes".

The top-level headers aren't totally standardized in man pages. Googling
around, a few seem to have a section called "IMPLEMENTATION NOTES",
which would be a good fit here. We don't have such sections currently,
but how about it?

There should be an installcheck target.

Doesn't seem appropriate, as there are no regression tests that would
run against an existing cluster. Or should it just use the installed
pg_rewind and postgres binary, but create the temporary clusters all the
same?

RewindTest.pm should be in the t/ directory.

I used a similar layout in src/test/ssl, so that ought to be changed too
if we're not happy with it.

"make check" will not find the module if I just move it to t/, so that
would require changes to Makefiles or something. I don't know how to do
that offhand.

Instead of FILE_TYPE_DIRECTORY etc., why not use S_IFDIR etc.?

I don't see what that would buy us. Most of the code only knows about a
few S_IF* types, namely regular files, directories and symlinks. Those
are the types that there are FILE_TYPE_* codes for. If we started using
the more general S_IF* constants, it would not be clear which types can
appear in which parts of the code. Also, the compiler would no longer
warn about omitting one of the types in a "switch(file->type)" statement.

TestLib.pm addition command_is sounds a bit wrong. It's evidently
modelled after command_like, but that now sounds wrong too. How about
command_stdout_is?

Works for me. How about also renaming command_like to
command_stdout_like, and committing that along with the new
command_stdout_is function as a separate patch, before pg_rewind?

The test suite needs to silence all non-TAP output. So psql needs to
be run with -q pg_ctl with -s etc. Any important output needs to be
through diag() or note().

Test cases like

ok 6 - psql -A -t --no-psqlrc -c port=10072 -c SELECT datname FROM pg_database exit code 0

should probably get a real name.

The whole structure of the test suite still looks too much like the
old hack. I'll try to think of other ways to structure it.

Yeah. It currently works with callback functions, like:

test program:
-> call RewindTest::run_rewind_test
set up both cluster
-> call before_standby callback
start standby
-> call standby_following_master callback
promote standby
-> call after_promotion callback
stop master
run pg_rewind
restart master
-> call after_rewind callback

It might be better to turn the control around, so that all the code
that's now in the callbacks are in the test program's main flow, and
test program calls functions in RewindTest.sh to execute the parts that
are currently between the callbacks:

test program
-> call RewindTest::setup_cluster
do stuff that's now in before_standby callback
-> call RewindTest::start_standby
do stuff that's now in standby_following_master callback
-> call RewindTest::promote_standby
do stuff that's now in after_promotion callback
-> call RewindTest::run_pg_rewind
do stuff that's now in after_rewind callback

- Heikki

Attachments:

pg_rewind-bin-6.patch.gzapplication/gzip; name=pg_rewind-bin-6.patch.gzDownload
#43Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#42)
Re: pg_rewind in contrib

Heikki Linnakangas wrote:

Addressed most of your comments, and Michael's. Another version attached.

Looking at the set of TAP tests, I think that those lines open again
the door of CVE-2014-0067 (vulnerability with make check) on Windows:
# Initialize master, data checksums are mandatory
remove_tree($test_master_datadir);
system_or_bail("initdb -N -A trust -D $test_master_datadir

$log_path");

IMO we should use standard_initdb in TestLib.pm instead as pg_regress
--config-auth would be used for SSPI. standard_initdb should be
extended a bit as well to be able to pass a path to logs with
/dev/null as default. TAP tests do not run on Windows, still I think
that it would be better to cover any eventuality in this area before
we forget. Already mentioned by Peter, but I think as well that the
new additions to TAP should be a separate patch.

Random thought (not related to this patch), have a new option in
initdb doing this legwork:
+       # Accept replication connections on master
+       append_to_file("$test_master_datadir/pg_hba.conf", qq(
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+));

I am still getting a warning when building with MSVC:
xlogreader.obj : warning LNK4049: locally defined symbol
pg_crc32c_table imported
[C:\Users\ioltas\git\postgres\pg_rewind.vcxproj]
1 Warning(s)
0 Error(s)

Nitpicking: number of spaces here is incorrect:
+      that when <productname>PostgreSQL</> is started, it will start replay
+     from that checkpoint and apply all the required WAL.)
+     </para>
The header of src/bin/pg_rewind/Makefile mentions pg_basebackup:
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_basebackup
In this Makefile as well, I think that EXTRA_CLEAN can be removed:
 +EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
Regards,
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#43)
Re: pg_rewind in contrib

On Mon, Jan 19, 2015 at 2:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Heikki Linnakangas wrote:

Addressed most of your comments, and Michael's. Another version attached.

Extra thing: src/bin/pg_rewind/Makefile surely forgets to clean up the
stuff from the regression tests:
+clean distclean maintainer-clean:
+       rm -f pg_rewind$(X) $(OBJS) xlogreader.c
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#44)
Re: pg_rewind in contrib

On Mon, Jan 19, 2015 at 2:50 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Mon, Jan 19, 2015 at 2:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Heikki Linnakangas wrote:

Addressed most of your comments, and Michael's. Another version

attached.

Extra thing: src/bin/pg_rewind/Makefile surely forgets to clean up the
stuff from the regression tests:
+clean distclean maintainer-clean:
+       rm -f pg_rewind$(X) $(OBJS) xlogreader.c

Heikki, what's your status here?
--
Michael

#46Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Michael Paquier (#43)
1 attachment(s)
Re: pg_rewind in contrib

On 01/19/2015 07:38 AM, Michael Paquier wrote:

Looking at the set of TAP tests, I think that those lines open again
the door of CVE-2014-0067 (vulnerability with make check) on Windows:
# Initialize master, data checksums are mandatory
remove_tree($test_master_datadir);
system_or_bail("initdb -N -A trust -D $test_master_datadir

$log_path");

IMO we should use standard_initdb in TestLib.pm instead as pg_regress
--config-auth would be used for SSPI. standard_initdb should be
extended a bit as well to be able to pass a path to logs with
/dev/null as default. TAP tests do not run on Windows, still I think
that it would be better to cover any eventuality in this area before
we forget. Already mentioned by Peter, but I think as well that the
new additions to TAP should be a separate patch.

Agreed, fixed to use standard_initdb. .

Random thought (not related to this patch), have a new option in
initdb doing this legwork:
+       # Accept replication connections on master
+       append_to_file("$test_master_datadir/pg_hba.conf", qq(
+local replication all trust
+host replication all 127.0.0.1/32 trust
+host replication all ::1/128 trust
+));

Yeah, that would be good. Perhaps as part of the pg_regress
--config-auth. If it's an initdb, then it might make sense to have the
same option to set wal_level=hot_standby, and max_wal_senders, so that
the cluster is immediately ready for replication. But that's a different
topic, I'm going to just leave it as it is in this pg_rewind patch.

Attached is a new patch version, fixing all the little things you
listed. I believe this is pretty much ready for commit. I'm going to
read it through myself one more time before committing, but I don't have
anything mind now that needs fixing anymore. I just pushed the change to
split dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that
seems like a nice-to-have anyway.

- Heikki

Attachments:

pg_rewind-bin-7.patch.gzapplication/gzip; name=pg_rewind-bin-7.patch.gzDownload
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#46)
Re: pg_rewind in contrib

Heikki Linnakangas wrote:

Attached is a new patch version, fixing all the little things you listed. I
believe this is pretty much ready for commit. I'm going to read it through
myself one more time before committing, but I don't have anything mind now
that needs fixing anymore.

I do -- it's obvious that you've given some consideration to
translatability, but it's not complete: anything that's using fprintf()
and friends are not marked for translation with _(). There are lots of
things using pg_fatal etc and those are probably fine.

One big thing that's not currently translated is usage(), but there are
others.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48Michael Paquier
michael.paquier@gmail.com
In reply to: Alvaro Herrera (#47)
Re: pg_rewind in contrib

On Mon, Mar 9, 2015 at 11:59 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Heikki Linnakangas wrote:

Attached is a new patch version, fixing all the little things you listed. I
believe this is pretty much ready for commit. I'm going to read it through
myself one more time before committing, but I don't have anything mind now
that needs fixing anymore.

I do -- it's obvious that you've given some consideration to
translatability, but it's not complete: anything that's using fprintf()
and friends are not marked for translation with _(). There are lots of
things using pg_fatal etc and those are probably fine.

One big thing that's not currently translated is usage(), but there are
others.

Definitely.

On top of that, I had an extra look at this patch, testing pg_rewind
with OSX, Linux, MinGW-32 and MSVC. Particularly on Windows, I have
been able to rewind a node and to reconnect it to a promoted standby
using this utility compiled on both MinGW-32 and MSVC (nothing fancy
with two services, one master and a standby on a local server). I
think nobody has tested that until now though...

A minor comment:
+  <para>
+   <application>pg_rewind</> is a tool for synchronizing a PostgreSQL cluster
+   with another copy of the same cluster, after the clusters' timelines have
PostgreSQL needs the markup <productname>.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Heikki Linnakangas (#46)
Re: pg_rewind in contrib

On Mon, Mar 9, 2015 at 7:32 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Attached is a new patch version, fixing all the little things you listed.

I believe this is pretty much ready for commit. I'm going to read it
through myself one more time before committing, but I don't have anything
mind now that needs fixing anymore. I just pushed the change to split
dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that seems like a
nice-to-have anyway.

Few assorted comments:

1.
+    <step>
+     <para>
+      Copy all those changed blocks from the new cluster to the old
cluster.
+
</para>
+    </step>

Isn't it possible incase of async replication that old cluster has
some blocks which new cluster doesn't have, what will it do
in such a case?

I have tried to test some form of such a case and it seems to be
failing with below error:

pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1

could not open file "..\..\Data\/base/12706/16391" for truncation: No such
file
or directory
Failure, exiting

Exact scenario is:

Node -1 (master):
Step-1
Create table t1(c1 int, c2 char(500)) with (fillfactor=10);
insert into t1 values(generate_series(1,110),'aaaa');
Stop Node-1 (pg_ctl stop ..)

Step-2
Copy manually the data-directory (it contains WAL log as well)
to new location say Database1

Node-2 (standby)
Step-3
Change settings to make it stand-by (recovery.conf and change
postgresql.conf)
Start Node and verify all data exists.

Step-4
use pg_ctl promote to make Node-2 as master

Step-5
Start Node-1
insert few more records
insert into t1 values(generate_series(110,115),'aaaa');

Step-6
Node-2
Now insert one more records in table t1
insert into t1 values(116,'aaaa');

Stop both the nodes.

Now if I run pg_rewind on old-master(Node-1), it will lead to above error.
I think above scenario can be possible in async replication.

If I insert more records (greater than what I inserted in Step-5)
in Step-6, then pg_rewind works fine.

2.
diff --git a/src/bin/pg_rewind/RewindTest.pm
b/src/bin/pg_rewind/RewindTest.pm

+# To run a test, the test script (in t/ subdirectory) calls the functions

What do you mean by t/ subdirectory?

3.
+ <application>pg_rewind</> was run, and thereforce could not be copied by

typo /thereforce

4.
+static void
+sanityChecks(void)
+{
+ /* Check that there's no backup_label in either cluster */

I could not see such a check in code. Am I missing anything?

5.
+ /*
+ * TODO: move old file out of the way, if any. And perhaps create the
+ * file with temporary name first and rename in place after it's done.
+ */
+ snprintf(BackupLabelFilePath, MAXPGPATH,
+ "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target);
+

There are couple of other TODO's in the patch, are these for future?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#50Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#48)
Re: pg_rewind in contrib

Michael Paquier wrote:

On top of that, I had an extra look at this patch, testing pg_rewind
with OSX, Linux, MinGW-32 and MSVC. Particularly on Windows, I have
been able to rewind a node and to reconnect it to a promoted standby
using this utility compiled on both MinGW-32 and MSVC (nothing fancy
with two services, one master and a standby on a local server). I
think nobody has tested that until now though...

Sweet!

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Kapila (#49)
Re: pg_rewind in contrib

On 03/10/2015 07:46 AM, Amit Kapila wrote:

On Mon, Mar 9, 2015 at 7:32 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Attached is a new patch version, fixing all the little things you listed.

I believe this is pretty much ready for commit. I'm going to read it
through myself one more time before committing, but I don't have anything
mind now that needs fixing anymore. I just pushed the change to split
dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that seems like a
nice-to-have anyway.

Few assorted comments:

1.
+    <step>
+     <para>
+      Copy all those changed blocks from the new cluster to the old
cluster.
+
</para>
+    </step>

Isn't it possible incase of async replication that old cluster has
some blocks which new cluster doesn't have, what will it do
in such a case?

Sure, that's certainly possible. If the source cluster doesn't have some
blocks that exist in the target, IOW a file in the source cluster is
shorter than the same file in the target, that means that the relation
was truncated in the source. pg_rewind will truncate the file in the
target to match the source's size, although that's not strictly
necessary as there will also be a WAL record in the source about the
truncation. That will be replayed on the first startup after pg_rewind
and would do the truncation anyway.

I have tried to test some form of such a case and it seems to be
failing with below error:

pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1

could not open file "..\..\Data\/base/12706/16391" for truncation: No such
file
or directory
Failure, exiting

Hmm, could that be just because of the funny business with the Windows
path separators? Does it work if you use "-D ..\..\Data" instead,
without the last backslash?

+# To run a test, the test script (in t/ subdirectory) calls the functions

What do you mean by t/ subdirectory?

There is a directory, "src/bin/pg_rewind/t", which contains the
regression tests.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Heikki Linnakangas (#51)
Re: pg_rewind in contrib

On Wed, Mar 11, 2015 at 3:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 03/10/2015 07:46 AM, Amit Kapila wrote:

Isn't it possible incase of async replication that old cluster has
some blocks which new cluster doesn't have, what will it do
in such a case?

Sure, that's certainly possible. If the source cluster doesn't have some
blocks that exist in the target, IOW a file in the source cluster is
shorter than the same file in the target, that means that the relation was
truncated in the source.

Can't that happen if the source database (new-master) haven't
received all of the data from target database (old-master) at the
time of promotion?
If yes, then source database won't have WAL for truncation and
the way current mechanism works is must.

Now I think for such a case doing truncation in the target database
is the right solution, however should we warn user in some way
(either by mentioning about it in docs or in the pg_rewind utility after
it does truncation) that some of it's data that belongs to old-master
will be overridden by this operation, so if he wants he can keep a
backup copy of the same.

I have tried to test some form of such a case and it seems to be

failing with below error:

pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1

could not open file "..\..\Data\/base/12706/16391" for truncation: No such
file
or directory
Failure, exiting

Hmm, could that be just because of the funny business with the Windows
path separators? Does it work if you use "-D ..\..\Data" instead, without
the last backslash?

I have tried without backslash as well, but still it returns
same error.

pg_rewind.exe -D ..\..\Data --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/1769BD8 on timeline 5.
Rewinding from last common checkpoint at 0/1769B30 on timeline 5

could not open file "..\..\Data/base/12706/16394" for truncation: No such
file or directory
Failure, exiting

I have even tried with complete path:
pg_rewind.exe -D E:\WorkSpace\PostgreSQL\master\Data
--source-pgdata=E:\WorkSpace\PostgreSQL\master\Database1
The servers diverged at WAL position 0/1782830 on timeline 6.
Rewinding from last common checkpoint at 0/1782788 on timeline 6

could not open file "E:\WorkSpace\PostgreSQL\master\Data/base/12706/16395"
for truncation: No such file or directory
Failure, exiting

Another point is that after above error, target database
gets corrupted. Basically the target database contains
an extra data of source database and part of it's data.
I think thats because truncation didn't happened.

On retry it gives below message:
pg_rewind.exe -D ..\..\Data --source-pgdata=..\..\Database1

source and target cluster are on the same timeline
Failure, exiting

I think message displayed in this case is okay, however
displaying it as 'Failure' looks slightly odd.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#53Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Kapila (#52)
2 attachment(s)
Re: pg_rewind in contrib

On 03/11/2015 05:01 AM, Amit Kapila wrote:

On Wed, Mar 11, 2015 at 3:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 03/10/2015 07:46 AM, Amit Kapila wrote:

Isn't it possible incase of async replication that old cluster has
some blocks which new cluster doesn't have, what will it do
in such a case?

Sure, that's certainly possible. If the source cluster doesn't have some
blocks that exist in the target, IOW a file in the source cluster is
shorter than the same file in the target, that means that the relation was
truncated in the source.

Can't that happen if the source database (new-master) haven't
received all of the data from target database (old-master) at the
time of promotion?
If yes, then source database won't have WAL for truncation and
the way current mechanism works is must.

Now I think for such a case doing truncation in the target database
is the right solution,

Yeah, that can happen, and truncation is the correct fix for it. The
logic is pretty well explained by this comment in filemap.c:

/*
* It's a data file that exists in both.
*
* If it's larger in target, we can truncate it. There will
* also be a WAL record of the truncation in the source
* system, so WAL replay would eventually truncate the target
* too, but we might as well do it now.
*
* If it's smaller in the target, it means that it has been
* truncated in the target, or enlarged in the source, or
* both. If it was truncated locally, we need to copy the
* missing tail from the remote system. If it was enlarged in
* the remote system, there will be WAL records in the remote
* system for the new blocks, so we wouldn't need to copy them
* here. But we don't know which scenario we're dealing with,
* and there's no harm in copying the missing blocks now, so
* do it now.
*
* If it's the same size, do nothing here. Any locally
* modified blocks will be copied based on parsing the local
* WAL, and any remotely modified blocks will be updated after
* rewinding, when the remote WAL is replayed.
*/

however should we warn user in some way
(either by mentioning about it in docs or in the pg_rewind utility after
it does truncation) that some of it's data that belongs to old-master
will be overridden by this operation, so if he wants he can keep a
backup copy of the same.

Well, pg_rewind *always* overwrites any transactions that were committed
in the old master but not streamed to the standby. That's the whole
point. You're probably right that we should stress that out more in the
docs, though.

I've been thinking that it would be nice to print out a list of such
transactions that pg_rewind is going to overwrite. The admin could look
at the (hopefully short) list and perhaps try to fix up the data
manually afterwards, to recover the lost transactions. Perhaps we could
use the logical decoding stuff for that, to print out the transactions
in a human-readable format. But that's a TODO for the future.

I have tried to test some form of such a case and it seems to be

failing with below error:

pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1

could not open file "..\..\Data\/base/12706/16391" for truncation: No such
file
or directory
Failure, exiting

Hmm, could that be just because of the funny business with the Windows
path separators? Does it work if you use "-D ..\..\Data" instead, without
the last backslash?

I have tried without backslash as well, but still it returns
same error.

pg_rewind.exe -D ..\..\Data --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/1769BD8 on timeline 5.
Rewinding from last common checkpoint at 0/1769B30 on timeline 5

could not open file "..\..\Data/base/12706/16394" for truncation: No such
file or directory
Failure, exiting

I tried to reproduce this, but it tripped the "Assert(entry->isrelfile)"
assertion in process_block_change. However, that seems to be an
unrelated issue - pg_rewind was not handling FSM blocks correctly. It's
supposed to ignore them but extactPageInfo didn't get the memo. I think
I broke that when doing the changes for the new WAL record format.

After fixing that (new patch attached), your test case works fine for
me. I'm using the attached bash script to test it. Can you test if the
attached script works for you, and if it does, see if you can "fix" the
script so that it reproduces the error you're seeing?

Another point is that after above error, target database
gets corrupted. Basically the target database contains
an extra data of source database and part of it's data.
I think thats because truncation didn't happened.

Yeah, if something goes wrong during pg_rewind, the target database is
toast. Taking a backup, and using --dry-run is highly recommended ;-).
As a safety feature, perhaps pg_rewind should temporarily rename the
control file or something like that, so that if it's interrupted, the
target database will refuse to start up. Then again, that would make it
more difficult to do forensics or disaster recovery on the database, if
you didn't take a backup.

On retry it gives below message:
pg_rewind.exe -D ..\..\Data --source-pgdata=..\..\Database1

source and target cluster are on the same timeline
Failure, exiting

I think message displayed in this case is okay, however
displaying it as 'Failure' looks slightly odd.

Hmm. In other similar scenarios, pg_rewind will return Success with
message "No rewind required". Yeah, probably should do that in this case
too.

- Heikki

Attachments:

pg_rewind-bin-8.patch.gzapplication/gzip; name=pg_rewind-bin-8.patch.gzDownload
amits-test.shapplication/x-shellscript; name=amits-test.shDownload
#54Amit Kapila
amit.kapila16@gmail.com
In reply to: Heikki Linnakangas (#53)
1 attachment(s)
Re: pg_rewind in contrib

On Wed, Mar 11, 2015 at 2:23 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 03/11/2015 05:01 AM, Amit Kapila wrote:

I have tried without backslash as well, but still it returns
same error.

pg_rewind.exe -D ..\..\Data --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/1769BD8 on timeline 5.
Rewinding from last common checkpoint at 0/1769B30 on timeline 5

could not open file "..\..\Data/base/12706/16394" for truncation: No such
file or directory
Failure, exiting

I tried to reproduce this, but it tripped the "Assert(entry->isrelfile)"
assertion in process_block_change. However, that seems to be an unrelated
issue - pg_rewind was not handling FSM blocks correctly. It's supposed to
ignore them but extactPageInfo didn't get the memo. I think I broke that
when doing the changes for the new WAL record format.

After fixing that (new patch attached), your test case works fine for me.
I'm using the attached bash script to test it. Can you test if the attached
script works for you, and if it does, see if you can "fix" the script so
that it reproduces the error you're seeing?

With attached modified script, I am able to reproduce the
error (I have used the latest pg_rewind patch (pg_rewind-bin-8))

The servers diverged at WAL position 0/1693400 on timeline 1.
Rewinding from last common checkpoint at 0/1693390 on timeline 1

could not open file "data-master/base/12706/16384" for truncation: No such
file or directory
Failure, exiting

I am able to reproduce it on Windows (haven't tried it on linux).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

amits-test-modify.shapplication/x-sh; name=amits-test-modify.shDownload
#55Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Kapila (#54)
1 attachment(s)
Re: pg_rewind in contrib

On 03/12/2015 08:49 AM, Amit Kapila wrote:

With attached modified script, I am able to reproduce the
error (I have used the latest pg_rewind patch (pg_rewind-bin-8))

Thanks! That reproduced the error for me too. Not sure why I couldn't
reproduce it earlier.

The cause was a silly typo in truncate_target_file:

@@ -397,7 +397,7 @@ truncate_target_file(const char *path, off_t newsize)

snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);

- fd = open(path, O_WRONLY, 0);
+ fd = open(dstpath, O_WRONLY, 0);
if (fd < 0)
pg_fatal("could not open file \"%s\" for truncation: %s\n",
dstpath, strerror(errno));

Attached is a new version of the patch, including that fix, and rebased
over current git master.

- Heikki

Attachments:

pg_rewind-bin-9.patch.gzapplication/gzip; name=pg_rewind-bin-9.patch.gzDownload
�?�Upg_rewind-bin-9.patch�\{W�H��>E��4p�d !�`C�d�!	�t���������W�<���{����-��	g�c��n�W�g�A8�����A����?���u8�v�[/��Q�����=e�r��^�/^��y������������|�����8O�mycc��;�}+���W��Wb��}-h(������"J�a��R���$
�����e�?/	DLpKQ��T�]�
q��\�U��2I��^$�L�^�I_x����(d~+s*��R�$��s��2��q��2)��bL���FO�������� I�i,Qe}��
���9Ar)K�"ES����^����/�0�d)��������v��]����\�E&72	�{^�9�$�w$����J�{VU�l*2I��A���
��O���#��U����G?M�0���������K���4��+.�}"�)A�u�S	C�HII
��[�)F�C��pbK�N����F����d,<W���E��v��	�4����[�?���������N�<���4?������K:B��+��#-�R�tL|�F�X��H��$� D�(�!��L���,��������?E6���m�NB�N�]�������e�a�<cLz3KY��!:��g"R���xf�m��A���ex��YZ�tD��U��C�^{8����e����i�~����a]l���$��8�����%T*���\��	���>�� �����*�|h�f�t����%� d�8��-�8W����32�~��d[���8$a�C�h��s8g��"�1��X�+Md���
�A�������
���S���eXF�`��G��@��g�
�kL�%�i�T������b�����qzp�b8����L�	���5|��2C�yV������&�u�&��`a�CiP�%Om$65�"�J���,�5���}�����+��W�.����i��h��$%�Q3��j4y�f�aQ�����Icbg`m��rbd�����,}��g��K���43�n�H��D���)�rI�O�Es�Di���\~1O��3���V�1�?[;=��)��KRm�����2�l���)�>B�K4N��l�-��]��n�B�ER���h1��B���)����m$���"u��e���� s����r��_��8@�A�M�F�4'2L&��2�:cp�S����'�@��gR�;�s��	�VE���C�w��'I�W�r���/|�xy�?
�F9���4��[9=>!����Mm:�0?�e$
+�f�0N�(�+��L{J%���!�%G\E%�_������7%u����t�)z��t�C��(�"y9"�,� �""bX��[�pR��A��H[��z��(<&�/q���w�D
�G^a�V�
:PP�v#��r����(���;�,������o^��C�>!�!�+R3/��(7!`�&�����
��������|�f&�pS����)�c�,��w�����6�.RfP�I���xP.�����U@��������9,y%;�S����H�X�9M���9��G���G�g�c�v�����3��6��B����9#�e�%ep$,��%RY���|�h�%�G�B��+N����2 s��\�1�����*9}&!q5��6�U_i I�!#c������.���wU���cpQ
m$!Y���d;V*�T)>�!���!)��1r���'e�+Y����8���S}��(����}�4S�A��
��e�$DAY���P���Ue����������G�N�U��A��!���7��&q%����J~Upe�������>��]HJ7!���Y��?S�-�����B3V�?��*���ym[5��|������iM,�	P�c�����Q���L�#���	}u�q
�)���L+�V���X�$���D�J��w���I5��@*�Z,5mi�����&(F�P��id��-N����3@W~�
K�n���5<N��:V&�m=�a��x@�9�)������i��eV���A9`F���:Z��]���2�:�p���s03H(#�k�p+L��_����w{�K���� 8�!���(���84^�S|��Z�Z�@3a�>��V7.k6�����<�3�@�F�I�j�������7���M;���_�|�<�+�C���A3�o�����%0��U&z!��@(��;L��@�Rs0�>
�)lG���	q����d����D��LC]P�8�J�I�>vs\&��NM)�r�L{��-����<�$��!��CL������xF����.U�DD���l^���;���R�j����I'�����K2!�Z���E�m���8�P������b=K�gQX����q�^��� �<�G��@���w0�<GtG����ahP�VeF6�8���0)��q���_���e������<U�����?��Pq�8,8���j�y���%\��KM��L��?��keO%��:�X����L�#���6$_��pa)��S�w`�3RW�����5*�Y��9��"]��u��P��]%pX+�\/�:����������R�P���$]~�$�dw^4�yH�$�
R�B���C+��m�/��c�_���&����)�X��e��<���0	������E
������h.mk(���t�p��<�� �yE��0��.�r1�r��eG������4��|]��.����(%>�>�7	!��MS�u��D���}P�ue��g��k��Kd�&�F�V��(��Xq��"&�(Nb)q^o�����rf��Z�V���C���V����+�gP!c|��I�N��*i��jTc��"�I+�:pU26
e�@U��M��UQ-6��|�s�����-��D��Z�ko`I��A����M�#�6X|����_\��I��\2�����N�j��m�����%W�#���=���:�5��.}�Wu}_D���+��X����G�C�=���������}�Z"�l0����"QA?�%����~���{����[�<5������7s=��s��P5�D{�����	����y��v�k���\��#������S}�ak���\����z������W!�����/��\���F?e���h!�'���+��s}_�
��'s��L�����zg4��p���#�FV�������M���~v�W??������"E��0���$o��G�w|zq)��o�b�����Tqf>���h`c�> �1��A5X�ET��.='	�����O����+9��� p�&���x�������U!��8�F����Pq
���2�P%*zU�DsP���O������Md��8���������`��z"�Z*�y������Yl�i���,*z���O��rn4���b{s��C����M�Q:�"q��V�?�����<�O$py����������#:d�
7����;5�{�0������}�D@h��m�.�Q�4���2����j�����i.���=e�u����0������g�`$���5�p�4�x���;<4��	�u����q�����D8��/>�:�tL@�T���\.���&Z*�c�o$��:�)G��7����7��G��1����������|�j-AEe�>���r������B�����������'&���������/�h�h��
����e&�/������#;1Bx��*���yI�_������?����g����8<�������D����_��]���5B�D��e(���S�2�Jg!|�X8c'~�	�?
���p:LHqA8b��`L���������������3���F�"����6B�D����Z[�.���7Z2<��K���U������/J<������/s��mZ�|�J�SsJ��O�-�aW5m���������%��OrSB��Q���$7�;
~k������|3�Ot(m����v+���4�(���"�����q��-_[/J�B�	���n.�U�:�y��Z5�Mr��U~P�\��g1<?bm������R�?�%��([_�B�!�xI���
n�d!Jui�T���8]�Rw�x���j�>=�mvr�W��G���E�/>�T�K�+��5�����j$����m�*��m����T��m�"��h���3�T�w5N[.A�;d|0�jp��5��^���qE�H��������SI�]�(�ih�]�zhX���$\���]�Z����T���g����j����}_�F���t<�����������m����2+L��l��bdh�"Y�bF�HplarO
e������@f�|a������1�������@��F	��`��_���o�y���#	�V�'��}�W_E�qC��L4�W��adj�mE_hZ`�1�R���r�%�[4NdQ���c������b:���V�<��&����>���btj2�X��������	�R���Y8z�|���Hi�3���.��7��9��_����������z���������D�;�k�'�\���j��*=!�&���z����"2Z��|F4�!�n�:�n�;���H�t��a+#��u#�F�g�fm��������]�P��$.B�lz�L�<S0��p,o�SE\��%���L{��!��z�C�q3}�Os�M���7��z�v�g������2��Y�����^uL�J��8(f���s�P��R�4���b$H�#�������~������X��oObK�O��G��v�a8$?�+zf�o�H-x���lMk�Lcz|xu�����+�����f��n���+9?�^�c��Rz�����F�[3~��h��TK�]�a����5~�I8_(�LR_r_8��t�"��jpk�o@�����e-F.Fm�@+��VL�/V��uB��Pa�����}�;8��3
� ����Z����� ������7���@#6~����	���CKjncY���}t�����+�0_�g��Ui��e�[j���m��Y�<B��c��8m"(��[�lp��Sw�m\�����H������Nd��4F9������j2=3�OZ�V��^���2R�/_�'q�(��uZ6V=���xXh��/��l�n4:�;���&�q����$qM�
��J���[m�|i��Y;�\|�����1���h�G;��l;��������&uT�D��#��,�v[;p#����9+�be�N8�����
pnfL6���c ���,�������z��'>Q(��dN��%�C��{V�`���J'G5�������6P�>�����[����6!�1�����Wa���]R}^g���y�K�e�P��v�'���]��Y�������k�����w#�>�}��D���R�Z�3���N8��R[��3��{�F�O@�qhu���o5 �}+i���
M1r5BzZ��'����Q�'h�#Qm=�a���wi�*c^Dd����v(w��*og���l�^��}����Wm��9���W2��oo9�;K��:w�x�M:]D���I���8��*G��_R�)�]p36�J�c���*�t� �n}�]�o���[d
/��]
4��=#{��ka�r��@��(�l6����\��9���.�Z�
�5
���r�b�,K������M��e �����M.���+�7/<�7����`����i�1�B�cf�qQ`�K���@7������}�;f��\{8}#F�f�T�Ko�o�4�}7�p5�S��?��h��+��|a*_�R�9�{u��eI�Fw=��2tN��XI�����q)�T*�(��YZM.��|]����XkB_�OI�
�uz�$��u�T�F=UH��?w��e��o��o9��k�'H�U�}|_"�����D��}_=}3�g
����D���zof�K�����WV�M:���zUa���P���QXf��C�dIFE��T+�a��9v��%�?������u$�"!;�7�K����"V�M����9����!#.���O�>����4!���k(b2���LA�H�>~������������6�$���S�hKD���MJrS$$��"�$��%�"P k��(@4��y���eFf6-��so������##c�yh�Yd+��,����c��u'���a�S�M��}�
�P`����^~!|��P9 y)�~�p�e���-8+��u��������[�}�L	�D�)9�����.��F�*��X�L��N|uyV[O1�we�Ii��9�'�$�P#�*����
��9���\���_U ������O��z�`��Jh��3g]�j���*s���u~�=t����>�h�\%Z�6]]`1O�d��������#���vv�b���3:�D��#�N��E/��	�(�����oe�#-�)�K[�}�x�%��Y;�av�xY���|�fw��{b����n�d��g�s���c�~��������I?����(Z����>+B��s�����Y�����T�}�~��<�)���7h^,Bjx�(�;��:�0�����v6���{�k��u�"����W����<����7P��j���� �&�,i������>��}��C3P��{��M0��9�G�g5WQ����\�y`+j��w��|}yJ�1�O����'���O��}�W��~�]�wr_��"�k�F������(�3,��^,�K������LW�t$�B/6h����k�3�x��]t��5g�=y}|\����C�`5@F��m�"oP��\����X�����s#�G��?���,n3iQ�����<AS���T�������'	Hp���U<��h��T�Ok����������*Y<h*�=�%Ks�Xp���:b3���U�x��S�Ml���`u��|�����
���F��}{
��S��`T1��D��ZzfZ�Y�&�n��PlQ��sm"Y�&	�K�A���7C�����Y������_�n�!m��Z������/���RR��ni�0�`���}8d���jh:�h=0��FO��?i�_���r������������?�������3E�t���J����C
����w"�+`0����F��Z��/4����:`���<�\���E���J�f�����!w5)��rdR0���P�����^Xh�F)b�@�m�Zxn�{zx�=��	C�uL�o�v(e�w����?��8�[(���R�lb6���k��ytq�xY1K�.d^��6�"8�7/=k4M��������C$�M��6�6O����o������A8�f��C���6�e��"�9��Y|��#g���c�(���s=8>�1��W2������i��/�*�
K_.�������~+���?�����X5L�����T '�������2����J��g���WQ q�\2;�|8�T��j�m��#�: �Rc�	aiW1�B&����=f�U��Fa9Z=�P]��
�����F��j�5"P���Ve��*l.����k���W0��bu����*��EE�����_u�a���XEn����+�,�X�o��C��l����1 }�iv�jk��E���j�yxS��x������F��`���c���9�����9��*�����<g�wZh��X��G�����G	2�j%G���8^?�v�,	� ���l��S�4,���T��z��5M�a��wx��\�?�!-�Kj��PV�{�yj6��'���������zJ���.^P���W���`z}9�����j^�����Jd��N�?�������y`������\��N���Z���3�|wy�����|���FAA�:onZ��
��7]*T���~���#<�%"����z�8��KAs>�mj�Pak�~[��jPM�eBW���Wh�_M:5pij��K�o~�?��G�����;��a����Z
a���������ktx�smR�J'6�����2%�w4�5i�E��c��qYU�q���OA��Y��K�:f�6���s��3ZxF{+b���F�S:9�E�
�0��l��c��}��I�����j�Z�����,��s��F�����F��F9�C��G��u�D�j j��S`�b�$(�e\e�b��IX�%�&u�q6RFQ}�F��R����Zf�S�o�WS%M��t4�W-w�x��~�F�:$�������������]�Q��NO+�������-��E������HQ��%u�<?dvo�����E�WsY�J�Q�]��J��������)5�$jM�����!lP����o(z���e��<sf9�
���z�_y������V���A���<��_)9���x8����=uC��z��l�'hj�1�/zN��U�4e��5?��;�K�+���8�=���|��V������9r����c?����T��d3��Bj�#�x=�smj�	F�l=�n9�7 �U<��Q,��O8��i����R���WR�e�E�YI���g�x4�����5�%('�S*����t{--���3kb����S>&�B�*F���\�X+���\��4ONO�����6$�B5Y�n��Wfq��Kk������L$<T�����-�������JPV�����i�1$�6���*��4/�����t����oV�[�^�5H�<U���������g*�^��?�|"�]K���=������.����/g�����l��������M~�F	�1��z�"Zb�[������k6 3��a���������C���~���]�6\�H[b:
�k������;r������#�
��TG���\�9��MV`��o��Bo��?�b�9�X���C.��Z�A��OV.��i���s��c�|���?�����,r3PcJ@�4DfA.wI��d|+ �����Z��k����O���q�Mk�Y@Yz�G�k\��9X�:�,�X�W�q�����7���\4������A��S0��D3�����zj��5�u�>���?f��dGYl$y����x���T���������h�������t������b���1%�J�A�x2�]A|i�]�0-�H;�CnO�<9�����R��b���q%���XC�
�T�������>��fe��f��\�/����N�q�*O1�b�0�O'��Az���H�(��������Pq��XZ!x�����_�)A8�
�4�	��gO��R��/ ��1�,e�J��eZ�vD���UfQ#��(�n������U1;�.�1y"���m|�E�����
t�{���d&&�"-��$����e=*g�mqGx����`�?��q�$=�)��U�A�2�i��������F	�&!���z���5�=L��Y�da��P�8���j�mYR���*�n�q��[	���'x�3�e���@VtTu�D�D��l��d4|����t�.e�3N�x��R4i�H{���,3�1.��8�1���bE+��8�����@���.�y�������?g��<F��e����U�T�8K�+�y�kI��_���F�'�`���hs����a�4����I�+h����P�[�ZC����\��o7�j��j.�c��Cm��"�����hW�����B);���f���#R|�*$ek��I�+���}:��@�=,S�J�?�s�v����������^;�8um��U:����<����_Y0�%%�9~�Vy��,��B��[�H���Q���
dH��
f�c�
��6|����
����%G�s��w~X��Z��$"j�m#��a��>>$n��)y,���AM#�����`������<ey/��b]�#�e9d%�����e!������>� ��O2 R�����F�?^3��!�n�
=�6G{�C��7D42�/��n�����0�����c�?�� KWJ"�`�v��+�(�'�A�Rk���P����pDOU����5����R�1�28����!F��,dD)���<-G������4��@|������Y�?����r3��-����j�*��
<�*[��'��P�'���a�e�7�.��g6������&r���{�<�h��~
iM���> T C�:WaDa�,_��L���J��!��� ALeV%�b�
��MR�D�h�+��[�b^�%��{�-@f���2P4�"}8��\G�~�mtV[}��P3�|�M�������g9U�`��p����Y��?��j��o,_En)_=���j����������4�x[C������!�v4I�����h����Vf�|�����$�.l�}�4.N�uJH�Y6c/��Rh��3����^��������\U��j��n�t�Z�h�'�5a����uKH�7s������������GH�7Z���K����h�T"K��l�e���Y���b<��ZX�\[��F@�H�j�,K	\@,���J�����(��-�`�.�����������V��I�H�U�Q?
���I2&�!��0�V������jqaz�����,x�:B�
��'���Me�8�:��[��r���Y�KaA|��o���GS���2� ��������J���D!h)9�
�����*����]�����I��[�
g�����p�����j1w#pL' q�f�����r?�<z��&DE	��Q/�������X�I��hn���pz���N3���t���,�0Lc��(��B�Xc���QX��t����0����D9&�-zV?�P|�m��xQ�8�gT�V^�^@��:����#�?������-�����]�����(T�R�lQx=F�Be�1��j��r���da�#;�����&���g	��#l	����q<_&c$Q��\������d1�xks���fQ�"����/������K[e�������s
Jbv�x����=
���1.��#-8s�c�7�R�{7����0�lG �[
�I�7�v�E1|>[���A:���2[n/�?���}�����>������A=:�y|pH�H�h��+:�����/����^s��L��?��(NNe��8�	D�"���HR�X���}�����4�x���_���I�l<�����~���"���v/(�����s"�>��%����30�Lq+���)�F�1+�%�	���=]2z\8�a�K"~�\ls����
hL	��
%����������)G��` �|o��?^���n��Zw�3I�f��������(�����i�/��q�3��r!�����ZmI����ai+a���o���a+\��D�b���Q��^�f`���
�]=Qe�
AV���C��"S��!�X5�,���-kO�>"7N���GT��|��d�5������0�>�$��Df����E+����S��@�	��d
ca���*�306��K|L��53����	,����NI*�?�[�:��u��2��e������%���(&[��1��sI��	����M��<e[�(����9�Ec�W?�����9B�?�K�8h���G��
�-�M	b��%Ge��\����� _���W��"��d�J�L0h���5_j��IP&��F�[33���L��(#�����]�U�9�%N���J�ZP�q!4�\
�VV�S8�1X�f����l�����ZE��n�E6-�\|�[uz�����L��
��8����)��@|�X!r���R�u$"�.�
��s��}L��%��*[���xM�x�U�lb-���>��������{Z�zB-�=r�yP�N��Q��R~�h5�f@��
�/����U��l�:k�,��7^��a=�>������
j������"��lUP���`�.3��]�Qn��5E�t�z��mm�E#8��	"�v�tU�������3�}�
q�����������4HD50����i�D�V"W��u�*U�9?�?�9��?�P5NN'��a� +5���j1�3���
A�N$�[�sXq��K��b���`	��r�U���pC�pN��K@^&c�Y��1�WU�~��K��~��;h�=j8���_���O-�FqQa�9&�x`�7�����=dX�����1���h��R05�/�X�>4s���v����I�j�_1q^�u�.L��]0cr<�����yos-���u�3��H��zt:�j�����K��(�e
;yA	�F2�����H���x`��}��7
����k����frt�t#������H�$q�����U8H�p\ir��m<�F�WW��L����O>��D��-����#Oy����_�S�p0on@:��\>'������M�<a���q�m� Y��,&��Fx���1�����v��JI�����1Z�
�aI�Z"��
:������^gpU�n[����K�>�p�����j����p�"����gxe?��P�����]-��~��y/e�8�:�	q6�1��U�\����\5�V�����[!����7��i���P��YP%��[7�w�.���$t@�Nb���� !o��20V3U��(��Z����M�DKa��bQ�H�������;�:7K�����r$'d�DX��,�6�����{����w���%�F�7di�����Q8��x�b�������f��x�V)�DaX��sC���c��<[��@&)*����9��|=�!W��1�@Zj��bNJ&�B���=�K�.�7����POm/���M	����	R����h4��{�:�z�j�,�?���G���x���������>��</����:������6�����f�>�7��z@[�)��|��5���}�����,�M�1����D`WS�dI���N��A�{G��x7x[T��-�>�����t��(�����6�
cM5�}fm�U��K��*2��9�D��UsE����i�K���4���W'JRb�a��:�����^��v�%�%����z�<5���K� �B�7�\Io8���U|i,6CP�����(�iC	���*�#�C���XYX���*N�������*6���~0��~���%�^y�ej���$����h)�����F@��Y�*�8��k�d��W���R
�"Z�l�8O9�s�s�O+<�����~�I����z�I=��^���k/������JpK���!��xfU�_���YK�T�*
��*��6���&UU��|9�m�
wwk��P�[F[�!�~g��Xi1�igi'	��7����aBI�!_��A|i%��C������&�X�����8m�8��i����6WY�_��KH�`&_Rx�Yv�3�A�������E�e�����T[T��_�c~��x�[;���6�g���y�KU3��t��?v�1�l/8��g}�O�t��|�<�]E��hs^��7o�;�&x�oN��)+z���bD(@��g���EV4�
@��;Y��uF��F�B8��V�
1���\b��GBb���HLY��`��p���C�jO�pY���Q�������Y��Q�)��X 
�S�:�v��Ot�L�5��:v ��c��b:��nbx����6����z[�>swR�94�t<�X=���H�0��7�F��+l��Q��@���%��i?���9 f86i��A/>�8��9p@��5�W=�"
������]��\�	�������uZ
>���pl�X3&b}$��C�w��~C���~��6-���7C~��N������==�����=���!�����,��3�^��Vp��h0�a/f�Q�s�,�feM��p���#q���u'k��;W��m�`��Y<X
��)��cY>��3jP���O�L�u�
>���>2�z��gG;�����/2��e�/��ubk����9W2�b
�N������&}���9f��+�E�M}#�M������|����&��F�����x��QaJ���k����`
��Y<J��]D�o��unU�N�����uE����U�����|������h\H ���6cU��]��+���!VXR|5�����s����yJ�//:k��r?�*��j(@�"���7%��d~��!�������N��J6n&�0��RrB�|����p����`bX�d4d��	VV�\%���$$�� a1\�1V�����8x�V���S	�g�2�6T�>��+�]�O(��V���PK�+tN��E2n��l��b�	�u��;J:�a�� ��v D	jNdA�fl{N���t3�P���n�d�6B]�Q}����n�2�GJ�*��V��V~�[ ���H�����"�D������M7w��g� X�7���"���Iy�����:�*S�"b��n<������w�G����G�;����<g���7*G8��xpw���	��<�"�0c�JZ�����c�n��>�����ya�!|}�?1s�C}���\;m[�0;��'���������Ko�W3^������$\8��W|
��Y+��{�W��I���p�^5`0�V�$D�F[��~n[�v����������������F���p����d��4]������~��?��-��Zfv/�����u�=<l��}|Y�����C�(����.��b�e6����������[���f�\>.����?��Ba�|F�co�q���
Q���`�g��Q����f�F��b�)L��NG^����3��)��wD�c=�W�'0R�}�f���v�]P�O���0���T	7��Z�A��S��O/��p�3h�Se�R��:1���kao�a�>	�U}��}����K�}�^�N@����9s�J1`��������N�/N�<y����"�����
�]
�bfT�0��Sf^�%���w�t����@7�%"�
a�������Y��{�����r�_>r0�G
N1��\�g�0���L�{���e�@��N������9��8=;)�yC��)Ce�6������Jk@L5�*�P(d����9��_V��*�T��)��x�J'h�vb�;WN������n��7Y�f�\�!�%�,_�����N��C���L�wI+�4��~f��O�[����5��Re}�(��Z��26��;z�������Nmc���G	0�n��^3�
�+%�Y��2-&1�����>�3��H�K��U}:z�����r���d�Y��q�&M���x�3K�zG���14��9��x�9�D��x1���EkT�4HZ=J�����P�����`�q�V����G'�w
�
�K��~�� ���)?���|�����e�L)������s4��fn������"*L�n��!*�WW[���%<����/��d+�\`�����,��T=�t��
��PQ����X�����$�=p'!1v���[fE����j��M���X��!�J����������Q�m_d�7��&H���}Iy�9S&	�^�cj��0� ��mB��u��&�1@�8T^�_��$���������f�D0��]���H	�)��������7���-C���-��������T�6�%Ca���~���6�F	F$�uT��&�l���Q�^c1�b��Uj4'S6�,lD�[��H>�����
�Z�j�C{s�s���\����>�f�,)ghg�
�j�4C��������#eW E�u����Z���Py }����I�=k��6�.k�.�R�J��Pc�h���~��)s��������a�
��\��G�@�����;i ���;�di��Z�<%%4P��z���k:?����#�`����:������C��(�k%B���v[�����v
��M�
��D)�Wd����o���^t�d
���
E9p�O���yh��������I�c���Q�#eL3ge�2�1�9������2E���X������]9����11�}��Tr��SO]�vYi�_{
c������V������B���_�����Xx>|�p0b�&�� �3��,���i/[��q�
������Dz]����,�9x%��$������b�������D��+]������y�"(o �����V����G<��n������'��p�d���2�ho���1�g��A/�����Y���>���g�����9�P;s�Z	�U�����4X#��iN�^�Gt�������%@6��iO%��8�(�)������b���N�uF.H])�n)f<���x��;�0�&�FwC9J~��'������ �s��W�:kZ�>���I�]V����{��-b>jh������Ix�?��b�}t:��@ko��I���O����bqa����L�1���es��0���(�_T����5���1������A������SmIQ�Z���<���2�^|�P-��3T�����l����<o\\���:m�%��}J�;p�-���qU������~�
����V`	"����1�����8�lK �@���K��9�9���&*-H6�I���?��C'�z��j��2;*�����V
�L+�����������Gc���C=r+k��ql&<��������Ju��iHq�A0�jg�0_�H��0���!Dq�%S�x��kfv�YqW�#^1q��c������)-<�s��lNZ)3�V���U�E�]�T�nR>f������s3��������k����#�����KM������G;������9Nx/���N��R�QF>��r��@��s1���F�����_�)���-,C������1�������3`���g�w�c0v�ALS!�$w��\!�2U��klh��:�8
���{�,��������Z��]��?�<O��=<6�;�@��>r3��S1&�FZ�^o��r:���9�5QqsVt�x�m���v�B��(3l�a8:F���x�B��5���r�|����U�>,����yg�T�I����[s�����&9�/�jm�k�{@�Z�����_�[�&5��pj��N ����[�!����eDm��
��i�'�R!=B5�I���pg;�W�������jCg\8��_�glx�I!��t>�3����=���T<eFNSjbh,b������aC�C�~>��!:o�>�8��A�mp��S#�Anp�s�s����a��U�����jd�)�u�*n�/��������P-)�D��M�<D���@�N����9FSjG�����
	Hi}����a��)oVW����jI��b��o���w�:���)��J���&.]B���TU��/��Y��y��+����������g�a�Y ���2����G�����!#�{���{�A��>x}b��h��8�\r4��[����u�4���Ga�Y��B�?���$t���@!!�����2�H,�#�O�_0R���h�7�K���Qy\�bU��"�U8��u~��!����x�qJ'd�x���������e�<�?�G'~
��xrgo��s},�����'��K��;�@6�&}}K@_F��3���::�������}-!����/�9������d�x�R�<�����nx&=�
_>JA>"w\<�p")�)��<e�o1����h+��_��6���e�k��R��}Pl�$�!G���o���&9s(���l���?�fU�Iq9����e����>��exkQ�HW�r���n35�{���9�LE.G!�nt�� \	��_�
bI4/�3�Lq�������Kl\�H�
ES��U����5������EBY-J���p������J�4����|N�#a� �Q�V}a%q�qNjQ>ED���M�����?���;�Gq/s<�%��,��]��j��*-
ap�i��_��������U����,�n_
�Lo^���pLe�4S�����V+OVl������
�����>'�S&�Z���W=����>�A��+�<���7
���������
���
��i\�Q��Y�1�#�/+�p"n�9�u>]�s*���E��)����@��V|h�Y4�?nAq�`�|���.�\�"���+A[�������X�����}SD�'5�UL�3��`F&�?�H-
�<������8��-�I�vk��m5c3z���/����E:"	-z�5��vu^s��,�c�6�0�k�p�1��`����_�Dc���Y2]D,��^kxWy <����6i�
f�H���-+�@�f�h ����+O)���;��ip;�5����n�KU�������*��7�Q~�Y��b�
�P���ij%!������tw��v
f��k�8��m'�$���
s��W�#�
2!JSI����?�q�N�cRG.�1�#��p��h�������'�g���T�}��d/�
�YK���:�E["���BA�K_e5 hQ��Ho��v��"*�}*��r�yF�	��h��L�JMe)�J��������{��9>�5L�gA��R��&����Q����/��U������yR�g������+k1�������ov�.�����X%�?w~
��h�X����:����y��.���b��QK�>����R}/�E@��9�c�M��o�F@����\37s����1xE��Gy<2�-�bI�,|g��z,�|I3.}%�5�^>f�_z4��y�L����3I
�%gE�	�A3�!%pF 5Gw�,��0��������V<3��$E����|��W
^����Q������Gt��_����
=8}�j��p�9�"����%U��gXr������<<:�,]9��yt��Y(��c����t.�MH���9e�\s�aa�
��C���c����J�3��U�%����H�K���{qgR/w3�`��95B��C�A��(��f-D�[��iaRj����$������*� <����+�X�5���j�������F���UtE�A�3��:���v�[ZE�,�t�3O��t���1���X�)6�,�X���8��;�8��H�|�N H��	�&��<n�A&�����bi�g,�*���K�Y�[$�B&A<;B�:y��RI�����9�����T"sh�Q�46�|�7HJ!K������G�	�7}#������V����d�����y�`@v����v��K[d?��r!�Bw���1%���\0����`t��c����<���D���3�l
&}D�V�y���>�my�����l�$6�I�nS����q�ks�������o����o_5..�_6�?^�P*=V��+k��x<�W�M���]��&)�������/��K�ZO���k�F�^C7cvW~�����{{��2�v��AY	������Y+����mn��������j���3=��S��;z���V������~rt��\��������o����W_�iwh#���s.
O*[�zV5�6m
���d�-�z���WN�d�-6�Gf���x��@��i1���S_�o�DR��a�H�����)��� ��
�/`-���8-���r�P�9;1�VQ�����������S�(���3sm����cH �G����V
'
"A����P�"��VB
�dL�lu�kL.���_p�!����e��W�|���l��C��zgg�����}/�{u�$�:�b�Te�=�
�?-f���Wz`���n�*�#?�UO���ZP�����#0AU����0���m���h�W�c�x�t|s�u���C�b�|�Y�z�B��(i"�M��g��Q�U�$
�������|\'CB�L23���t�X�p���j���38!%��*E���_�,$����M_VlH9���/�I�G���k)j{����@?�����z�DV�p#r#
x���1����!�1'C��������N��P-[�[ H���g����=*R���!`�{�PC4��\>~y��/������U�M��A�A|���u�������o<�C�=6��kd���
3���p;�e����,�8��������!)_��yT���������UF��M�Z�\���f�h��T�<����Ro~�]��f�K�������S��������V��+N�b����������7fPzq�������O�s|O�njZ�O�/wq!�)�f�?�6�O_�4����*|���a�=�1�fy���W�Y�T�0�����f�X:hs�Z��".Y��m�!�d�����;"������S�/;9���_�-t���@f+6���=��0�C��Y���p���x������w�&��w�����];�)<}�<��@����-�������q�x���ks���@���q	�fM��^���a�	��������|`�� +���\�� ��!��Y�o��k
z�A��4{Io2NMO���?�*��l�-�R��s7�mq���H\O.��^�l�_P�a��c���(v��2�n�Q8�ya��|����Zg	���e���v�r������-3=hEKB�_h��w���]��������o_�<�U�z�����38
���������?v&}�:m��rX��F��Q����L`�F��Z���T�
�.
BR��14,bI�og�+���^c�'�jLV�>�m��4�J����)����]t����	s767�q�O[.����^�W�����FI{P3�u����"���P�1� ��?V��`i�v�������W�W��4�1��aZN�1�y��A�2@���L����A�"za����;p����2��F,W~	.0���4��v
t���s [@�:������L5,���4���96�ptXw��C��G������p+��}s1�{���U	�>J~?N��|i�QC�����u}!����QU��j�sL��	��;f��`����!��������x�A�d��lN]*W�^����6�E<S��"���A�D�X9y����5D��C�K}��Z���S0������H�^X�UUZ���CjF^�{�i��[�� ;�e��M��������s�2��U^�V��Wl��j��6�w1�2�<�/�Z��I���=U<g�t�������| ��NbBD%�&��`�
�M�A�
K;�^���\��5�m"#������U�!���fI��s�"!�"5h��g�D��/��=%���P���z�-�Y���9�9�����/nn�"D����j�u�c,9JY��=	*zzx��N���m&��-�`����N�{p8;Qw�%�5�:����c��C����pE����4S�Wgf\T���u�	��j8[.9 Y�;�t�%��0*
�
<�'���q+�������y[�L�l�Z�+��y�h
/��4>�(�����T��
�'�y2��QX>�L��Vj�'h��`���\��vay��)��G��t0������l�]��U�H��/��}g����]�;�P�C8��
�OZ���"��%�a�?��whs��Q��1�b���p�R�x�/"���i�����G���_I���K�{	�:�m��������X�,�]��1�� d��TcH"cW��M<&�C�@xE�.��?������6�W�fR7��RA�����#�U��7S����#"P�a�
j������(^�.�2E�0?N;��x��>����D����`46��\�.5��Q_�4�(���������-&t
�ee������8�v����� f;�����r�
��Fq�h|F3r��\��	M���C�)]%������sf�����}�0B6��L��G)�&�{#Z�U�@�������@
,�.I��Yivq�^&�P*�D����y����i���iP,�'��%���;��m��Y������]�u����������N�N.�?��<<��D�}�iOO��N:����������������Q��v\GS�����h�H��������F������������c���&<`�u�p������?�M(�d����#���@7���F��F��I��3	���Iu��A���&����O��)|_�R.���
�O ]?N	������^6__4����q�8���M�������������)"8�^}pH����:� ��rng�GLx���*x�t{�}b�#z��j%Z9�E�����r�Z}�����M���~A�=�k�c��C�W����'*&�<�P������y������)��#��y_�%k"��?y��^��b����5��������<Z�@�KK�^l�B��M��y~�/����G'���r���.�����7���������v����0W2@���%����Y�S��$I.Q�P������~(�yv)[�T��B�,�@��$y�YC���k�E��c��qY�O�pE�;�������
A��
��">�������oy���<5�I>}Z�G���E[m�=w[W�	��
R8zIe!�,��lP��a"*r<a����3�M���rsP
k0~������PZs�*}���r���
GZ�G����a�J��;�kx�<�G�}s��LO����(B>��h�I��������ex��|��m6wH�i����`��t4v��F6�`j��a*A�:,����N20���^F5+n��7�:��(s0����pJ8k.�r�6������L���\�2��&��',7���Tnp�`2~O��=h0z���(������Kv^v��[&#���`�~?r�Q�n�f�j.jLC'�#�d:��R���FR�F��������j���XAklKn��=����O�����	3@��V�Q����p���42|J(sG��v�K����W/��G�L�<�Y�mhjFyt�;����FwfY(�i�����9�.��P�n
��u������(������'F�x��D)��I�r��:Y�$�����6k������S������
�d���M����3�1���|z��.OOw����fP�c��3H�,3.9;,I\����8".�G�����q���a���JE�rB��#>��7F�_M��$���N���45���b�
�%1��
�v�
�������N�i��F���������6������KJ-#T�D������m����.K.����Wx��:�������U���Q})�!-�_O�~������-�ry�Pfs�J�AE������Q��e|]���[�p��61�
�U��N���-�2x5��Y���:/���C��B"���������x��Ua��)�ls��6_)S�9F*�B�E�~!���N����}��c���#����/���$����~�f4���nkl�:S��bu�����I�{���7+Jef�?�3��3����x;/�1�r������������(�zJ������g�8XQ02l��HtYi�cL�q|�t+��������O�����|u��:���e�S��;z����<;mu��'P&^��FTN�/=�sH�D��������`��7K���%�=�p7
���DR�Z���^�,�R.Z���t��	G���|F������9k^����p��\��5�s��r�N���<�������]�H��y����H7�>�A��<�,�j�
���%a���\���O� �����������L���3����^���)M��
&w��h��f�Z����5�S�_�E�?�V�r���4��w�����}T���n��7���<�N:#�B.����%jt�����`P�zChS&^�����S�"�aE��S��4��9mJ7i�����������n�O�>i"�s��5�[)�v�z�q1����M�� ���?�h����9|~r�v�'�����:����w������|��hb!�����������y���9q�N�&s�7�U_*��
=X�����0����0U���?�����Vk�7������~�m�Y�l_������|�S9��?���*�R�o����R����W\j���/�\`-{V�KS�D�=mQm���6�v|�"�.�g+���Gb�>�wqi�e�Q*	.�=��T�!�+:iH�l��+	���~;_�y�u�:���/NAt��
Xq�0.�Cj����`G���x��d�j���=�JN��D��mz�69�G��|��%{�������Zs�i]V�V(kxV�?��4�,�8Y?���N5Z+�O��s�Ni���E�,�u�X��T�{�?�\�B�
]�`B�S^��
�
Wa�;�T�d4B��Ks��=����U�
���P��^ $x�2��3�-��4������_9�j�;R�E��������~���b�s�K�h��<��4�`
�t��me�y���v�2�B9'����m��L�O�9���[�an3��'���a��'���m=��D��J8\8;^?�6�#���
��Br�p�gu����#oq����W�d�������a�g��;�]0L_��t�+L3��������	��{�����sPK��*)-pyD�+$��"�N�UF_�0��3[�����mL�_
[�np��NFC�����rL)] ��ED��&����:��jV�`����F!�O��u V��,�I:��c��c1�2���D��6�rf1~����=�����Uw�����1Az����)��
����\��[E�q�Mf(n8���j)4��\�\�Y����� ������Oh���j������&��\�V�T���2��RB,r�bI��i�:�("���%
����45��r�c'%_�����/���o�fy�o��:��.�����
�e�������I��:`%������w���$�����S�[��������:�,`I$!��fv���G*�9��vo�L�G4�4I`��6�#���/��L���m�I�Y����6p��E��6;�t�L�b7|�T���y��u��h�=�r8��D��&�hD��|�g�+^���"�d'��r�������r� @*<����H���z`����� ����H'��R3-�
+�b�3=��l��i_���O�n��� ���;���0�v�?<�0�����ZM2{T�d�s&�|���S	����y��SS�-=j�Q��XC��p��Y��8��xY��uXP��\����d���S����f&>R)|�h�������`
p�@��
^�������\C>B�6^�e�>B?�l2�"�f������G���Btt�$�-a��hb3�NP��Q<�h)��a�Z�G6W2No�N���E�y�������H�"�"����t���h�M��_�{��������#CirSB���y��*4����Nx�@�_.W�I8��DK���$z�<�`���������Uu�����Q5��2��<��
`%"^���ju���%�_�!��_���s�
����K�D������zt
���
��������j��1�)��J�@HA�=�C\�^�����$M��-��o}?�kl�}�
��p�d��m�� �E�x�m,��hR��D������&#��F���8#3����L�9��V:�u�?�b$�*�_��M<�#���	�������3
�N�xd���L��d�BN�q���T"&GTj����1A�a&n���sA���D�*ps�]q%����WK�+)�)X�[����b�����g�����������}&��So0����A��N�+L��,��)X��y����~���O������%+��]���?�,����{"�������4y��n�|���.�.x�7�k�-��d�g�����\��)<
{1��g�PmY�K�X)��\�E���A����sU��
��Y���6;=TS�c����������V8��7�@A������f�`��?o��No��
����R��]��+q�FiHzfx��B�����,�%��d�^�����9���Z�*�b�'n5FM��M��^-s,��E�D���U����+��C3|�|����E���W@�t����`�y-����� uQ��U�(���1o���gHs�Q�� .������$�h����c98{	�T�C9���o�=��~7���l����{�jy�5��1A�T$�����yw��"�_�Gr���jz�)��s�����K�{#�����3��8��q��P;���m��7��(��U!NI_��Q� �H���}��o(L�Q`z�8�Y��:!�N{�Xy�������l�;�s���,|�1,vx�7�:|��?�@^���#����aXa�tz\x|;@j�^�.��dYD�Tm����#1�)�f�a@(+iRg�2g���>��^���#�q���k(	;�[1dpGdM�<��
�!�A0PQ.|�)�y�D�NL4�%��E-y�)8���]��s��C��4�O�i��D{e�UH�����KF�}������-d�8�d �vJii�B�E�����)O�s�7R�gP<�U���oEY+���[0+�����R�>R�P����E�	���������.��b��3�!�p�-���W�m�JJ����e=7b>cW�����j������6N��H�-���4hi�@��z�/�����w��B�iDT��I�2^��~�u�D�S�:]�=rl�X���Z�R��0����u^��%�o�\YJ^=��X03�/���$��1k1{��*�C�J&w���U��������0S1���o},�yJ�"���=GF�������i�l�S� b�f�L��H'�����O1=�����
~�H��B^�*�H���3D�etb�!�wr��h����[����P�y���C4����{��qo��5�\����J������C���n��o4_������	�1��0��C�������i�%*:n�hC �}a�]`F��Q�o{�0��`�4
�B��w�`������y���q���h���z@8���?��#�Gg�Hq�R����+������2�p`��\�*/����f�~��
0������oG����j��Et����������-�������(`������r�*�����S��B�uvJ�"������������u�u4�����i�^5.8=��+$�
C^�]�N�}uWX����UC��"��=��2�b����S��'��7v3�	9�Nq
�������p���+���'B�
5^4PL�����Qkg�d�k����a����N�.�����A��A<8}u���`%@@CUK��h7���E@�J\�$ N
��,�����=���W�?C����������(��������_~�s:�N�'.��-�D��P=����GBU:\��i�B66_[yq������_�oC�O~}��V/�@���a4_l��l[;����n��;���DO�q8���w8yl��}2x��.���,��%#.�PK�����C/h��	�@��#o��W�����o��.	�rX��@���W�m��h�2Uh�����l��"���������`}����\a�/A��)kc�9/W����&Q���J$��:;'9E�c�hz_7���
"�CJ��d���U!�\��
��TQ2��	=U����C885��
-�%���c[���">Hg3��XE�����)=���Ai	���d���b)�����y�h�5M]=<z��<���d�'��I��Z9���n9$�X%S�����Ye.����w�>��R.���������.
�-�0D����vs
���������5����Z2	��ZN6TK��!���@�w��+����b��.�2���Q2���RY�\��f�Y�|tX�������Q
��j%�$�4#��n{B��������V�*��V}�-d�~���������[�r������A�d2�������2�yi��������
=a��Y�RV�8�k���B��w}
��7O�VM;�����M���[����2��\s��D�O�$46Y��~q_9�%��S�j�/��W�����|\�Hy��9�~7�9�E��������ebC��qq���	�1CR�S*������ET)[�����u��M�qgM6�������������!����V���Haf�`ofv�i�6/�;[�3t����n�Ca��@�>���>����l��Ms�>�X��S������nUZ�^�ZR�6Y�d��/Ow?��R��-8������+0���<������������cy���Y���^4�����<�����������M3p�MCj�-������]�[v�R�O
U1}V�i��P2F��P�}R���I�
�m��b�_e���}S0�D�u���������2x�������#����D9��d�<���;K�op�q|N)5,��c$O���J �C��Ehk��B�/#�N�o�u���.ZH����&��B�����1��5`gh:��g���W���]vE+-�������p|X.j��1��P<	�Z���D������@������y�!���������.	�`\�r�;����q��/�!� ~�?x��/`��CM�i����f�U����������1�w�?X�#*V����r���Y�X^o�#S�'�!7�,��oP���+v>kg*��������<��Q�N<�����*]�W^�����[�L���a�_J���M��1M��X�-������M�4>Kd��-$��lv���w���J�nf�3���>��n�	���O�"��Y�o[7E*[T���������*��@xH�[����$�<��ZX���9��	��i����X�c�X�KdJ~���8���-��+��o�7c}&�U{�~�T��!9��y��#~�i-�~L������T(�!�����t�����+k�\U�k�.�����
�S��61?�Gp��&*��tk�-����ow��]����(��x��W�����9�J��s���o��e�1�E����}�����'�����3�O7(��@���I�Q���.�J*G`��wPzk�� ��c�J5��US��4�`��m�Px�\}V���O���hm�E�@$h�6>�L�6hG5hgu�SH&!�LM�=��e�ca"������������'�����e�n�T���(����$!����<���<Zr���6�O=��h�
TLq����6����4��j�����-U$>�2o��,}HHh6c+��.�;X���SM��4�s8�n�����0��%G�N�����6���������H�_�h��f�vb����[��D�7�n�&�������_���g�KA3��J�����]dx�s�O�8`i"�"�?��d����p��t���}��(��M+��fsQQ�W�����]P5R�
y���<6jV���
��TA������z�	�Z��1fT�	�@�������5��Av�G�T��Q�x�����w�}	a�6\�(���J�����u���]�L�5i	B����r��,���E���/A���i���
Hw�8�\0|P�G���A�n����`zc�����4��(�������<f�N�A�������*����T�%~ema[|��
^�:��"4\�f���~M�c�LAB1F`am
��7
�B��{�q�[d�P��ky�p�� ��9sD]A%�iApT�y�P����������I�kR�Qy�������y3�U��Z����mg{����u!-����`�������_&T9z��F���Z>�$Oj,-|��{(�G@s��g�	�w�0��C�i�96�1�D�,��Wh���I�\bFcL��/�:T64��d�!��W�n|�f��=D��^0��_�3L�H�X���t(�x�����&����tm*ah��}B��.��\���g8�(m9>b���$��_���
%
��������
"��V.',������
�/l�����n|�cf���#�<m�\��`�/4������&��i�������W�%�{���|C�D�����#gF���]#�O������@����1�}��'R����������l]6�X�?��b��K.,��1z��#h��K�=y�lG��l����������.a$@.� �bd& ����P1�z��0X��-�&�<�Y]�`Gp��
�G:i��g��t��"P@b��MptW��p�"+X��`��XYS	r9�OF����U?�����G����[����ne��������"���y����J��e<�P�����a2�n�"�>��������������z����G�l����.f�U����p����=����1W�h0Eo�rvz~iz��me��4��[����~���M����T���N�M��<9��5G������T 2�q{�A�z��� �XT�*_g���`���%��)O����]�j'\����5M�X`�������^���#V5c��������/�67�I � cQ-?+��������m>���*��i���#Lddj.w��QJ����_.�r�A�z�������M��f����n���W����^����������E_����w�~o'�z}�o�v����u?�{���������Y�k���l3��37b�i������i�Je�$���`�zS6����o���Li��A��oxa�������������������Y��Zm��?#�h�Cr����z��d�A����vW��k���MK��IA���6�5J�c2yN!y.=��N'�:��\�
��x��;��������|U����z���w�i+��?�o�1�O��k�[l�-^oa�ED7�p�J�6�n�l������3|�F���x�����a:���������J�XPdz�k�����6T���	p[��l4�/�x����E{��[|g+��������G/���V����g8��Z�0,I�"���#������9(�
#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Heikki Linnakangas (#55)
Re: pg_rewind in contrib

On Fri, Mar 13, 2015 at 1:13 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

The cause was a silly typo in truncate_target_file:

@@ -397,7 +397,7 @@ truncate_target_file(const char *path, off_t newsize)

snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target,

path);

-       fd = open(path, O_WRONLY, 0);
+       fd = open(dstpath, O_WRONLY, 0);
if (fd < 0)
pg_fatal("could not open file \"%s\" for truncation:

%s\n",

dstpath, strerror(errno));

Attached is a new version of the patch, including that fix, and rebased

over current git master.

I have verified that new patch has fixed the problem.

Few more observations:

Getting below linking error with Asserts enabled in Windows build.

1>xlogreader.obj : error LNK2019: unresolved external symbol
ExceptionalCondition referenced in function
XLogReadRecord
1>.\Debug\pg_rewind\pg_rewind.exe : fatal error LNK1120: 1 unresolved
externals

Am I doing anything wrong while building?

2.
msvc\clean.bat has below way to clean xlogreader.c for pg_xlogdump,
shouldn't something similar required for pg_rewind?

REM clean up files copied into contrib\pg_xlogdump
if exist contrib\pg_xlogdump\xlogreader.c del /q contrib
\pg_xlogdump\xlogreader.c
for %%f in (contrib\pg_xlogdump\*desc.c) do if not %%f==contrib\pg_xlogdump
\rmgrdesc.c del /q %%f

3.
+void
+remove_target(file_entry_t *entry)
{
..
+ case FILE_TYPE_REGULAR:
+ remove_target_symlink(entry->path);
+
break;
+
+ case FILE_TYPE_SYMLINK:
+ remove_target_file(entry-

path);

+ break;
}

It seems function calls *_symlink and *_file are reversed, in reality it
might not matter much because the code for both functions is same,
but still it might diverge in future.

4.
Copyright notice contains variation in terms of years

+ * Copyright (c) 2010-2015, PostgreSQL Global Development Group
+ * Copyright (c) 2013-2015, PostgreSQL Global Development Group

+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

Is there any particular reason for the same?

5.
+ * relation files. Other forks are alwayes copied in toto, because we
cannot
+ * reliably track changes to
the, because WAL only contains block references
+ * for the main fork.
+ */
+static bool
+isRelDataFile(const
char *path)

Sentence near "track changes to the, .." looks incomplete.

6.
+libpqConnect(const char *connstr)
{
..
+ /*
+ * Also check that full_page-writes are enabled. We can get torn pages if
+ * a page is
modified while we read it with pg_read_binary_file(), and we
+ * rely on full page images to fix them.
+
 */
+ str = run_simple_query("SHOW full_page_writes");
+ if (strcmp(str, "on") != 0)
+
pg_fatal("full_page_writes must be enabled in the source server\n");
+ pg_free(str);
..
}

Do you think it is worth to mention this information in docs?

7.
Function execute_pagemap() exists in both copy_fetch.c and
libpq_fetch.c, are you expecting that they will get diverged
in future?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#57Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#56)
Re: pg_rewind in contrib

Amit Kapila wrote:

Getting below linking error with Asserts enabled in Windows build.

1>xlogreader.obj : error LNK2019: unresolved external symbol
ExceptionalCondition referenced in function
XLogReadRecord
1>.\Debug\pg_rewind\pg_rewind.exe : fatal error LNK1120: 1 unresolved
externals

Am I doing anything wrong while building?

Assert() gets defined in terms of ExceptionalCondition if building in
!FRONTEND, so it seems like the compile flags are wrong for pg_rewind's
copy of xlogreader.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Heikki Linnakangas (#53)
Re: pg_rewind in contrib

On Wed, Mar 11, 2015 at 2:23 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 03/11/2015 05:01 AM, Amit Kapila wrote:

Can't that happen if the source database (new-master) haven't
received all of the data from target database (old-master) at the
time of promotion?
If yes, then source database won't have WAL for truncation and
the way current mechanism works is must.

Now I think for such a case doing truncation in the target database
is the right solution,

Yeah, that can happen, and truncation is the correct fix for it. The

logic is pretty well explained by this comment in filemap.c:

*
* If it's the same size, do nothing here. Any locally
* modified blocks will be copied based on parsing the local
* WAL, and any remotely modified blocks will be updated after
* rewinding, when the remote WAL is replayed.
*/

What about unlogged table, how will they handle this particular case?

I think after old-master and new-master got diverged any operations
on unlogged table won't guarantee that we can get those modified
blocks from new-master during pg_rewind and I think it can lead
to a case where unlogged tables have some data from old-master
and some data from new master considering user always take of
clean shut-down.

Typo in patch:

+ * For our purposes, only files belonging to the main fork are considered
+ * relation files. Other forks are alwayes copied in toto, because we
cannot

/alwayes/always

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#59Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Kapila (#56)
Re: pg_rewind in contrib

On 03/14/2015 02:31 PM, Amit Kapila wrote:

Getting below linking error with Asserts enabled in Windows build.

1>xlogreader.obj : error LNK2019: unresolved external symbol
ExceptionalCondition referenced in function
XLogReadRecord
1>.\Debug\pg_rewind\pg_rewind.exe : fatal error LNK1120: 1 unresolved
externals

Am I doing anything wrong while building?

Works for me. Perhaps there were some changes to #includes that
inadvertently fixed it..

2.
msvc\clean.bat has below way to clean xlogreader.c for pg_xlogdump,
shouldn't something similar required for pg_rewind?

REM clean up files copied into contrib\pg_xlogdump
if exist contrib\pg_xlogdump\xlogreader.c del /q contrib
\pg_xlogdump\xlogreader.c
for %%f in (contrib\pg_xlogdump\*desc.c) do if not %%f==contrib\pg_xlogdump
\rmgrdesc.c del /q %%f
y.

I changed the way pg_xlogdump does that, and pg_rewind follows the new
example. (see /messages/by-id/550B14A5.7060708@iki.fi)

4.
Copyright notice contains variation in terms of years

+ * Copyright (c) 2010-2015, PostgreSQL Global Development Group
+ * Copyright (c) 2013-2015, PostgreSQL Global Development Group

+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

Is there any particular reason for the same?

I've created many of the files by copying an old file and modifying
heavily. The copyright notices have been carried over from the original
files. Many of the files would still contain some of the original copied
code, while others might not. I'm not sure what the best way to deal
with that is - stamp everything as 2015, 2013-2015, or leave them as
they are. It doesn't really matter in practice.

5.
+ * relation files. Other forks are alwayes copied in toto, because we
cannot
+ * reliably track changes to
the, because WAL only contains block references
+ * for the main fork.
+ */
+static bool
+isRelDataFile(const
char *path)

Sentence near "track changes to the, .." looks incomplete.

Fixed, it was supposed to be "them", not "the".

6.
+libpqConnect(const char *connstr)
{
..
+ /*
+ * Also check that full_page-writes are enabled. We can get torn pages if
+ * a page is
modified while we read it with pg_read_binary_file(), and we
+ * rely on full page images to fix them.
+
*/
+ str = run_simple_query("SHOW full_page_writes");
+ if (strcmp(str, "on") != 0)
+
pg_fatal("full_page_writes must be enabled in the source server\n");
+ pg_free(str);
..
}

Do you think it is worth to mention this information in docs?

Added.

7.
Function execute_pagemap() exists in both copy_fetch.c and
libpq_fetch.c, are you expecting that they will get diverged
in future?

They look pretty much identical, but the copy_file_range functions they
call are in fact separate functions, and differ a lot. I have renamed
the libpq version to avoid confusion.

I have committed this, with some more kibitzing. hope I have not missed
any comments given so far. Many thanks for the review, and please
continue reviewing and testing it :-).

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60Venkata Balaji N
nag1010@gmail.com
In reply to: Heikki Linnakangas (#59)
Re: pg_rewind in contrib

I have committed this, with some more kibitzing. hope I have not missed
any comments given so far. Many thanks for the review, and please continue
reviewing and testing it :-).

I have been testing the pg_rewind and have an analysis to share along with
few questions -

I had a streaming replication setup with one master and one slave running
successfully.

Test 1 :

- Killed postgres process on master and promoted slave. Both were in sync
earlier.
- performed some operations (data changes) on newly promoted slave node and
shutdown
- Executed pg_rewind on old master and got the below message

*target server must be shut down cleanly*
*Failure, exiting*

If the master is crashed or killed abruptly, it may not be possible to do a
rewind. Is my understanding correct ?

Test 2 :

- On a successfully running streaming replication with one master and one
slave, i did a clean shutdown of master
- promoted slave
- performed some operations (data changes) on newly promoted slave and did
a clean shutdown
- Executed pg_rewind on the old master to sync with the latest changes on
new master. I got the below message

*The servers diverged at WAL position 0/A2000098 on timeline 1.*
*No rewind required.*

I am not getting this too.

Regards,
Venkata Balaji N

#61Michael Paquier
michael.paquier@gmail.com
In reply to: Venkata Balaji N (#60)
Re: pg_rewind in contrib

On Thu, Mar 26, 2015 at 12:23 PM, Venkata Balaji N <nag1010@gmail.com> wrote:

Test 1 :

[...]

If the master is crashed or killed abruptly, it may not be possible to do a
rewind. Is my understanding correct ?

Yep. This is mentioned in the documentation:
http://www.postgresql.org/docs/devel/static/app-pgrewind.html
"The target server must shut down cleanly before running pg_rewind".

Test 2 :

- On a successfully running streaming replication with one master and one
slave, i did a clean shutdown of master
- promoted slave
- performed some operations (data changes) on newly promoted slave and did a
clean shutdown
- Executed pg_rewind on the old master to sync with the latest changes on
new master. I got the below message

The servers diverged at WAL position 0/A2000098 on timeline 1.
No rewind required.

I am not getting this too.

In this case the master WAL visibly did not diverge from the slave WAL
line. A rewind is done if the master touches new relation pages after
the standby has been promoted, and before the master is shutdown.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62Vladimir Borodin
root@simply.name
In reply to: Michael Paquier (#61)
Re: pg_rewind in contrib

26 марта 2015 г., в 7:32, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Thu, Mar 26, 2015 at 12:23 PM, Venkata Balaji N <nag1010@gmail.com> wrote:

Test 1 :

[...]

If the master is crashed or killed abruptly, it may not be possible to do a
rewind. Is my understanding correct ?

Yep. This is mentioned in the documentation:
http://www.postgresql.org/docs/devel/static/app-pgrewind.html
"The target server must shut down cleanly before running pg_rewind».

You can start old master, wait for crash recovery to complete, stop it cleanly and then use pg_rewind. It works.

Test 2 :

- On a successfully running streaming replication with one master and one
slave, i did a clean shutdown of master
- promoted slave
- performed some operations (data changes) on newly promoted slave and did a
clean shutdown
- Executed pg_rewind on the old master to sync with the latest changes on
new master. I got the below message

The servers diverged at WAL position 0/A2000098 on timeline 1.
No rewind required.

I am not getting this too.

In this case the master WAL visibly did not diverge from the slave WAL
line. A rewind is done if the master touches new relation pages after
the standby has been promoted, and before the master is shutdown.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Да пребудет с вами сила…
https://simply.name/ru

#63Arthur Silva
arthurprs@gmail.com
In reply to: Vladimir Borodin (#62)
Re: pg_rewind in contrib

On Mar 26, 2015 4:20 AM, "Vladimir Borodin" <root@simply.name> wrote:

26 марта 2015 г., в 7:32, Michael Paquier <michael.paquier@gmail.com>

написал(а):

On Thu, Mar 26, 2015 at 12:23 PM, Venkata Balaji N <nag1010@gmail.com>

wrote:

Test 1 :

[...]

If the master is crashed or killed abruptly, it may not be possible to

do a

rewind. Is my understanding correct ?

Yep. This is mentioned in the documentation:
http://www.postgresql.org/docs/devel/static/app-pgrewind.html
"The target server must shut down cleanly before running pg_rewind>>.

You can start old master, wait for crash recovery to complete, stop it

cleanly and then use pg_rewind. It works.

Shouldn't we have a flag so it does that automatically if necessary?

Test 2 :

- On a successfully running streaming replication with one master and

one

slave, i did a clean shutdown of master
- promoted slave
- performed some operations (data changes) on newly promoted slave and

did a

clean shutdown
- Executed pg_rewind on the old master to sync with the latest changes

on

Show quoted text

new master. I got the below message

The servers diverged at WAL position 0/A2000098 on timeline 1.
No rewind required.

I am not getting this too.

In this case the master WAL visibly did not diverge from the slave WAL
line. A rewind is done if the master touches new relation pages after
the standby has been promoted, and before the master is shutdown.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Да пребудет с вами сила...
https://simply.name/ru

#64Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Arthur Silva (#63)
Re: pg_rewind in contrib

On 03/26/2015 09:45 PM, Arthur Silva wrote:

On Mar 26, 2015 4:20 AM, "Vladimir Borodin" <root@simply.name> wrote:

26 О©╫О©╫О©╫О©╫О©╫ 2015 О©╫., О©╫ 7:32, Michael Paquier <michael.paquier@gmail.com>

О©╫О©╫О©╫О©╫О©╫О©╫О©╫(О©╫):

On Thu, Mar 26, 2015 at 12:23 PM, Venkata Balaji N <nag1010@gmail.com>

wrote:

If the master is crashed or killed abruptly, it may not be possible to

do a

rewind. Is my understanding correct ?

Yep. This is mentioned in the documentation:
http://www.postgresql.org/docs/devel/static/app-pgrewind.html
"The target server must shut down cleanly before running pg_rewind>>.

You can start old master, wait for crash recovery to complete, stop it

cleanly and then use pg_rewind. It works.

Shouldn't we have a flag so it does that automatically if necessary?

Might be handy, but currently pg_rewind never invokes postgres or other
binaries, so it would be a fair amount of new functionality to implement
that. Let's keep it simple for now.
- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers