logical changeset generation v6
Hi!
Attached you can find the newest version of the logical changeset
generation patchset. It is reduced by a couple of patches because they
were committed last round. Hurray! And thanks.
The explanation of how to use the patch from last time:
http://archives.postgresql.org/message-id/20130614224817.GA19641%40awork2.anarazel.de
still holds true, so I am not going to repeat it here.
The individual patches are:
0001 wal_decoding: Allow walsender's to connect to a specific database
One logical decoding operation can only decode content from one
database at a time. Because of that the walsender needs to connect
to a specific database. The existing "replication=on/off" parameter
now also accepts the value "database", which allows that.
0002 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done
Imo relatively unproblematic and even useful without changeset extraction.
0003 wal_decoding: Add information about a tables primary key to struct RelationData
Not many comments on this in the past. Kevin thinks we might want to
choose the best candidate key in a more elaborate manner.
0004 wal_decoding: Introduce wal decoding via catalog timetravel
The actual feature. Got cleaned up and shrunk since the last submission.
0005 wal_decoding: test_decoding: Add a simple decoding module in contrib
Example output plugin that's also used for testing.
0006 wal_decoding: pg_receivellog: Introduce pg_receivexlog equivalent for logical changes
Commandline utility to receive the changestream and manipulate slots.
0007 wal_decoding: test_logical_decoding: Add extension for easier testing of logical decoding
Allows not only creating and destroying logical slots, which is part of
0005, but also receiving the changestream via an SQL SRF.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Improve-regression-test-for-8410.patch (text/x-patch; charset=us-ascii)
From 14f521d9e2e9efde8b19a1664b2cf2056a2e9520 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 31 Aug 2013 01:54:05 +0200
Subject: [PATCH] Improve regression test for #8410
The previous version of the query disregarded the result of the MergeAppend
instead of checking its results.
---
src/test/regress/expected/inherit.out | 49 +++++++++++++++++------------------
src/test/regress/sql/inherit.sql | 16 ++++++------
2 files changed, 32 insertions(+), 33 deletions(-)
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 8520281..a2ef7ef 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1353,42 +1353,41 @@ ORDER BY x, y;
-- exercise rescan code path via a repeatedly-evaluated subquery
explain (costs off)
SELECT
- (SELECT g.i FROM (
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ ARRAY(SELECT f.i FROM (
+ (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
UNION ALL
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
) f(i)
- ORDER BY f.i LIMIT 1)
+ ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);
- QUERY PLAN
-------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------
Function Scan on generate_series g
SubPlan 1
-> Limit
- -> Result
- -> Merge Append
- Sort Key: generate_series.generate_series
- -> Sort
- Sort Key: generate_series.generate_series
- -> Function Scan on generate_series
- -> Sort
- Sort Key: generate_series_1.generate_series
- -> Function Scan on generate_series generate_series_1
-(12 rows)
+ -> Merge Append
+ Sort Key: ((d.d + g.i))
+ -> Sort
+ Sort Key: ((d.d + g.i))
+ -> Function Scan on generate_series d
+ -> Sort
+ Sort Key: ((d_1.d + g.i))
+ -> Function Scan on generate_series d_1
+(11 rows)
SELECT
- (SELECT g.i FROM (
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ ARRAY(SELECT f.i FROM (
+ (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
UNION ALL
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
) f(i)
- ORDER BY f.i LIMIT 1)
+ ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);
- i
----
- 1
- 2
- 3
+ array
+------------------------------
+ {1,5,6,8,11,11,14,16,17,20}
+ {2,6,7,9,12,12,15,17,18,21}
+ {3,7,8,10,13,13,16,18,19,22}
(3 rows)
reset enable_seqscan;
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index e88a584..8637655 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -442,21 +442,21 @@ ORDER BY x, y;
-- exercise rescan code path via a repeatedly-evaluated subquery
explain (costs off)
SELECT
- (SELECT g.i FROM (
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ ARRAY(SELECT f.i FROM (
+ (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
UNION ALL
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
) f(i)
- ORDER BY f.i LIMIT 1)
+ ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);
SELECT
- (SELECT g.i FROM (
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ ARRAY(SELECT f.i FROM (
+ (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
UNION ALL
- (SELECT * FROM generate_series(1, 2) ORDER BY 1)
+ (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
) f(i)
- ORDER BY f.i LIMIT 1)
+ ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);
reset enable_seqscan;
--
1.8.2.rc2.4.g7799588.dirty
0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch (text/x-patch; charset=us-ascii)
From 078dcdd696604801c898decbe478e3c99fe257a6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 1/8] wal_decoding: Allow walsender's to connect to a specific
database
Extend the existing 'replication' parameter to not only allow a boolean value
but also "database". If the latter is specified we connect to the database
specified in 'dbname'.
This is useful for future walsender commands which need database interaction,
e.g. changeset extraction.
---
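A minimal sketch of the value handling described in the commit message above, written as standalone C so it can be compiled outside the backend. ReplicationMode, simple_parse_bool and classify_replication_option are hypothetical names used only for illustration; simple_parse_bool is a reduced stand-in for the backend's parse_bool() and accepts only the four boolean spellings named in the new error message.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of the patch. */
typedef struct ReplicationMode
{
	bool		valid;			/* was the value accepted at all? */
	bool		walsender;		/* corresponds to am_walsender */
	bool		db_walsender;	/* corresponds to am_db_walsender */
} ReplicationMode;

/*
 * Reduced stand-in for the backend's parse_bool(): it only knows the four
 * boolean spellings listed in the patch's error message.
 */
static bool
simple_parse_bool(const char *value, bool *result)
{
	if (strcmp(value, "true") == 0 || strcmp(value, "1") == 0)
	{
		*result = true;
		return true;
	}
	if (strcmp(value, "false") == 0 || strcmp(value, "0") == 0)
	{
		*result = false;
		return true;
	}
	return false;
}

/* Classify the value of the "replication" startup parameter. */
static ReplicationMode
classify_replication_option(const char *value)
{
	ReplicationMode mode = {false, false, false};

	if (strcmp(value, "database") == 0)
	{
		/* walsender bound to a database, e.g. for changeset extraction */
		mode.valid = true;
		mode.walsender = true;
		mode.db_walsender = true;
	}
	else if (simple_parse_bool(value, &mode.walsender))
		mode.valid = true;		/* plain boolean, the pre-existing behaviour */

	return mode;
}
```

With this sketch, "database" sets both flags, "true" sets only walsender, and an unrecognized string leaves valid unset, which is the case where the patch raises its FATAL error.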
doc/src/sgml/protocol.sgml | 24 +++++++++---
src/backend/postmaster/postmaster.c | 23 ++++++++++--
.../libpqwalreceiver/libpqwalreceiver.c | 4 +-
src/backend/replication/walsender.c | 43 +++++++++++++++++++---
src/backend/utils/init/postinit.c | 5 +++
src/bin/pg_basebackup/pg_basebackup.c | 4 +-
src/bin/pg_basebackup/pg_receivexlog.c | 4 +-
src/bin/pg_basebackup/receivelog.c | 4 +-
src/include/replication/walsender.h | 1 +
9 files changed, 89 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2ea14e5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1301,10 +1301,13 @@
<para>
To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter, which will in future
+allow some additional commands to the ones specified below to be run.
The commands accepted in walsender mode are:
@@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are:
<listitem>
<para>
Requests the server to identify itself. Server replies with a result
- set of a single row, containing three fields:
+ set of a single row, containing four fields:
</para>
<para>
@@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ dbname
+ </term>
+ <listitem>
+ <para>
+ Database connected to or NULL.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
</listitem>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..a31b01d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1894,10 +1894,21 @@ retry1:
port->cmdline_options = pstrdup(valptr);
else if (strcmp(nameptr, "replication") == 0)
{
- if (!parse_bool(valptr, &am_walsender))
+ /*
+ * Due to backward compatibility concerns replication is a
+ * hybrid beast which allows the value to be either a boolean
+ * or the string 'database'. The latter connects to a specific
+ * database which is e.g. required for changeset extraction.
+ */
+ if (strcmp(valptr, "database") == 0)
+ {
+ am_walsender = true;
+ am_db_walsender = true;
+ }
+ else if (!parse_bool(valptr, &am_walsender))
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid value for boolean option \"replication\"")));
+ errmsg("invalid value for option \"replication\", legal values are false, 0, true, 1 or database")));
}
else
{
@@ -1983,8 +1994,12 @@ retry1:
if (strlen(port->user_name) >= NAMEDATALEN)
port->user_name[NAMEDATALEN - 1] = '\0';
- /* Walsender is not related to a particular database */
- if (am_walsender)
+ /*
+ * Generic walsender, e.g. for streaming replication, is not connected to a
+ * particular database. But walsenders used for logical replication need to
+ * connect to a specific database.
+ */
+ if (am_walsender && !am_db_walsender)
port->database_name[0] = '\0';
/*
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
"the primary server: %s",
PQerrorMessage(streamConn))));
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
int ntuples = PQntuples(res);
int nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
PQclear(res);
ereport(ERROR,
(errmsg("invalid response from primary server"),
- errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+ errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
ntuples, nfields)));
}
primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index afd559d..b00a91a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,7 +46,10 @@
#include "access/timeline.h"
#include "access/transam.h"
#include "access/xlog_internal.h"
+#include "access/xact.h"
+
#include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -89,9 +92,10 @@ WalSndCtlData *WalSndCtl = NULL;
WalSnd *MyWalSnd = NULL;
/* Global state */
-bool am_walsender = false; /* Am I a walsender process ? */
+bool am_walsender = false; /* Am I a walsender process? */
bool am_cascading_walsender = false; /* Am I cascading WAL to
- * another standby ? */
+ * another standby? */
+bool am_db_walsender = false; /* connect to database? */
/* User-settable parameters for walsender */
int max_wal_senders = 0; /* the maximum number of concurrent walsenders */
@@ -243,10 +247,12 @@ IdentifySystem(void)
char tli[11];
char xpos[MAXFNAMELEN];
XLogRecPtr logptr;
+ char* dbname = NULL;
/*
- * Reply with a result set with one row, three columns. First col is
- * system ID, second is timeline ID, and third is current xlog location.
+ * Reply with a result set with one row, four columns. First col is system
+ * ID, second is timeline ID, third is current xlog location and the fourth
+ * contains the database name if we are connected to one.
*/
snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -265,9 +271,23 @@ IdentifySystem(void)
snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
+ if (MyDatabaseId != InvalidOid)
+ {
+ MemoryContext cur = CurrentMemoryContext;
+
+ /* syscache access needs a transaction env. */
+ StartTransactionCommand();
+ /* make dbname live outside TX context */
+ MemoryContextSwitchTo(cur);
+ dbname = get_database_name(MyDatabaseId);
+ CommitTransactionCommand();
+ /* CommitTransactionCommand switches to TopMemoryContext */
+ MemoryContextSwitchTo(cur);
+ }
+
/* Send a RowDescription message */
pq_beginmessage(&buf, 'T');
- pq_sendint(&buf, 3, 2); /* 3 fields */
+ pq_sendint(&buf, 4, 2); /* 4 fields */
/* first field */
pq_sendstring(&buf, "systemid"); /* col name */
@@ -295,17 +315,28 @@ IdentifySystem(void)
pq_sendint(&buf, -1, 2);
pq_sendint(&buf, 0, 4);
pq_sendint(&buf, 0, 2);
+
+ /* fourth field */
+ pq_sendstring(&buf, "dbname");
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
+ pq_sendint(&buf, TEXTOID, 4);
+ pq_sendint(&buf, -1, 2);
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
pq_endmessage(&buf);
/* Send a DataRow message */
pq_beginmessage(&buf, 'D');
- pq_sendint(&buf, 3, 2); /* # of columns */
+ pq_sendint(&buf, 4, 2); /* # of columns */
pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
pq_sendint(&buf, strlen(tli), 4); /* col2 len */
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+ pq_sendint(&buf, dbname ? (int) strlen(dbname) : -1, 4); /* col4 len, -1 for NULL */
+ pq_sendbytes(&buf, dbname ? dbname : "", dbname ? (int) strlen(dbname) : 0);
pq_endmessage(&buf);
}
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2c7f0f1..56c352c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -725,7 +725,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
ereport(FATAL,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser or replication role to start walsender")));
+ }
+ if (am_walsender &&
+ (in_dbname == NULL || in_dbname[0] == '\0') &&
+ dboid == InvalidOid)
+ {
/* process any options passed in the startup packet */
if (MyProcPort != NULL)
process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..89e2376 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 787a395..fe8aef6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index d56a4d7..22a5340 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -534,11 +534,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
PQclear(res);
return false;
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
PQclear(res);
return false;
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddf..5097235 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -19,6 +19,7 @@
/* global state */
extern bool am_walsender;
extern bool am_cascading_walsender;
+extern bool am_db_walsender;
extern bool wake_wal_senders;
/* user-settable parameters */
--
1.8.4.21.g992c386.dirty
0002-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch (text/x-patch; charset=us-ascii)
From 9cd917e256159c1947aa56afa20b89cd1783c256 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 2/8] wal_decoding: Log xl_running_xact's at a higher frequency
than checkpoints are done
Logging information about running xacts more frequently is beneficial both for
hot standby, which can reach consistency faster and release some resources
earlier using this information, and for future logical replication, which can
initialize quicker using this.
Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.
Also mark xl_running_xact records as being relevant for async commit so the wal
writer writes them out soonish instead of possibly waiting a long time.
---
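The throttling rule from the commit message above, roughly sketched in standalone C. Plain integers stand in for TimestampTz and XLogRecPtr, and SnapshotThrottle, snapshot_due and snapshot_logged are hypothetical names for illustration; only LOG_SNAPSHOT_INTERVAL_MS and its value are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_SNAPSHOT_INTERVAL_MS 15000	/* same value as in the patch */

/* Hypothetical state; integers replace TimestampTz and XLogRecPtr here. */
typedef struct SnapshotThrottle
{
	int64_t		last_snapshot_ms;	/* when the last snapshot was logged */
	uint64_t	last_snapshot_lsn;	/* position returned by that logging call */
} SnapshotThrottle;

/*
 * Decide whether a new running-xacts record should be logged: the interval
 * must have elapsed AND the WAL insert position must have moved since the
 * last snapshot, so an idle system is not woken up for nothing.
 */
static bool
snapshot_due(const SnapshotThrottle *state, int64_t now_ms, uint64_t insert_lsn)
{
	int64_t		timeout = state->last_snapshot_ms + LOG_SNAPSHOT_INTERVAL_MS;

	return now_ms >= timeout && state->last_snapshot_lsn != insert_lsn;
}

/* Remember when the snapshot was logged and which position that returned. */
static void
snapshot_logged(SnapshotThrottle *state, int64_t now_ms, uint64_t lsn_after)
{
	state->last_snapshot_ms = now_ms;
	state->last_snapshot_lsn = lsn_after;
}
```

The caller is expected to invoke snapshot_due() once per main-loop iteration and call snapshot_logged() only after actually writing the record. Because the stored position is the one returned by the logging call itself, a system with no other WAL activity never satisfies the second condition again.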
src/backend/postmaster/bgwriter.c | 62 +++++++++++++++++++++++++++++++++++++++
src/backend/storage/ipc/standby.c | 27 ++++++++++++++---
src/include/storage/standby.h | 2 +-
3 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..13d57c5 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
#include "storage/shmem.h"
#include "storage/smgr.h"
#include "storage/spin.h"
+#include "storage/standby.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
+#include "utils/timestamp.h"
/*
@@ -71,6 +73,20 @@ int BgWriterDelay = 200;
#define HIBERNATE_FACTOR 50
/*
+ * Interval in which standby snapshots are logged into the WAL stream, in
+ * milliseconds.
+ */
+#define LOG_SNAPSHOT_INTERVAL_MS 15000
+
+/*
+ * LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
+ * doing so too often or repeatedly if there has been no other write activity
+ * in the system.
+ */
+static TimestampTz last_snapshot_ts;
+static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
+
+/*
* Flags set by interrupt handlers for later service in the main loop.
*/
static volatile sig_atomic_t got_SIGHUP = false;
@@ -142,6 +158,12 @@ BackgroundWriterMain(void)
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
/*
+ * We just started, assume there has been either a shutdown or
+ * end-of-recovery snapshot.
+ */
+ last_snapshot_ts = GetCurrentTimestamp();
+
+ /*
* Create a memory context that we will do all our work in. We do this so
* that we can reset the context during error recovery and thereby avoid
* possible memory leaks. Formerly this code just ran in
@@ -276,6 +298,46 @@ BackgroundWriterMain(void)
}
/*
+ * Log a new xl_running_xacts every now and then so replication can get
+ * into a consistent state faster (think of suboverflowed snapshots)
+ * and clean up resources (locks, KnownXids*) more frequently. The
+ * costs of this are relatively low, so doing it 4 times
+ * (LOG_SNAPSHOT_INTERVAL_MS) a minute seems fine.
+ *
+ * We assume the interval for writing xl_running_xacts is
+ * significantly bigger than BgWriterDelay, so we don't complicate the
+ * overall timeout handling but just assume we're going to get called
+ * often enough even if hibernation mode is active. It's not that
+ * important that LOG_SNAPSHOT_INTERVAL_MS is met strictly. To make sure
+ * we're not waking the disk up unnecessarily on an idle system we
+ * check whether there has been any WAL inserted since the last time
+ * we've logged a running xacts.
+ *
+ * We do this logging in the bgwriter as it's the only process that's
+ * run regularly and returns to its mainloop all the
+ * time. E.g. Checkpointer, when active, is barely ever in its
+ * mainloop and thus makes it hard to log regularly.
+ */
+ if (XLogStandbyInfoActive() && !RecoveryInProgress())
+ {
+ TimestampTz timeout = 0;
+ TimestampTz now = GetCurrentTimestamp();
+ timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
+ LOG_SNAPSHOT_INTERVAL_MS);
+
+ /*
+ * only log if enough time has passed and some xlog record has been
+ * inserted.
+ */
+ if (now >= timeout &&
+ last_snapshot_lsn != GetXLogInsertRecPtr())
+ {
+ last_snapshot_lsn = LogStandbySnapshot();
+ last_snapshot_ts = now;
+ }
+ }
+
+ /*
* Sleep until we are signaled or BgWriterDelay has elapsed.
*
* Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..97da1a0 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
ProcSignalReason reason);
static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
* currently running xids, performed by StandbyReleaseOldLocks().
* Zero xids should no longer be possible, but we may be replaying WAL
* from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
*/
-void
+XLogRecPtr
LogStandbySnapshot(void)
{
+ XLogRecPtr recptr;
RunningTransactions running;
xl_standby_lock *locks;
int nlocks;
@@ -876,9 +879,12 @@ LogStandbySnapshot(void)
* record we write, because standby will open up when it sees this.
*/
running = GetRunningTransactionData();
- LogCurrentRunningXacts(running);
+ recptr = LogCurrentRunningXacts(running);
+
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
+
+ return recptr;
}
/*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
* is a contiguous chunk of memory and never exists fully until it is
* assembled in WAL.
*/
-static void
+static XLogRecPtr
LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
{
xl_running_xacts xlrec;
@@ -939,6 +945,19 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
CurrRunningXacts->oldestRunningXid,
CurrRunningXacts->latestCompletedXid,
CurrRunningXacts->nextXid);
+
+ /*
+ * Ensure running_xacts information is synced to disk not too far in the
+ * future. We don't want to stall anything though (i.e. use XLogFlush()),
+ * so we let the wal writer do it during normal
+ * operation. XLogSetAsyncXactLSN() conveniently will mark the LSN as
+ * to-be-synced and nudge the WALWriter into action if sleeping. Check
+ * XLogBackgroundFlush() for details why a record might not be flushed
+ * without it.
+ */
+ XLogSetAsyncXactLSN(recptr);
+
+ return recptr;
}
/*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
extern void LogAccessExclusiveLockPrepare(void);
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
#endif /* STANDBY_H */
--
1.8.4.21.g992c386.dirty
0003-wal_decoding-Add-information-about-a-tables-primary-.patch (text/x-patch; charset=us-ascii)
From 03866bd4e623c8b28deba62288054ab445400b98 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 3/8] wal_decoding: Add information about a tables primary key
to struct RelationData
'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
non-expression index over one or more NOT NULL'ed columns
To gather the rd_primary value, RelationGetIndexList() needs to have been called.
This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index and RelationGetIndexList already
gathers all the necessary information.
This could be used to replace tablecmds.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
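The preference order listed in the commit message above can be sketched in standalone C as follows. IndexInfoSketch and choose_identity_index are hypothetical names, and the struct is a flattened, simplified view of a pg_index row made up for this illustration. The sketch follows the order stated in the commit message, not the relcache code line by line.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, flattened view of one index; for illustration only. */
typedef struct IndexInfoSketch
{
	Oid			indexrelid;
	bool		is_valid;
	bool		is_unique;
	bool		is_immediate;
	bool		is_partial;
	bool		has_expressions;
	bool		is_primary;
	bool		is_oid_index;		/* single-column index on the oid column */
	bool		all_keys_not_null;	/* every user column in the key is NOT NULL */
} IndexInfoSketch;

/*
 * Pick the index identifying rows: primary key first, then the oid index,
 * then the first other qualifying unique index. Callers are assumed to pass
 * the indexes in OID order, so "first" matches the commit message.
 */
static Oid
choose_identity_index(const IndexInfoSketch *indexes, size_t n)
{
	Oid			pkey = InvalidOid;
	Oid			oididx = InvalidOid;
	Oid			candidate = InvalidOid;

	for (size_t i = 0; i < n; i++)
	{
		const IndexInfoSketch *idx = &indexes[i];

		/* only valid, unique, immediate, non-partial indexes qualify */
		if (!idx->is_valid || !idx->is_unique ||
			!idx->is_immediate || idx->is_partial)
			continue;

		if (idx->is_primary)
			pkey = idx->indexrelid;
		else if (idx->is_oid_index)
			oididx = idx->indexrelid;
		else if (candidate == InvalidOid &&
				 !idx->has_expressions && idx->all_keys_not_null)
			candidate = idx->indexrelid;
	}

	if (pkey != InvalidOid)
		return pkey;
	if (oididx != InvalidOid)
		return oididx;
	return candidate;			/* may be InvalidOid if nothing qualified */
}
```

A table with only a unique index over a nullable column yields InvalidOid here, which matches the requirement that candidate keys cover NOT NULL columns.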
src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
src/include/utils/rel.h | 12 +++++++++
2 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index b4cc6ad..44dd0d2 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3462,7 +3462,9 @@ RelationGetIndexList(Relation relation)
ScanKeyData skey;
HeapTuple htup;
List *result;
- Oid oidIndex;
+ Oid oidIndex = InvalidOid;
+ Oid pkeyIndex = InvalidOid;
+ Oid candidateIndex = InvalidOid;
MemoryContext oldcxt;
/* Quick exit if we already computed the list. */
@@ -3519,17 +3521,61 @@ RelationGetIndexList(Relation relation)
Assert(!isnull);
indclass = (oidvector *) DatumGetPointer(indclassDatum);
+ if (!IndexIsValid(index))
+ continue;
+
/* Check to see if it is a unique, non-partial btree index on OID */
- if (IndexIsValid(index) &&
- index->indnatts == 1 &&
+ if (index->indnatts == 1 &&
index->indisunique && index->indimmediate &&
index->indkey.values[0] == ObjectIdAttributeNumber &&
indclass->values[0] == OID_BTREE_OPS_OID &&
heap_attisnull(htup, Anum_pg_index_indpred))
oidIndex = index->indexrelid;
+
+ if (index->indisunique &&
+ index->indimmediate &&
+ heap_attisnull(htup, Anum_pg_index_indpred))
+ {
+ /* always prefer primary keys */
+ if (index->indisprimary)
+ pkeyIndex = index->indexrelid;
+ else if (!OidIsValid(pkeyIndex)
+ && !OidIsValid(oidIndex)
+ && !OidIsValid(candidateIndex))
+ {
+ int key;
+ bool found = true;
+ for (key = 0; key < index->indnatts; key++)
+ {
+ int16 attno = index->indkey.values[key];
+ Form_pg_attribute attr;
+ /* internal column, like oid */
+ if (attno <= 0)
+ continue;
+
+ attr = relation->rd_att->attrs[attno - 1];
+ if (!attr->attnotnull)
+ {
+ found = false;
+ break;
+ }
+ }
+ if (found)
+ candidateIndex = index->indexrelid;
+ }
+ }
}
systable_endscan(indscan);
+
+ if (OidIsValid(pkeyIndex))
+ relation->rd_primary = pkeyIndex;
+ /* prefer oid indexes over normal candidate ones */
+ else if (OidIsValid(oidIndex))
+ relation->rd_primary = oidIndex;
+ else if (OidIsValid(candidateIndex))
+ relation->rd_primary = candidateIndex;
+
heap_close(indrel, AccessShareLock);
/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 589c9a8..0281b4b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
TriggerDesc *trigdesc; /* Trigger info, or NULL if rel has none */
/*
+ * The 'best' primary or candidate key that has been found, only set
+ * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+ *
+ * Indexes are chosen in the following order:
+ * * Primary Key
+ * * oid index
+ * * the first (OID order) unique, immediate, non-partial and
+ * non-expression index over one or more NOT NULL'ed columns
+ */
+ Oid rd_primary;
+
+ /*
* rd_options is set whenever rd_rel is loaded into the relcache entry.
* Note that you can NOT look into rd_rel for this data. NULL means "use
* defaults".
--
1.8.4.21.g992c386.dirty
0004-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch (text/x-patch; charset=us-ascii)
From 848b60044c7ed27f205abbf67ed2389f6827934c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/8] wal_decoding: Introduce wal decoding via catalog
timetravel
This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
* init
* begin
* change
* commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
* INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
* initial LSN
* replication slot identifier
* id of a pg_export() style snapshot
* START_LOGICAL_REPLICATION $id $lsn; streams out changes
* uses named output plugins for output specification
Todo:
* better integrated testing infrastructure
* more docs about the internals
Lowlevel:
* resource owner handling is suboptimal
* invalidations from uninteresting transactions (e.g. from other databases, old ones)
need to be processed anyway
* error handling in walsender is suboptimal
* pg_receivellog needs to send a reply immediately when postgres is shutting down
Input, Testing and Review by:
Heikki Linnakangas
Kevin Grittner
Michael Paquier
Abhijit Menon-Sen
Peter Geoghegan
Robert Haas
Simon Riggs
Steve Singer
Code By:
Andres Freund
With code contributions by:
Abhijit Menon-Sen
Craig Ringer
Alvaro Herrera
Conflicts:
src/backend/replication/repl_gram.y
---
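To illustrate what "reassembles transactions from a stream of interspersed changes" means for the 'reorderbuffer' module named in the commit message above, here is a deliberately tiny standalone C sketch. It is not the patch's implementation: ReorderSketch, ChangeSketch, reorder_add_change and reorder_commit are hypothetical names, and the fixed-size array ignores subtransactions, aborts and memory limits, which a real implementation would have to handle.

```c
#define MAX_CHANGES 64

/* Hypothetical: one decoded change, tagged with its transaction id. */
typedef struct ChangeSketch
{
	unsigned int xid;
	const char *description;
} ChangeSketch;

/* Hypothetical: changes seen so far for transactions not yet committed. */
typedef struct ReorderSketch
{
	ChangeSketch pending[MAX_CHANGES];
	int			npending;
} ReorderSketch;

/* Queue a change in arrival order; returns 0 on success, -1 if full. */
static int
reorder_add_change(ReorderSketch *rb, unsigned int xid, const char *description)
{
	if (rb->npending >= MAX_CHANGES)
		return -1;
	rb->pending[rb->npending].xid = xid;
	rb->pending[rb->npending].description = description;
	rb->npending++;
	return 0;
}

/*
 * On commit of xid, move that transaction's changes into out[] in arrival
 * order and drop them from the buffer. Changes of other transactions stay
 * queued, as do any changes that do not fit into out_cap entries.
 * Returns the number of changes written to out[].
 */
static int
reorder_commit(ReorderSketch *rb, unsigned int xid, const char **out, int out_cap)
{
	int			nout = 0;
	int			keep = 0;

	for (int i = 0; i < rb->npending; i++)
	{
		if (rb->pending[i].xid == xid && nout < out_cap)
			out[nout++] = rb->pending[i].description;
		else
			rb->pending[keep++] = rb->pending[i];
	}
	rb->npending = keep;
	return nout;
}
```

A caller sets npending to 0, feeds changes as they are decoded, and hands the result of reorder_commit() to whatever produces output. Changes from two interleaved transactions therefore come out grouped per transaction, in commit order.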
src/backend/access/common/reloptions.c | 10 +
src/backend/access/heap/heapam.c | 465 ++++-
src/backend/access/heap/pruneheap.c | 2 +
src/backend/access/index/indexam.c | 14 +-
src/backend/access/rmgrdesc/heapdesc.c | 9 +
src/backend/access/rmgrdesc/xlogdesc.c | 1 +
src/backend/access/transam/twophase.c | 4 +-
src/backend/access/transam/xact.c | 48 +-
src/backend/access/transam/xlog.c | 14 +-
src/backend/catalog/catalog.c | 14 +-
src/backend/catalog/index.c | 15 +-
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 2 +-
src/backend/commands/cluster.c | 2 +
src/backend/commands/trigger.c | 3 +-
src/backend/commands/vacuum.c | 5 +-
src/backend/commands/vacuumlazy.c | 3 +
src/backend/postmaster/postmaster.c | 2 +-
src/backend/replication/Makefile | 2 +
src/backend/replication/logical/Makefile | 19 +
src/backend/replication/logical/decode.c | 687 ++++++
src/backend/replication/logical/logical.c | 1046 ++++++++++
src/backend/replication/logical/logicalfuncs.c | 361 ++++
src/backend/replication/logical/reorderbuffer.c | 2548 +++++++++++++++++++++++
src/backend/replication/logical/snapbuild.c | 1581 ++++++++++++++
src/backend/replication/repl_gram.y | 75 +-
src/backend/replication/repl_scanner.l | 55 +-
src/backend/replication/walreceiver.c | 2 +-
src/backend/replication/walsender.c | 733 ++++++-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 72 +-
src/backend/storage/ipc/standby.c | 15 +
src/backend/utils/cache/inval.c | 4 +-
src/backend/utils/cache/relcache.c | 113 +-
src/backend/utils/misc/guc.c | 12 +
src/backend/utils/misc/postgresql.conf.sample | 11 +-
src/backend/utils/time/snapmgr.c | 7 +-
src/backend/utils/time/tqual.c | 270 ++-
src/bin/initdb/initdb.c | 4 +-
src/bin/pg_controldata/pg_controldata.c | 2 +
src/include/access/heapam_xlog.h | 59 +-
src/include/access/transam.h | 5 +
src/include/access/xact.h | 1 +
src/include/access/xlog.h | 8 +-
src/include/access/xlogreader.h | 13 +-
src/include/catalog/catalog.h | 1 +
src/include/catalog/pg_proc.h | 6 +
src/include/commands/vacuum.h | 2 +-
src/include/nodes/nodes.h | 3 +
src/include/nodes/replnodes.h | 35 +
src/include/replication/decode.h | 20 +
src/include/replication/logical.h | 198 ++
src/include/replication/logicalfuncs.h | 19 +
src/include/replication/output_plugin.h | 70 +
src/include/replication/reorderbuffer.h | 342 +++
src/include/replication/snapbuild.h | 79 +
src/include/replication/walsender_private.h | 6 +-
src/include/storage/itemptr.h | 3 +
src/include/storage/lwlock.h | 1 +
src/include/storage/procarray.h | 2 +-
src/include/storage/sinval.h | 2 +
src/include/utils/inval.h | 1 +
src/include/utils/rel.h | 30 +-
src/include/utils/relcache.h | 11 +-
src/include/utils/snapmgr.h | 3 +
src/include/utils/tqual.h | 20 +-
src/test/regress/expected/rules.out | 9 +-
src/tools/pgindent/typedefs.list | 40 +
68 files changed, 9028 insertions(+), 206 deletions(-)
create mode 100644 src/backend/replication/logical/Makefile
create mode 100644 src/backend/replication/logical/decode.c
create mode 100644 src/backend/replication/logical/logical.c
create mode 100644 src/backend/replication/logical/logicalfuncs.c
create mode 100644 src/backend/replication/logical/reorderbuffer.c
create mode 100644 src/backend/replication/logical/snapbuild.c
create mode 100644 src/include/replication/decode.h
create mode 100644 src/include/replication/logical.h
create mode 100644 src/include/replication/logicalfuncs.h
create mode 100644 src/include/replication/output_plugin.h
create mode 100644 src/include/replication/reorderbuffer.h
create mode 100644 src/include/replication/snapbuild.h
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index b5fd30a..e1e5040 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -63,6 +63,14 @@ static relopt_bool boolRelOpts[] =
},
{
{
+ "treat_as_catalog_table",
+ "Treat table as a catalog table for the purpose of logical replication",
+ RELOPT_KIND_HEAP
+ },
+ false
+ },
+ {
+ {
"fastupdate",
"Enables \"fast update\" feature for this GIN index",
RELOPT_KIND_GIN
@@ -1166,6 +1174,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, security_barrier)},
{"check_option", RELOPT_TYPE_STRING,
offsetof(StdRdOptions, check_option_offset)},
+ {"treat_as_catalog_table", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, treat_as_catalog_table)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b1a5d9f..b09dde2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -85,12 +85,14 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
- HeapTuple newtup, bool all_visible_cleared,
- bool new_all_visible_cleared);
+ HeapTuple newtup, HeapTuple old_idx_tup,
+ bool all_visible_cleared, bool new_all_visible_cleared);
static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
- bool *satisfies_hot, bool *satisfies_key,
- HeapTuple oldtup, HeapTuple newtup);
+ Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
+ bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
+ HeapTuple oldtup, HeapTuple newtup);
static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
uint16 old_infomask2, TransactionId add_to_xmax,
LockTupleMode mode, bool is_update,
@@ -108,6 +110,8 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
static bool ConditionalMultiXactIdWait(MultiXactId multi,
MultiXactStatus status, int *remaining,
uint16 infomask);
+static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
+static HeapTuple ExtractKeyTuple(Relation rel, HeapTuple tup);
/*
@@ -342,8 +346,10 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- Assert(TransactionIdIsValid(RecentGlobalXmin));
- heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ if (IsSystemRelation(scan->rs_rd) || RelationIsDoingTimetravel(scan->rs_rd))
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1743,10 +1749,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
*/
if (!skip)
{
+ /* set up the redirected t_self for the benefit of timetravel access */
+ ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
+
/* If it's visible per the snapshot, we must return it */
valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, heapTuple,
buffer, snapshot);
+ /* reset original, non-redirected, tid */
+ heapTuple->t_self = *tid;
+
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
@@ -2101,11 +2113,24 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- XLogRecData rdata[3];
+ XLogRecData rdata[4];
Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
+ bool need_tuple_data;
+
+ /*
+ * For logical replication, we need the tuple even if we're doing a
+ * full page write, so make sure to log it separately. (XXX We could
+ * alternatively store a pointer into the FPW).
+ *
+ * Also, if this is a catalog, we need to transmit combocids to
+ * properly decode, so log that as well.
+ */
+ need_tuple_data = RelationIsLogicallyLogged(relation);
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, heaptup);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.target.node = relation->rd_node;
xlrec.target.tid = heaptup->t_self;
rdata[0].data = (char *) &xlrec;
@@ -2124,18 +2149,35 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*/
rdata[1].data = (char *) &xlhdr;
rdata[1].len = SizeOfHeapHeader;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[2].buffer = buffer;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[2].buffer_std = true;
rdata[2].next = NULL;
/*
+ * add record for the buffer without actual content that's removed if
+ * fpw is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[2].next = &(rdata[3]);
+
+ rdata[3].data = NULL;
+ rdata[3].len = 0;
+ rdata[3].buffer = buffer;
+ rdata[3].buffer_std = true;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If this is the single and first tuple on page, we can reinit the
* page instead of restoring the whole thing. Set flag, and hide
* buffer references from XLogInsert.
@@ -2144,7 +2186,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
info |= XLOG_HEAP_INIT_PAGE;
- rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2270,6 +2312,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
Page page;
bool needwal;
Size saveFreeSpace;
+ bool need_tuple_data = RelationIsLogicallyLogged(relation);
+ bool need_cids = RelationIsDoingTimetravel(relation);
needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
@@ -2356,7 +2400,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
{
XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
- XLogRecData rdata[2];
+ XLogRecData rdata[3];
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
int totaldatalen;
@@ -2386,7 +2430,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->all_visible_cleared = all_visible_cleared;
+ xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec->node = relation->rd_node;
xlrec->blkno = BufferGetBlockNumber(buffer);
xlrec->ntuples = nthispage;
@@ -2418,6 +2462,13 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
datalen);
tuphdr->datalen = datalen;
scratchptr += datalen;
+
+ /*
+ * We don't use heap_multi_insert for catalog tuples yet, but
+ * better be prepared...
+ */
+ if (need_cids)
+ log_heap_new_cid(relation, heaptup);
}
totaldatalen = scratchptr - tupledata;
Assert((scratchptr - scratch) < BLCKSZ);
@@ -2429,17 +2480,33 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
rdata[1].data = tupledata;
rdata[1].len = totaldatalen;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = NULL;
/*
+ * add record for the buffer without actual content that's removed if
+ * fpw is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[1].next = &(rdata[2]);
+
+ rdata[2].data = NULL;
+ rdata[2].len = 0;
+ rdata[2].buffer = buffer;
+ rdata[2].buffer_std = true;
+ rdata[2].next = NULL;
+ xlrec->flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If we're going to reinitialize the whole page using the WAL
* record, hide buffer reference from XLogInsert.
*/
if (init)
{
- rdata[1].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
info |= XLOG_HEAP_INIT_PAGE;
}
@@ -2559,6 +2626,9 @@ heap_delete(Relation relation, ItemPointer tid,
bool have_tuple_lock = false;
bool iscombo;
bool all_visible_cleared = false;
+ bool need_tuple_data = RelationNeedsWAL(relation) &&
+ RelationIsLogicallyLogged(relation);
+ HeapTuple idx_tuple = NULL; /* primary key of the tuple */
Assert(ItemPointerIsValid(tid));
@@ -2732,6 +2802,15 @@ l1:
/* replace cid with a combo cid if necessary */
HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
+ /*
+ * Compute primary key tuple before entering the critical section so we
+ * don't PANIC upon a memory allocation failure.
+ */
+ if (need_tuple_data)
+ {
+ idx_tuple = ExtractKeyTuple(relation, &tp);
+ }
+
START_CRIT_SECTION();
/*
@@ -2784,9 +2863,13 @@ l1:
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
- XLogRecData rdata[2];
+ XLogRecData rdata[4];
+
+ /* For logical decode we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, &tp);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
tp.t_data->t_infomask2);
xlrec.target.node = relation->rd_node;
@@ -2803,6 +2886,34 @@ l1:
rdata[1].buffer_std = true;
rdata[1].next = NULL;
+ /*
+ * Log primary key of the deleted tuple
+ */
+ if (need_tuple_data && idx_tuple != NULL)
+ {
+ xl_heap_header xlhdr;
+
+ xlhdr.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr.t_hoff = idx_tuple->t_data->t_hoff;
+
+ rdata[1].next = &(rdata[2]);
+ rdata[2].data = (char *) &xlhdr;
+ rdata[2].len = SizeOfHeapHeader;
+ rdata[2].buffer = InvalidBuffer;
+ rdata[2].next = NULL;
+
+ rdata[2].next = &(rdata[3]);
+ rdata[3].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].buffer = InvalidBuffer;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);
PageSetLSN(page, recptr);
@@ -2932,9 +3043,11 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
TransactionId xid = GetCurrentTransactionId();
Bitmapset *hot_attrs;
Bitmapset *key_attrs;
+ Bitmapset *ckey_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
+ HeapTuple old_idx_tuple = NULL;
Page page;
BlockNumber block;
MultiXactStatus mxact_status;
@@ -2950,6 +3063,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool iscombo;
bool satisfies_hot;
bool satisfies_key;
+ bool satisfies_ckey;
bool use_hot_update = false;
bool key_intact;
bool all_visible_cleared = false;
@@ -2977,8 +3091,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* Note that we get a copy here, so we need not worry about relcache flush
* happening midway through.
*/
- hot_attrs = RelationGetIndexAttrBitmap(relation, false);
- key_attrs = RelationGetIndexAttrBitmap(relation, true);
+ hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
+ key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
+ ckey_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY);
block = ItemPointerGetBlockNumber(otid);
buffer = ReadBuffer(relation, block);
@@ -3036,9 +3152,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitiously arrive at the same key values.
*/
- HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs,
+ HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, ckey_attrs,
&satisfies_hot, &satisfies_key,
- &oldtup, newtup);
+ &satisfies_ckey, &oldtup, newtup);
if (satisfies_key)
{
*lockmode = LockTupleNoKeyExclusive;
@@ -3508,6 +3624,12 @@ l2:
PageSetFull(page);
}
+ /* compute tuple for logical logging */
+ if (!satisfies_ckey && RelationIsLogicallyLogged(relation))
+ {
+ old_idx_tuple = ExtractKeyTuple(relation, &oldtup);
+ }
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -3583,11 +3705,20 @@ l2:
/* XLOG stuff */
if (RelationNeedsWAL(relation))
{
- XLogRecPtr recptr = log_heap_update(relation, buffer,
- newbuf, &oldtup, heaptup,
- all_visible_cleared,
- all_visible_cleared_new);
+ XLogRecPtr recptr;
+
+ /* For logical decode we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ {
+ log_heap_new_cid(relation, &oldtup);
+ log_heap_new_cid(relation, heaptup);
+ }
+ recptr = log_heap_update(relation, buffer,
+ newbuf, &oldtup, heaptup,
+ old_idx_tuple,
+ all_visible_cleared,
+ all_visible_cleared_new);
if (newbuf != buffer)
{
PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3739,18 +3870,23 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* modify columns used in the key.
*/
static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
+HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
HeapTuple oldtup, HeapTuple newtup)
{
int next_hot_attnum;
int next_key_attnum;
+ int next_ckey_attnum;
bool hot_result = true;
bool key_result = true;
- bool key_done = false;
+ bool ckey_result = true;
bool hot_done = false;
+ Assert(bms_is_subset(ckey_attrs, key_attrs));
+ Assert(bms_is_subset(key_attrs, hot_attrs));
+
next_hot_attnum = bms_first_member(hot_attrs);
if (next_hot_attnum == -1)
hot_done = true;
@@ -3759,28 +3895,25 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+
for (;;)
{
int check_now;
bool changed;
- /* both bitmapsets are now empty */
- if (key_done && hot_done)
+ /* bitmapsets are now empty, hot includes others */
+ if (hot_done)
break;
- /* XXX there's probably an easier way ... */
- if (hot_done)
- check_now = next_key_attnum;
- if (key_done)
- check_now = next_hot_attnum;
- else
- check_now = Min(next_hot_attnum, next_key_attnum);
+ check_now = next_hot_attnum;
changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
check_now, oldtup, newtup);
@@ -3790,11 +3923,15 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
hot_result = false;
if (check_now == next_key_attnum)
key_result = false;
+ if (check_now == next_ckey_attnum)
+ ckey_result = false;
}
/* if both are false now, we can stop checking */
- if (!hot_result && !key_result)
+ if (!hot_result && !key_result && !ckey_result)
+ {
break;
+ }
if (check_now == next_hot_attnum)
{
@@ -3808,16 +3945,22 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
if (check_now == next_key_attnum)
{
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
}
+ if (check_now == next_ckey_attnum)
+ {
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+ }
}
*satisfies_hot = hot_result;
*satisfies_key = key_result;
+ *satisfies_ckey = ckey_result;
}
/*
@@ -5839,15 +5982,22 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
static XLogRecPtr
log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
+ HeapTuple idx_tuple,
bool all_visible_cleared, bool new_all_visible_cleared)
{
xl_heap_update xlrec;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
+ xl_heap_header_len xlhdr_idx;
uint8 info;
XLogRecPtr recptr;
- XLogRecData rdata[4];
+ XLogRecData rdata[7];
Page page = BufferGetPage(newbuf);
+ /*
+ * Just as for XLOG_HEAP_INSERT, the tuple must be logged even with an FPW.
+ */
+ bool need_tuple_data = RelationIsLogicallyLogged(reln);
+
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
@@ -5862,9 +6012,12 @@ log_heap_update(Relation reln, Buffer oldbuf,
xlrec.old_infobits_set = compute_infobits(oldtup->t_data->t_infomask,
oldtup->t_data->t_infomask2);
xlrec.new_xmax = HeapTupleHeaderGetRawXmax(newtup->t_data);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = 0;
+ if (all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
xlrec.newtid = newtup->t_self;
- xlrec.new_all_visible_cleared = new_all_visible_cleared;
+ if (new_all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
rdata[0].data = (char *) &xlrec;
rdata[0].len = SizeOfHeapUpdate;
@@ -5877,33 +6030,78 @@ log_heap_update(Relation reln, Buffer oldbuf,
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
- xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
- xlhdr.t_infomask = newtup->t_data->t_infomask;
- xlhdr.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.header.t_infomask2 = newtup->t_data->t_infomask2;
+ xlhdr.header.t_infomask = newtup->t_data->t_infomask;
+ xlhdr.header.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.t_len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- /*
- * As with insert records, we need not store the rdata[2] segment if we
- * decide to store the whole buffer instead.
- */
rdata[2].data = (char *) &xlhdr;
- rdata[2].len = SizeOfHeapHeader;
- rdata[2].buffer = newbuf;
+ rdata[2].len = SizeOfHeapHeaderLen;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[2].buffer_std = true;
rdata[2].next = &(rdata[3]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
- rdata[3].data = (char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].data = (char *) newtup->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
rdata[3].len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[3].buffer = newbuf;
+ rdata[3].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[3].buffer_std = true;
rdata[3].next = NULL;
+ /*
+ * separate storage for the buffer reference of the new page in the
+ * wal_level >= logical case
+ */
+ if (need_tuple_data)
+ {
+ rdata[3].next = &(rdata[4]);
+
+ rdata[4].data = NULL;
+ rdata[4].len = 0;
+ rdata[4].buffer = newbuf;
+ rdata[4].buffer_std = true;
+ rdata[4].next = NULL;
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+
+ /* candidate key changed and we have a candidate key */
+ if (idx_tuple)
+ {
+ /* don't really need this, but it's more comfy */
+ xlhdr_idx.header.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr_idx.header.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr_idx.header.t_hoff = idx_tuple->t_data->t_hoff;
+ xlhdr_idx.t_len = idx_tuple->t_len;
+
+ rdata[4].next = &(rdata[5]);
+ rdata[5].data = (char *) &xlhdr_idx;
+ rdata[5].len = SizeOfHeapHeaderLen;
+ rdata[5].buffer = InvalidBuffer;
+ rdata[5].next = &(rdata[6]);
+
+ /* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
+ rdata[6].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[6].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata[6].buffer = InvalidBuffer;
+ rdata[6].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+ }
+
/* If new tuple is the single and first tuple on page... */
if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == FirstOffsetNumber &&
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
+ XLogRecData *rcur = &rdata[0];
info |= XLOG_HEAP_INIT_PAGE;
- rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
+ while (rcur != NULL)
+ {
+ rcur->buffer = InvalidBuffer;
+ rcur = rcur->next;
+ }
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -6010,6 +6208,112 @@ log_newpage_buffer(Buffer buffer)
}
/*
+ * Perform XLogInsert of a XLOG_HEAP2_NEW_CID record
+ *
+ * This is only used in wal_level >= WAL_LEVEL_LOGICAL
+ */
+static XLogRecPtr
+log_heap_new_cid(Relation relation, HeapTuple tup)
+{
+ xl_heap_new_cid xlrec;
+
+ XLogRecPtr recptr;
+ XLogRecData rdata[1];
+ HeapTupleHeader hdr = tup->t_data;
+
+ Assert(ItemPointerIsValid(&tup->t_self));
+ Assert(tup->t_tableOid != InvalidOid);
+
+ xlrec.top_xid = GetTopTransactionId();
+ xlrec.target.node = relation->rd_node;
+ xlrec.target.tid = tup->t_self;
+
+ /*
+ * if the tuple got inserted & deleted in the same TX we definitely have a
+ * combocid, set cmin and cmax.
+ */
+ if (hdr->t_infomask & HEAP_COMBOCID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetCmin(hdr);
+ xlrec.cmax = HeapTupleHeaderGetCmax(hdr);
+ xlrec.combocid = HeapTupleHeaderGetRawCommandId(hdr);
+ }
+ /* No combocid, so only cmin or cmax can be set by this TX */
+ else
+ {
+ /* tuple inserted */
+ if (hdr->t_infomask & HEAP_XMAX_INVALID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetRawCommandId(hdr);
+ xlrec.cmax = InvalidCommandId;
+ }
+ /* tuple from a different tx updated or deleted */
+ else
+ {
+ xlrec.cmin = InvalidCommandId;
+ xlrec.cmax = HeapTupleHeaderGetRawCommandId(hdr);
+
+ }
+ xlrec.combocid = InvalidCommandId;
+ }
+
+ rdata[0].data = (char *) &xlrec;
+ rdata[0].len = SizeOfHeapNewCid;
+ rdata[0].buffer = InvalidBuffer;
+ rdata[0].next = NULL;
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_NEW_CID, rdata);
+
+ return recptr;
+}
+
+static HeapTuple
+ExtractKeyTuple(Relation relation, HeapTuple tp)
+{
+ HeapTuple idx_tuple = NULL;
+ TupleDesc desc = RelationGetDescr(relation);
+ Relation idx_rel;
+ TupleDesc idx_desc;
+ Datum idx_vals[INDEX_MAX_KEYS];
+ bool idx_isnull[INDEX_MAX_KEYS];
+ int natt;
+
+ /* needs to already have been fetched? */
+ if (relation->rd_indexvalid == 0)
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(DEBUG1, "Could not find primary key for table with oid %u",
+ RelationGetRelid(relation));
+ }
+ else
+ {
+ idx_rel = RelationIdGetRelation(relation->rd_primary);
+ idx_desc = RelationGetDescr(idx_rel);
+
+ for (natt = 0; natt < idx_desc->natts; natt++)
+ {
+ int attno = idx_rel->rd_index->indkey.values[natt];
+ if (attno == ObjectIdAttributeNumber)
+ {
+ idx_vals[natt] = HeapTupleGetOid(tp);
+ idx_isnull[natt] = false;
+ }
+ else
+ {
+ idx_vals[natt] =
+ fastgetattr(tp, attno, desc, &idx_isnull[natt]);
+ }
+ Assert(!idx_isnull[natt]);
+ }
+ idx_tuple = heap_form_tuple(idx_desc, idx_vals, idx_isnull);
+ RelationClose(idx_rel);
+ }
+ return idx_tuple;
+}
+
+/*
* Handles CLEANUP_INFO
*/
static void
@@ -6370,7 +6674,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6419,7 +6723,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/* Make sure there is no forward chain link in t_ctid */
@@ -6453,7 +6757,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6524,7 +6828,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6587,7 +6891,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->node);
Buffer vmbuffer = InvalidBuffer;
@@ -6670,7 +6974,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6709,7 +7013,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
HeapTupleHeaderData hdr;
char data[MaxHeapTupleSize];
} tbuf;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
int hsize;
uint32 newlen;
Size freespace;
@@ -6718,7 +7022,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
@@ -6796,7 +7100,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/*
@@ -6820,7 +7124,7 @@ newt:;
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->newtid);
@@ -6878,13 +7182,13 @@ newsame:;
if (PageGetMaxOffsetNumber(page) + 1 < offnum)
elog(PANIC, "heap_update_redo: invalid max offset number");
- hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
+ hsize = SizeOfHeapUpdate + SizeOfHeapHeaderLen;
- newlen = record->xl_len - hsize;
- Assert(newlen <= MaxHeapTupleSize);
memcpy((char *) &xlhdr,
(char *) xlrec + SizeOfHeapUpdate,
- SizeOfHeapHeader);
+ SizeOfHeapHeaderLen);
+ newlen = xlhdr.t_len;
+ Assert(newlen <= MaxHeapTupleSize);
htup = &tbuf.hdr;
MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
@@ -6892,9 +7196,9 @@ newsame:;
(char *) xlrec + hsize,
newlen);
newlen += offsetof(HeapTupleHeaderData, t_bits);
- htup->t_infomask2 = xlhdr.t_infomask2;
- htup->t_infomask = xlhdr.t_infomask;
- htup->t_hoff = xlhdr.t_hoff;
+ htup->t_infomask2 = xlhdr.header.t_infomask2;
+ htup->t_infomask = xlhdr.header.t_infomask;
+ htup->t_hoff = xlhdr.header.t_hoff;
HeapTupleHeaderSetXmin(htup, record->xl_xid);
HeapTupleHeaderSetCmin(htup, FirstCommandId);
@@ -6906,7 +7210,7 @@ newsame:;
if (offnum == InvalidOffsetNumber)
elog(PANIC, "heap_update_redo: failed to add tuple");
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
@@ -7157,6 +7461,9 @@ heap2_redo(XLogRecPtr lsn, XLogRecord *record)
case XLOG_HEAP2_LOCK_UPDATED:
heap_xlog_lock_updated(lsn, record);
break;
+ case XLOG_HEAP2_NEW_CID:
+ /* nothing to do on a real replay, only during logical decoding */
+ break;
default:
elog(PANIC, "heap2_redo: unknown op code %u", info);
}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3ec10a0..7fe9f32 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -75,6 +75,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
Page page = BufferGetPage(buffer);
Size minfree;
+ Assert(TransactionIdIsValid(OldestXmin));
+
/*
* Let's see if we really need pruning.
*
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..3bac4a5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -67,7 +67,10 @@
#include "access/relscan.h"
#include "access/transam.h"
+#include "access/xlog.h"
+
#include "catalog/index.h"
+#include "catalog/catalog.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -520,8 +523,15 @@ index_fetch_heap(IndexScanDesc scan)
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != scan->xs_cbuf)
- heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
- RecentGlobalXmin);
+ {
+ if (IsSystemRelation(scan->heapRelation)
+ || RelationIsDoingTimetravel(scan->heapRelation))
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalDataXmin);
+ }
}
/* Obtain share-lock on the buffer so we can examine visibility */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index bc8b985..c750fef 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -184,6 +184,15 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
xlrec->infobits_set);
out_target(buf, &(xlrec->target));
}
+ else if (info == XLOG_HEAP2_NEW_CID)
+ {
+ xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
+
+ appendStringInfo(buf, "new_cid: ");
+ out_target(buf, &(xlrec->target));
+ appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+ xlrec->cmin, xlrec->cmax, xlrec->combocid);
+ }
else
appendStringInfo(buf, "UNKNOWN");
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1b36f9a..e0900e2 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -28,6 +28,7 @@ const struct config_enum_entry wal_level_options[] = {
{"minimal", WAL_LEVEL_MINIMAL, false},
{"archive", WAL_LEVEL_ARCHIVE, false},
{"hot_standby", WAL_LEVEL_HOT_STANDBY, false},
+ {"logical", WAL_LEVEL_LOGICAL, false},
{NULL, 0, false}
};
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e975f8d..d46a50e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -47,6 +47,7 @@
#include "access/twophase.h"
#include "access/twophase_rmgr.h"
#include "access/xact.h"
+#include "access/xlog.h"
#include "access/xlogutils.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
@@ -1920,7 +1921,8 @@ RecoverPreparedTransactions(void)
* the prepared transaction generated xid assignment records. Test
* here must match one used in AssignTransactionId().
*/
- if (InHotStandby && hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
+ XLogLogicalInfoActive()))
overwriteOK = true;
/*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0591f3f..b937ffe 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -146,6 +146,7 @@ typedef struct TransactionStateData
int prevSecContext; /* previous SecurityRestrictionContext */
bool prevXactReadOnly; /* entry-time xact r/o state */
bool startedInRecovery; /* did we start in recovery? */
+ bool guaranteedlyLogged; /* has xid been logged? */
struct TransactionStateData *parent; /* back link to parent */
} TransactionStateData;
@@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = {
0, /* previous SecurityRestrictionContext */
false, /* entry-time xact r/o state */
false, /* startedInRecovery */
+ false, /* guaranteedlyLogged */
NULL /* link to parent state block */
};
@@ -391,6 +393,21 @@ GetCurrentTransactionIdIfAny(void)
}
/*
+ * MarkCurrentTransactionIdLoggedIfAny
+ *
+ * Remember that the current xid, if one is assigned, has now been WAL-logged.
+ */
+void
+MarkCurrentTransactionIdLoggedIfAny(void)
+{
+ if (TransactionIdIsValid(CurrentTransactionState->transactionId))
+ {
+ CurrentTransactionState->guaranteedlyLogged = true;
+ }
+}
+
+
+/*
* GetStableLatestTransactionId
*
* Get the transaction's XID if it has one, else read the next-to-be-assigned
@@ -431,6 +448,7 @@ AssignTransactionId(TransactionState s)
{
bool isSubXact = (s->parent != NULL);
ResourceOwner currentOwner;
+ bool log_unknown_top = false;
/* Assert that caller didn't screw up */
Assert(!TransactionIdIsValid(s->transactionId));
@@ -438,7 +456,7 @@ AssignTransactionId(TransactionState s)
/*
* Ensure parent(s) have XIDs, so that a child always has an XID later
- * than its parent. Musn't recurse here, or we might get a stack overflow
+ * than its parent. May not recurse here, or we might get a stack overflow
* if we're at the bottom of a huge stack of subtransactions none of which
* have XIDs yet.
*/
@@ -455,6 +473,8 @@ AssignTransactionId(TransactionState s)
p = p->parent;
}
+ Assert(parentOffset);
+
/*
* This is technically a recursive call, but the recursion will never
* be more than one layer deep.
@@ -466,6 +486,21 @@ AssignTransactionId(TransactionState s)
}
/*
+ * When wal_level=logical, guarantee that a subtransaction's xid can only
+ * be seen in the WAL stream if its toplevel xid has been logged before. If
+ * necessary we log an xact_assignment record with fewer than
+ * PGPROC_MAX_CACHED_SUBXIDS entries. Note that it is fine if
+ * guaranteedlyLogged isn't set for a transaction even though it appears
+ * in a WAL record; we'll just superfluously log something.
+ */
+ if (isSubXact && XLogLogicalInfoActive() &&
+ !TopTransactionStateData.guaranteedlyLogged)
+ {
+ log_unknown_top = true;
+ }
+
+
+ /*
* Generate a new Xid and record it in PG_PROC and pg_subtrans.
*
* NB: we must make the subtrans entry BEFORE the Xid appears anywhere in
@@ -519,6 +554,9 @@ AssignTransactionId(TransactionState s)
* top-level transaction that each subxact belongs to. This is correct in
* recovery only because aborted subtransactions are separately WAL
* logged.
+ *
+ * This is correct even for the case where several levels above us didn't
+ * have an xid assigned as we recursed up to them beforehand.
*/
if (isSubXact && XLogStandbyInfoActive())
{
@@ -529,7 +567,8 @@ AssignTransactionId(TransactionState s)
* ensure this test matches similar one in
* RecoverPreparedTransactions()
*/
- if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
+ log_unknown_top)
{
XLogRecData rdata[2];
xl_xact_assignment xlrec;
@@ -548,13 +587,15 @@ AssignTransactionId(TransactionState s)
rdata[0].next = &rdata[1];
rdata[1].data = (char *) unreportedXids;
- rdata[1].len = PGPROC_MAX_CACHED_SUBXIDS * sizeof(TransactionId);
+ rdata[1].len = nUnreportedXids * sizeof(TransactionId);
rdata[1].buffer = InvalidBuffer;
rdata[1].next = NULL;
(void) XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT, rdata);
nUnreportedXids = 0;
+ /* mark top, not current xact as having been logged */
+ TopTransactionStateData.guaranteedlyLogged = true;
}
}
}
@@ -1733,6 +1774,7 @@ StartTransaction(void)
* initialize reported xid accounting
*/
nUnreportedXids = 0;
+ s->guaranteedlyLogged = false;
/*
* must initialize resource-management stuff first
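The xact.c hunks above change when an xact_assignment record is written: besides the existing overflow condition, a record is forced whenever a subtransaction gets an xid under wal_level=logical and the toplevel xid is not yet known to have been logged. A rough standalone model of that decision is sketched below. The Toy* names and TOY_MAX_CACHED_SUBXIDS are made up for illustration; the real constant is PGPROC_MAX_CACHED_SUBXIDS and the real state lives in TransactionStateData.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PGPROC_MAX_CACHED_SUBXIDS. */
#define TOY_MAX_CACHED_SUBXIDS 64

typedef struct ToyXactState
{
	bool		wal_level_logical;	/* models XLogLogicalInfoActive() */
	bool		top_logged;			/* models the top xact's guaranteedlyLogged */
	int			n_unreported;		/* models nUnreportedXids */
} ToyXactState;

/*
 * Model of assigning an xid to a subtransaction.  Returns true if an
 * assignment record would be written at this point.
 */
static bool
toy_assign_subxact_xid(ToyXactState *st)
{
	bool		log_unknown_top;

	assert(st != NULL);

	/* force a record if the toplevel xid may not be in the WAL yet */
	log_unknown_top = st->wal_level_logical && !st->top_logged;

	st->n_unreported++;

	if (st->n_unreported >= TOY_MAX_CACHED_SUBXIDS || log_unknown_top)
	{
		/* the record carries only the xids accumulated so far */
		st->n_unreported = 0;
		/* mark the toplevel xact, not the current one, as logged */
		st->top_logged = true;
		return true;
	}

	return false;
}
```

With wal_level_logical set and top_logged initially false, the first call returns true and later calls fall back to the overflow rule. This is also why the patch changes rdata[1].len to use nUnreportedXids: a forced record can carry fewer than PGPROC_MAX_CACHED_SUBXIDS entries.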
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dc47c47..1f13250 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
+#include "replication/logical.h"
#include "storage/barrier.h"
#include "storage/bufmgr.h"
#include "storage/fd.h"
@@ -1191,6 +1192,8 @@ begin:;
*/
WALInsertSlotRelease();
+ MarkCurrentTransactionIdLoggedIfAny();
+
END_CRIT_SECTION();
/*
@@ -6328,6 +6331,13 @@ StartupXLOG(void)
XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
XLogCtl->ckptXid = checkPoint.nextXid;
+
+ /*
+ * Start up logical state; this needs to be set up now so we have proper
+ * data during restore. XXX
+ */
+ StartupLogicalReplication(checkPoint.redo);
+
/*
* Initialize unlogged LSN. On a clean shutdown, it's restored from the
* control file. On recovery, all unlogged relations are blown away, so
@@ -8308,7 +8318,7 @@ CreateCheckPoint(int flags)
* StartupSUBTRANS hasn't been called yet.
*/
if (!RecoveryInProgress())
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update stats before releasing lock. */
LogCheckpointEnd(false);
@@ -8668,7 +8678,7 @@ CreateRestartPoint(int flags)
* this because StartupSUBTRANS hasn't been called yet.
*/
if (EnableHotStandby)
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update before releasing lock. */
LogCheckpointEnd(true);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index c1287a7..0d4cfcb 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -106,7 +106,6 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
return path;
}
-
/*
* IsSystemRelation
* True iff the relation is a system catalog relation.
@@ -123,8 +122,17 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
bool
IsSystemRelation(Relation relation)
{
- return IsSystemNamespace(RelationGetNamespace(relation)) ||
- IsToastNamespace(RelationGetNamespace(relation));
+ return IsSystemRelationId(RelationGetRelid(relation));
+}
+
+/*
+ * IsSystemRelationId
+ * True iff the relation is a system catalog relation.
+ */
+bool
+IsSystemRelationId(Oid relid)
+{
+ return relid < FirstNormalObjectId;
}
/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b73ee4f..49ea38b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2174,9 +2174,20 @@ IndexBuildHeapScan(Relation heapRelation,
}
else
{
+ /*
+ * We can ignore a) pegged xmins b) shared relations if we don't scan
+ * something acting as a catalog.
+ */
+ bool include_systables =
+ IsSystemRelation(heapRelation) ||
+ RelationIsDoingTimetravel(heapRelation);
+
snapshot = SnapshotAny;
/* okay to ignore lazy VACUUMs here */
- OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared,
+ include_systables,
+ true,
+ false);
}
scan = heap_beginscan_strat(heapRelation, /* relation */
@@ -3340,7 +3351,7 @@ reindex_relation(Oid relid, int flags)
/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
if (is_pg_class)
- (void) RelationGetIndexAttrBitmap(rel, false);
+ (void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
PG_TRY();
{
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 575a40f..2acaf54 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -613,6 +613,16 @@ CREATE VIEW pg_stat_replication AS
WHERE S.usesysid = U.oid AND
S.pid = W.pid;
+CREATE VIEW pg_stat_logical_decoding AS
+ SELECT
+ L.slot_name,
+ L.plugin,
+ L.database,
+ L.active,
+ L.xmin,
+ L.restart_decoding_lsn
+ FROM pg_stat_get_logical_decoding_slots() AS L;
+
CREATE VIEW pg_stat_database AS
SELECT
D.oid AS datid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9845b0b..7a05cea 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, true, false);
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 051b806..240782e 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -858,6 +858,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
*/
vacuum_set_xid_limits(freeze_min_age, freeze_table_age,
OldHeap->rd_rel->relisshared,
+ IsSystemRelation(OldHeap)
+ || RelationIsDoingTimetravel(OldHeap),
&OldestXmin, &FreezeXid, NULL, &MultiXactFrzLimit);
/*
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d86e9ad..912f7a8 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2355,7 +2355,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
* concurrency.
*/
modifiedCols = GetModifiedColumns(relinfo, estate);
- keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc, true);
+ keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc,
+ INDEX_ATTR_BITMAP_KEY);
if (bms_overlap(keyCols, modifiedCols))
lockmode = LockTupleExclusive;
else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2f2c6ac..dc647ad 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -381,6 +381,7 @@ void
vacuum_set_xid_limits(int freeze_min_age,
int freeze_table_age,
bool sharedRel,
+ bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
@@ -399,7 +400,7 @@ vacuum_set_xid_limits(int freeze_min_age,
* working on a particular table at any time, and that each vacuum is
* always an independent transaction.
*/
- *oldestXmin = GetOldestXmin(sharedRel, true);
+ *oldestXmin = GetOldestXmin(sharedRel, catalogRel, true, false);
Assert(TransactionIdIsNormal(*oldestXmin));
@@ -720,7 +721,7 @@ vac_update_datfrozenxid(void)
* committed pg_class entries for new tables; see AddNewRelationTuple().
* So we cannot produce a wrong minimum by starting with this.
*/
- newFrozenXid = GetOldestXmin(true, true);
+ newFrozenXid = GetOldestXmin(true, true, true, false);
/*
* Similarly, initialize the MultiXact "min" with the value that would be
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c34aa53..b650eee 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -44,6 +44,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
+#include "catalog/catalog.h"
#include "catalog/storage.h"
#include "commands/dbcommands.h"
#include "commands/vacuum.h"
@@ -202,6 +203,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
vacuum_set_xid_limits(vacstmt->freeze_min_age, vacstmt->freeze_table_age,
onerel->rd_rel->relisshared,
+ IsSystemRelation(onerel)
+ || RelationIsDoingTimetravel(onerel),
&OldestXmin, &FreezeLimit, &freezeTableLimit,
&MultiXactFrzLimit);
scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a31b01d..8a52cdc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -818,7 +818,7 @@ PostmasterMain(int argc, char *argv[])
(errmsg("WAL archival (archive_mode=on) requires wal_level \"archive\" or \"hot_standby\"")));
if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
ereport(ERROR,
- (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\" or \"hot_standby\"")));
+ (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\", \"logical\" or \"hot_standby\"")));
/*
* Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 2dde011..2e13e27 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
repl_gram.o syncrep.o
+SUBDIRS = logical
+
include $(top_srcdir)/src/backend/common.mk
# repl_scanner is compiled as part of repl_gram
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
new file mode 100644
index 0000000..310a45c
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+# src/backend/replication/logical/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
new file mode 100644
index 0000000..53043b9
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,687 @@
+/*-------------------------------------------------------------------------
+ *
+ * decode.c
+ * Decodes WAL records read via xlogreader.h into a reorderbuffer while
+ * simultaneously letting snapbuild.c build the appropriate snapshots to
+ * decode them.
+ *
+ * NOTE:
+ * This basically tries to handle all low level xlog stuff for
+ * reorderbuffer.c and snapbuild.c. There's some minor leakage where a
+ * specific record's struct is used to pass data along, but that's just
+ * because those are convenient and uncomplicated to read.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/decode.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+
+#include "access/heapam.h"
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+
+#include "catalog/pg_control.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+
+#include "storage/standby.h"
+
+/* RMGR Handlers */
+static void DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+
+/* handlers for individual records or groups of records */
+static void DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ TransactionId xid, int nsubxacts, TransactionId *sub_xids,
+ int ninval_msgs, SharedInvalidationMessage *msg);
+static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecPtr lsn,
+ TransactionId xid, TransactionId *sub_xids, int nsubxacts,
+ bool was_commit);
+
+/* common function to decode tuples */
+static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
+
+void
+DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf)
+{
+ /* cast so we get a warning when new rmgrs are added */
+ switch ((RmgrIds) buf->record.xl_rmid)
+ {
+ case RM_XLOG_ID:
+ DecodeXLogOp(ctx, buf);
+ break;
+
+ case RM_XACT_ID:
+ DecodeXactOp(ctx, buf);
+ break;
+
+ case RM_STANDBY_ID:
+ DecodeStandbyOp(ctx, buf);
+ break;
+
+ case RM_HEAP_ID:
+ DecodeHeapOp(ctx, buf);
+ break;
+
+ case RM_HEAP2_ID:
+ DecodeHeap2Op(ctx, buf);
+ break;
+
+ /* irrelevant for changeset extraction */
+ case RM_SMGR_ID:
+ case RM_CLOG_ID:
+ case RM_DBASE_ID:
+ case RM_TBLSPC_ID:
+ case RM_MULTIXACT_ID:
+ case RM_RELMAP_ID:
+ case RM_BTREE_ID:
+ case RM_HASH_ID:
+ case RM_GIN_ID:
+ case RM_GIST_ID:
+ case RM_SEQ_ID:
+ case RM_SPGIST_ID:
+ break;
+ case RM_NEXT_ID:
+ elog(ERROR, "unexpected NEXT_ID record");
+ }
+}
+
+static void
+DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+ ReorderBuffer *reorder = ctx->reorder;
+ XLogRecord *r = &buf->record;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (r->xl_info & ~XLR_INFO_MASK)
+ {
+ case XLOG_XACT_COMMIT:
+ {
+ xl_xact_commit *xlrec;
+ TransactionId *subxacts = NULL;
+ SharedInvalidationMessage *invals = NULL;
+
+ xlrec = (xl_xact_commit *) buf->record_data;
+
+ subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+ invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
+
+ /* FIXME: skip if wrong db? */
+
+ DecodeCommit(ctx, buf, r->xl_xid, xlrec->nsubxacts, subxacts,
+ xlrec->nmsgs, invals);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit_prepared *prec;
+ xl_xact_commit *xlrec;
+ TransactionId *subxacts;
+ SharedInvalidationMessage *invals = NULL;
+
+
+ prec = (xl_xact_commit_prepared *) buf->record_data;
+ xlrec = &prec->crec;
+
+ subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+ invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
+
+ /* FIXME: skip if wrong db? */
+ /* r->xl_xid is the committing backend's xid, not the prepared one */
+ DecodeCommit(ctx, buf, prec->xid, xlrec->nsubxacts, subxacts,
+ xlrec->nmsgs, invals);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_COMPACT:
+ {
+ xl_xact_commit_compact *xlrec;
+
+#if 0
+ /* FIXME: should we error out? */
+ elog(WARNING, "unexpectedly got compact commit");
+#endif
+ xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+ DecodeCommit(ctx, buf, r->xl_xid,
+ xlrec->nsubxacts, xlrec->subxacts,
+ 0, NULL);
+ break;
+ }
+ case XLOG_XACT_ABORT:
+ {
+ xl_xact_abort *xlrec;
+ TransactionId *sub_xids;
+
+ xlrec = (xl_xact_abort *) buf->record_data;
+
+ sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ DecodeAbort(ctx, buf->origptr, r->xl_xid,
+ sub_xids, xlrec->nsubxacts, false);
+ break;
+ }
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort_prepared *prec;
+ xl_xact_abort *xlrec;
+ TransactionId *sub_xids;
+
+ prec = (xl_xact_abort_prepared *) buf->record_data;
+ xlrec = &prec->arec;
+
+ sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ /* r->xl_xid is committed in a separate record */
+ DecodeAbort(ctx, buf->origptr, prec->xid,
+ sub_xids, xlrec->nsubxacts, false);
+ break;
+ }
+
+ case XLOG_XACT_ASSIGNMENT:
+ {
+ xl_xact_assignment *xlrec;
+ int i;
+ TransactionId *sub_xid;
+
+ xlrec = (xl_xact_assignment *) buf->record_data;
+
+ sub_xid = &xlrec->xsub[0];
+
+ for (i = 0; i < xlrec->nsubxacts; i++)
+ {
+ ReorderBufferAssignChild(reorder, xlrec->xtop,
+ *(sub_xid++), buf->origptr);
+ }
+ break;
+ }
+ case XLOG_XACT_PREPARE:
+
+ /*
+ * XXX: we could replay the transaction and prepare it
+ * as well.
+ */
+ break;
+ default:
+ break;
+ }
+}
+
+static void
+DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+ XLogRecord *r = &buf->record;
+
+ switch (r->xl_info & ~XLR_INFO_MASK)
+ {
+ case XLOG_RUNNING_XACTS:
+ SnapBuildProcessRunningXacts(builder, buf->origptr,
+ (xl_running_xacts *) buf->record_data);
+ break;
+ case XLOG_STANDBY_LOCK:
+ break;
+ default:
+ elog(ERROR, "unexpected standby record type");
+ }
+}
+static void
+DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ switch (buf->record.xl_info & ~XLR_INFO_MASK)
+ {
+ /* this is also used in END_OF_RECOVERY checkpoints */
+ case XLOG_CHECKPOINT_SHUTDOWN:
+ case XLOG_END_OF_RECOVERY:
+ SnapBuildSerializationPoint(builder, buf->origptr);
+
+ /*
+ * abort all transactions that are still deemed to be in progress; they
+ * aren't actually in progress anymore. Do not abort prepared
+ * transactions that have been prepared for commit.
+ *
+ * FIXME: implement.
+ */
+ break;
+ case XLOG_CHECKPOINT_ONLINE:
+ /*
+ * a RUNNING_XACTS record will have been logged near to this, we
+ * can restart from there.
+ */
+ break;
+ case XLOG_NOOP:
+ case XLOG_NEXTOID:
+ case XLOG_SWITCH:
+ case XLOG_BACKUP_END:
+ case XLOG_PARAMETER_CHANGE:
+ case XLOG_RESTORE_POINT:
+ case XLOG_FPW_CHANGE:
+ case XLOG_FPI:
+ break;
+ }
+}
+
+static void
+DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & XLOG_HEAP_OPMASK;
+ TransactionId xid = buf->record.xl_xid;
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (info)
+ {
+ case XLOG_HEAP_INSERT:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeInsert(ctx, buf);
+ break;
+
+ /*
+ * Treat HOT updates as normal updates. There is no useful
+ * information in the fact that we could make it a HOT update
+ * locally, and the WAL layout is compatible.
+ */
+ case XLOG_HEAP_HOT_UPDATE:
+ case XLOG_HEAP_UPDATE:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeUpdate(ctx, buf);
+ break;
+
+ case XLOG_HEAP_DELETE:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeDelete(ctx, buf);
+ break;
+
+ case XLOG_HEAP_NEWPAGE:
+ /*
+ * XXX: There doesn't seem to be a use case for decoding
+ * HEAP_NEWPAGE records. They are only used in various index AMs and
+ * CLUSTER, neither of which should be relevant for the logical
+ * changestream.
+ */
+ break;
+ case XLOG_HEAP_INPLACE:
+ /* cannot be important for our purposes, not part of transaction */
+ if (!TransactionIdIsValid(xid))
+ break;
+
+ SnapBuildProcessChange(builder, xid, buf->origptr);
+ /* heap_inplace is only done in catalog modifying txns */
+ ReorderBufferXidSetTimetravel(ctx->reorder, xid, buf->origptr);
+ break;
+ case XLOG_HEAP_LOCK:
+ break;
+ default:
+ elog(ERROR, "unexpected info value %u", info);
+ break;
+ }
+}
+
+static void
+DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & XLOG_HEAP_OPMASK;
+ TransactionId xid = buf->record.xl_xid;
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (info)
+ {
+ case XLOG_HEAP2_MULTI_INSERT:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeMultiInsert(ctx, buf);
+ break;
+ case XLOG_HEAP2_NEW_CID:
+ {
+ xl_heap_new_cid *xlrec;
+ xlrec = (xl_heap_new_cid *) buf->record_data;
+ SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
+
+ break;
+ }
+ /*
+ * everything else here is just low level stuff we're not
+ * interested in
+ */
+ case XLOG_HEAP2_FREEZE:
+ case XLOG_HEAP2_CLEAN:
+ case XLOG_HEAP2_CLEANUP_INFO:
+ case XLOG_HEAP2_VISIBLE:
+ case XLOG_HEAP2_LOCK_UPDATED:
+ break;
+ default:
+ elog(ERROR, "unexpected info value %u", info);
+ }
+}
+
+static void
+DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, TransactionId xid,
+ int nsubxacts, TransactionId *sub_xids,
+ int ninval_msgs, SharedInvalidationMessage *msgs)
+{
+ int i;
+
+ /* always need the invalidation messages */
+ if (ninval_msgs > 0)
+ {
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ ninval_msgs, msgs);
+ ReorderBufferXidSetTimetravel(ctx->reorder, xid, buf->origptr);
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ nsubxacts, sub_xids);
+
+ /*
+ * If we are not interested in anything up to this LSN, convert the commit
+ * into an ABORT to clean up.
+ *
+ * FIXME: this needs to replay invalidations anyway!
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr))
+ {
+ DecodeAbort(ctx, buf->origptr, xid, sub_xids, nsubxacts, true);
+ return;
+ }
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, *sub_xids,
+ buf->origptr, buf->endptr);
+ sub_xids++;
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr);
+}
+
+static void
+DecodeAbort(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
+ TransactionId *sub_xids, int nsubxacts, bool was_commit)
+{
+ int i;
+
+ /*
+ * this is a bit grotty, but if we're "faking" an abort, DecodeCommit has
+ * already gone through SnapBuildCommitTxn for this transaction
+ */
+ if (!was_commit)
+ SnapBuildAbortTxn(ctx->snapshot_builder, xid,
+ nsubxacts, sub_xids);
+
+
+ /* FIXME: process invalidations anyway if was_commit */
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, *sub_xids, lsn);
+ sub_xids++;
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, lsn);
+}
+
+static void
+DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_insert *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader));
+
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert,
+ r->xl_len - SizeOfHeapInsert,
+ change->newtuple);
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_update *xlrec;
+ xl_heap_header_len *xlhdr;
+ ReorderBufferChange *change;
+ char *data;
+
+ xlrec = (xl_heap_update *) buf->record_data;
+ xlhdr = (xl_heap_header_len *) (buf->record_data + SizeOfHeapUpdate);
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_UPDATE;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ data = (char *) &xlhdr->header;
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapUpdate + SizeOfHeapHeaderLen));
+
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple(data,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->newtuple);
+ /* skip over the rest of the tuple header */
+ data += SizeOfHeapHeader;
+ /* skip over the tuple data */
+ data += xlhdr->t_len;
+ }
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ xlhdr = (xl_heap_header_len *) data;
+ change->oldtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+ DecodeXLogTuple((char *) &xlhdr->header,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->oldtuple);
+ data = (char *) &xlhdr->header;
+ data += SizeOfHeapHeader;
+ data += xlhdr->t_len;
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_delete *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_delete *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ /* old primary key stored */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ Assert(r->xl_len > (SizeOfHeapDelete + SizeOfHeapHeader));
+
+ change->oldtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
+ r->xl_len - SizeOfHeapDelete,
+ change->oldtuple);
+ }
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+/*
+ * Decode xl_heap_multi_insert record into multiple changes.
+ */
+static void
+DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_multi_insert *xlrec;
+ int i;
+ char *data;
+ bool isinit = (r->xl_info & XLOG_HEAP_INIT_PAGE) != 0;
+
+ xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->node.dbNode != ctx->slot->database)
+ return;
+
+ data = buf->record_data + SizeOfHeapMultiInsert;
+
+ /*
+ * OffsetNumbers (which are not of interest to us) are stored when
+ * XLOG_HEAP_INIT_PAGE is not set -- skip over them.
+ */
+ if (!isinit)
+ data += sizeof(OffsetNumber) * xlrec->ntuples;
+
+ for (i = 0; i < xlrec->ntuples; i++)
+ {
+ ReorderBufferChange *change;
+ xl_multi_insert_tuple *xlhdr;
+ int datalen;
+ ReorderBufferTupleBuf *tuple;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->node, sizeof(RelFileNode));
+
+ /*
+ * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
+ * isn't used for catalogs, but better be future proof.
+ *
+ * We decode the tuple in pretty much the same way as DecodeXLogTuple,
+ * but since the layout is slightly different, we can't use it here.
+ */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ tuple = change->newtuple;
+
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
+ data = ((char *) xlhdr) + SizeOfMultiInsertTuple;
+ datalen = xlhdr->datalen;
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ (char *) data,
+ datalen);
+ data += datalen;
+
+ tuple->header.t_infomask = xlhdr->t_infomask;
+ tuple->header.t_infomask2 = xlhdr->t_infomask2;
+ tuple->header.t_hoff = xlhdr->t_hoff;
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+ }
+}
+
+/*
+ * Read a tuple of size 'len' from 'data' into 'tuple'.
+ */
+static void
+DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
+{
+ xl_heap_header xlhdr;
+ int datalen = len - SizeOfHeapHeader;
+
+ Assert(datalen >= 0);
+ Assert(datalen <= MaxHeapTupleSize);
+
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+
+ /* data is not stored aligned, copy to aligned storage */
+ memcpy((char *) &xlhdr,
+ data,
+ SizeOfHeapHeader);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ data + SizeOfHeapHeader,
+ datalen);
+
+ tuple->header.t_infomask = xlhdr.t_infomask;
+ tuple->header.t_infomask2 = xlhdr.t_infomask2;
+ tuple->header.t_hoff = xlhdr.t_hoff;
+}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
new file mode 100644
index 0000000..656e995
--- /dev/null
+++ b/src/backend/replication/logical/logical.c
@@ -0,0 +1,1046 @@
+/*-------------------------------------------------------------------------
+ *
+ * logical.c
+ *
+ * Logical decoding shared memory management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logical.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "access/transam.h"
+
+#include "fmgr.h"
+#include "miscadmin.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/fd.h"
+
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/*
+ * logical replication on-disk data structures.
+ */
+typedef struct LogicalDecodingSlotOnDisk
+{
+ uint32 magic;
+ LogicalDecodingSlot slot;
+} LogicalDecodingSlotOnDisk;
+
+#define LOGICAL_MAGIC 0x1051CA1 /* format identifier */
+
+/* Control array for logical decoding */
+LogicalDecodingCtlData *LogicalDecodingCtl = NULL;
+
+/* My slot for logical rep in the shared memory array */
+LogicalDecodingSlot *MyLogicalDecodingSlot = NULL;
+
+/* user settable parameters */
+int max_logical_slots = 0; /* the maximum number of logical slots */
+
+static void LogicalSlotKill(int code, Datum arg);
+
+/* persistency functions */
+static void RestoreLogicalSlot(const char *name);
+static void CreateLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *path);
+static void DeleteLogicalSlot(LogicalDecodingSlot *slot);
+
+
+/* Report shared-memory space needed by LogicalDecodingShmemInit */
+Size
+LogicalDecodingShmemSize(void)
+{
+ Size size = 0;
+
+ if (max_logical_slots == 0)
+ return size;
+
+ size = offsetof(LogicalDecodingCtlData, logical_slots);
+ size = add_size(size,
+ mul_size(max_logical_slots, sizeof(LogicalDecodingSlot)));
+
+ return size;
+}
+
+/* Allocate and initialize logical decoding shared memory */
+void
+LogicalDecodingShmemInit(void)
+{
+ bool found;
+
+ if (max_logical_slots == 0)
+ return;
+
+ LogicalDecodingCtl = (LogicalDecodingCtlData *)
+ ShmemInitStruct("Logical Decoding Ctl", LogicalDecodingShmemSize(),
+ &found);
+
+ if (!found)
+ {
+ int i;
+
+ /* First time through, so initialize */
+ MemSet(LogicalDecodingCtl, 0, LogicalDecodingShmemSize());
+
+ LogicalDecodingCtl->xmin = InvalidTransactionId;
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot =
+ &LogicalDecodingCtl->logical_slots[i];
+
+ slot->xmin = InvalidTransactionId;
+ slot->effective_xmin = InvalidTransactionId;
+ SpinLockInit(&slot->mutex);
+ }
+ }
+}
+
+static void
+LogicalSlotKill(int code, Datum arg)
+{
+ /* LOCK? */
+ if (MyLogicalDecodingSlot && MyLogicalDecodingSlot->active)
+ {
+ MyLogicalDecodingSlot->active = false;
+ }
+ MyLogicalDecodingSlot = NULL;
+}
+
+/*
+ * Set the xmin required for catalog timetravel for the specific decoding slot.
+ */
+void
+IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ !TransactionIdIsValid(MyLogicalDecodingSlot->candidate_xmin)))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = lsn;
+ MyLogicalDecodingSlot->candidate_xmin = xmin;
+ elog(DEBUG1, "got new xmin %u at %X/%X", xmin,
+ (uint32) (lsn >> 32), (uint32) lsn);
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+ Assert(restart_lsn != InvalidXLogRecPtr);
+ Assert(current_lsn != InvalidXLogRecPtr);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly. A missed
+ * value here will just cause some extra effort after reconnecting.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (current_lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ MyLogicalDecodingSlot->candidate_restart_decoding == InvalidXLogRecPtr))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = current_lsn;
+ MyLogicalDecodingSlot->candidate_restart_decoding = restart_lsn;
+
+ elog(DEBUG1, "got new restart lsn %X/%X at %X/%X",
+ (uint32) (restart_lsn >> 32), (uint32) restart_lsn,
+ (uint32) (current_lsn >> 32), (uint32) current_lsn);
+
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+LogicalConfirmReceivedLocation(XLogRecPtr lsn)
+{
+ Assert(lsn != InvalidXLogRecPtr);
+
+ /* Do an unlocked check for candidate_lsn first. */
+ if (MyLogicalDecodingSlot->candidate_lsn != InvalidXLogRecPtr)
+ {
+ bool updated_xmin = false;
+ bool updated_restart = false;
+
+ /* use volatile pointer to prevent code rearrangement */
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+
+ slot->confirmed_flush = lsn;
+
+ /* if we're past the location required for bumping xmin, do so */
+ if (slot->candidate_lsn != InvalidXLogRecPtr &&
+ slot->candidate_lsn < lsn)
+ {
+ /*
+ * We have to write the changed xmin to disk *before* we change
+ * the in-memory value, otherwise after a crash we wouldn't know
+ * that some catalog tuples might have been removed already.
+ *
+ * Ensure that by first writing to ->xmin and only update
+ * ->effective_xmin once the new state is fsynced to disk. After a
+ * crash ->effective_xmin is set to ->xmin.
+ */
+ if (TransactionIdIsValid(slot->candidate_xmin) &&
+ slot->xmin != slot->candidate_xmin)
+ {
+ slot->xmin = slot->candidate_xmin;
+ updated_xmin = true;
+ }
+
+ if (slot->candidate_restart_decoding != InvalidXLogRecPtr &&
+ slot->restart_decoding != slot->candidate_restart_decoding)
+ {
+ slot->restart_decoding = slot->candidate_restart_decoding;
+ updated_restart = true;
+ }
+
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ }
+
+ SpinLockRelease(&slot->mutex);
+
+ /* first write new xmin to disk, so we know what's up after a crash */
+ if (updated_xmin || updated_restart)
+ /* cast away volatile, that's ok. */
+ SaveLogicalSlot((LogicalDecodingSlot *) slot);
+
+ /*
+ * now the new xmin is safely on disk, we can let the global value
+ * advance
+ */
+ if (updated_xmin)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_xmin = slot->xmin;
+ SpinLockRelease(&slot->mutex);
+
+ ComputeLogicalXmin();
+ }
+ }
+ else
+ {
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+ slot->confirmed_flush = lsn;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
+/*
+ * Compute the oldest xmin across all decoding slots and store it in
+ * LogicalDecodingCtl->xmin.
+ */
+void
+ComputeLogicalXmin(void)
+{
+ int i;
+ TransactionId xmin = InvalidTransactionId;
+ LogicalDecodingSlot *slot;
+
+ Assert(LogicalDecodingCtl);
+
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use &&
+ TransactionIdIsValid(slot->effective_xmin) && (
+ !TransactionIdIsValid(xmin) ||
+ TransactionIdPrecedes(slot->effective_xmin, xmin))
+ )
+ {
+ xmin = slot->effective_xmin;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+ LogicalDecodingCtl->xmin = xmin;
+ LWLockRelease(ProcArrayLock);
+
+ elog(DEBUG1, "computed new global xmin for decoding: %u", xmin);
+}
+
+/*
+ * Make sure the current settings & environment are capable of doing logical
+ * replication.
+ */
+void
+CheckLogicalReplicationRequirements(void)
+{
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ /* XXX invent class 51 for code 51028? */
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires wal_level=logical")));
+
+ if (MyDatabaseId == InvalidOid)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires a database connection")));
+
+ if (max_logical_slots == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("logical replication requires max_logical_slots > 0"))));
+}
+
+/*
+ * Search for a free slot, mark it as used and acquire a valid xmin horizon
+ * value.
+ */
+void
+LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin)
+{
+ LogicalDecodingSlot *slot;
+ bool name_in_use;
+ int i;
+
+ Assert(!MyLogicalDecodingSlot);
+
+ CheckLogicalReplicationRequirements();
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+
+ /* First, make sure the requested name is not in use. */
+
+ name_in_use = false;
+ for (i = 0; i < max_logical_slots && !name_in_use; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (s->in_use && strcmp(name, NameStr(s->name)) == 0)
+ name_in_use = true;
+ SpinLockRelease(&s->mutex);
+ }
+
+ if (name_in_use)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("logical slot \"%s\" already exists", name)));
+
+ /* Find the first available (not in_use (=> not active)) slot. */
+
+ slot = NULL;
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (!s->in_use)
+ {
+ Assert(!s->active);
+ /* NOT releasing the lock yet */
+ slot = s;
+ break;
+ }
+ SpinLockRelease(&s->mutex);
+ }
+
+ LWLockRelease(LogicalReplicationCtlLock);
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ errmsg("couldn't find free logical slot. free one or increase max_logical_slots")));
+
+ MyLogicalDecodingSlot = slot;
+
+ /* Let's start with enough information if we can */
+ if (!RecoveryInProgress())
+ slot->restart_decoding = LogStandbySnapshot();
+ else
+ slot->restart_decoding = GetRedoRecPtr();
+
+ slot->in_use = true;
+ slot->active = true;
+ slot->database = MyDatabaseId;
+ /* XXX: do we want to use truncate_identifier() instead? */
+ strncpy(NameStr(slot->plugin), plugin, NAMEDATALEN);
+ NameStr(slot->plugin)[NAMEDATALEN - 1] = '\0';
+ strncpy(NameStr(slot->name), name, NAMEDATALEN);
+ NameStr(slot->name)[NAMEDATALEN - 1] = '\0';
+
+ /* Arrange to clean up at exit/error */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ /* release slot so it can be examined by others */
+ SpinLockRelease(&slot->mutex);
+
+ /* XXX: verify that the specified plugin is valid */
+
+ /*
+ * Acquire the current global xmin value and directly set the logical xmin
+ * before releasing the lock if necessary. We do this so wal decoding is
+ * guaranteed to have all catalog rows produced by xacts with an xid >
+ * slot->effective_xmin available.
+ *
+ * We can't use ComputeLogicalXmin here as that acquires ProcArrayLock
+ * separately which would open a short window for the global xmin to
+ * advance above slot->effective_xmin.
+ */
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ slot->effective_xmin = GetOldestXmin(true, true, true, true);
+ slot->xmin = slot->effective_xmin;
+
+ if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) ||
+ NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin))
+ LogicalDecodingCtl->xmin = slot->effective_xmin;
+ LWLockRelease(ProcArrayLock);
+
+ Assert(slot->effective_xmin <= GetOldestXmin(true, true, true, false));
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+ CreateLogicalSlot(slot);
+ LWLockRelease(LogicalReplicationCtlLock);
+}
+
+/*
+ * Find a previously initialized slot and mark it as used again.
+ */
+void
+LogicalDecodingReAcquireSlot(const char *name)
+{
+ LogicalDecodingSlot *slot;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ MyLogicalDecodingSlot = slot;
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+
+ if (!MyLogicalDecodingSlot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("couldn't find logical slot \"%s\"", name)));
+
+ slot = MyLogicalDecodingSlot;
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ MyLogicalDecodingSlot = NULL;
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("slot already active")));
+ }
+
+ slot->active = true;
+ /* now that we've marked it as active, we release our lock */
+ SpinLockRelease(&slot->mutex);
+
+ /* Don't let the user switch the database... */
+ if (slot->database != MyDatabaseId)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("START_LOGICAL_REPLICATION needs to be run in the same database as INIT_LOGICAL_REPLICATION"))));
+ }
+
+ /* Arrange to clean up at exit */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ SaveLogicalSlot(slot);
+}
+
+/*
+ * Temporarily release a logical decoding slot; this or another backend can
+ * reacquire it later.
+ */
+void
+LogicalDecodingReleaseSlot(void)
+{
+ LogicalDecodingSlot *slot;
+
+ CheckLogicalReplicationRequirements();
+
+ slot = MyLogicalDecodingSlot;
+
+ Assert(slot != NULL && slot->active);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ MyLogicalDecodingSlot = NULL;
+
+ SaveLogicalSlot(slot);
+
+ cancel_shmem_exit(LogicalSlotKill, 0);
+}
+
+/*
+ * Permanently remove a logical decoding slot.
+ */
+void
+LogicalDecodingFreeSlot(const char *name)
+{
+ LogicalDecodingSlot *slot = NULL;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ slot = NULL;
+ }
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("couldn't find logical slot \"%s\"", name)));
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("cannot free active logical slot \"%s\"", name)));
+ }
+
+ /*
+ * Mark it as active, so nobody can claim this slot while we are
+ * working on it. We don't want to hold the spinlock while doing stuff
+ * like fsyncing the state file to disk.
+ */
+ slot->active = true;
+
+ SpinLockRelease(&slot->mutex);
+
+ /*
+ * Start critical section, we can't be interrupted while on-disk/memory
+ * state aren't coherent.
+ */
+ START_CRIT_SECTION();
+
+ DeleteLogicalSlot(slot);
+
+ /* ok, everything gone, after a crash we now wouldn't restore this slot */
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ slot->in_use = false;
+ SpinLockRelease(&slot->mutex);
+
+ END_CRIT_SECTION();
+
+ /* slot is dead and doesn't nail the xmin anymore */
+ ComputeLogicalXmin();
+}
+
+/*
+ * Load replication state from disk into memory at server startup.
+ */
+void
+StartupLogicalReplication(XLogRecPtr checkPointRedo)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ ereport(DEBUG1,
+ (errmsg("starting up logical decoding from %X/%X",
+ (uint32) (checkPointRedo >> 32), (uint32) checkPointRedo)));
+
+ /* restore all slots */
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /* we crashed while a slot was being set up or deleted, clean up */
+ if (strcmp(logical_de->d_name, "new") == 0 ||
+ strcmp(logical_de->d_name, "old") == 0)
+ {
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ if (!rmtree(path, true))
+ {
+ FreeDir(logical_dir);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ path)));
+ }
+ continue;
+ }
+
+ RestoreLogicalSlot(logical_de->d_name);
+ }
+ FreeDir(logical_dir);
+
+ if (max_logical_slots <= 0)
+ return;
+
+ /* Now that we have recovered all the data, compute logical xmin */
+ ComputeLogicalXmin();
+
+ ReorderBufferStartup();
+}
+
+/* ----
+ * Manipulation of on-disk state of logical slots
+ * ----
+ */
+static void
+CreateLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+
+ START_CRIT_SECTION();
+
+ sprintf(tmppath, "pg_llog/new");
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (mkdir(tmppath, S_IRWXU) < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m",
+ tmppath)));
+
+ fsync_fname(tmppath, true);
+
+ SaveLogicalSlotInternal(slot, tmppath);
+
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname(path, true);
+
+ END_CRIT_SECTION();
+}
+
+static void
+SaveLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+ SaveLogicalSlotInternal(slot, path);
+}
+
+/*
+ * Shared functionality between saving and creating a logical slot.
+ */
+static void
+SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *dir)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int fd;
+ LogicalDecodingSlotOnDisk cp;
+
+ /* silence valgrind :( */
+ memset(&cp, 0, sizeof(LogicalDecodingSlotOnDisk));
+
+ sprintf(tmppath, "%s/state.tmp", dir);
+ sprintf(path, "%s/state", dir);
+
+ START_CRIT_SECTION();
+
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create logical checkpoint file \"%s\": %m",
+ tmppath)));
+
+ cp.magic = LOGICAL_MAGIC;
+
+ SpinLockAcquire(&slot->mutex);
+
+ cp.slot.xmin = slot->xmin;
+ cp.slot.effective_xmin = slot->effective_xmin;
+
+ strcpy(NameStr(cp.slot.name), NameStr(slot->name));
+ strcpy(NameStr(cp.slot.plugin), NameStr(slot->plugin));
+
+ cp.slot.database = slot->database;
+ cp.slot.confirmed_flush = slot->confirmed_flush;
+ cp.slot.restart_decoding = slot->restart_decoding;
+ cp.slot.candidate_lsn = InvalidXLogRecPtr;
+ cp.slot.candidate_xmin = InvalidTransactionId;
+ cp.slot.candidate_restart_decoding = InvalidXLogRecPtr;
+ cp.slot.in_use = slot->in_use;
+ cp.slot.active = false;
+
+ SpinLockRelease(&slot->mutex);
+
+ if ((write(fd, &cp, sizeof(cp))) != sizeof(cp))
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not write logical checkpoint file \"%s\": %m",
+ tmppath)));
+ }
+
+ /* fsync the file */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not fsync logical checkpoint \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /* rename to permanent file, fsync file and directory */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname((char *) dir, true);
+ fsync_fname(path, false);
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+DeleteLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+ char tmppath[] = "pg_llog/old";
+
+ START_CRIT_SECTION();
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (rename(path, tmppath) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ path, tmppath)));
+ }
+
+ /* make sure no partial state is visible after a crash */
+ fsync_fname(tmppath, true);
+ fsync_fname("pg_llog", true);
+
+ if (!rmtree(tmppath, true))
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ tmppath)));
+ }
+
+ END_CRIT_SECTION();
+}
+
+/*
+ * Load a single on-disk slot into memory.
+ */
+static void
+RestoreLogicalSlot(const char *name)
+{
+ LogicalDecodingSlotOnDisk cp;
+ int i;
+ char path[MAXPGPATH];
+ int fd;
+ bool restored = false;
+ int readBytes;
+
+ START_CRIT_SECTION();
+
+ /* delete temp file if it exists */
+ sprintf(path, "pg_llog/%s/state.tmp", name);
+ if (unlink(path) < 0 && errno != ENOENT)
+ ereport(PANIC, (errmsg("failed while unlinking %s", path)));
+
+ sprintf(path, "pg_llog/%s/state", name);
+
+ elog(DEBUG1, "restoring logical slot from %s", path);
+
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ /*
+ * We do not need to handle this as we are rename()ing the directory into
+ * place only after we fsync()ed the state file.
+ */
+ if (fd < 0)
+ ereport(PANIC, (errmsg("could not open state file %s", path)));
+
+ readBytes = read(fd, &cp, sizeof(cp));
+ if (readBytes != sizeof(cp))
+ {
+ int saved_errno = errno;
+
+ CloseTransientFile(fd);
+ errno = saved_errno;
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not read logical checkpoint file \"%s\": %m, read %d of %zu",
+ path, readBytes, sizeof(cp))));
+ }
+
+ CloseTransientFile(fd);
+
+ if (cp.magic != LOGICAL_MAGIC)
+ ereport(PANIC, (errmsg("Logical checkpoint has wrong magic %u instead of %u",
+ cp.magic, LOGICAL_MAGIC)));
+
+ /* nothing can be active yet, don't lock anything */
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot;
+
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ if (slot->in_use)
+ continue;
+
+ slot->xmin = cp.slot.xmin;
+ /* XXX: after a crash, always use xmin, not effective_xmin */
+ slot->effective_xmin = cp.slot.xmin;
+ strcpy(NameStr(slot->name), NameStr(cp.slot.name));
+ strcpy(NameStr(slot->plugin), NameStr(cp.slot.plugin));
+ slot->database = cp.slot.database;
+ slot->restart_decoding = cp.slot.restart_decoding;
+ slot->confirmed_flush = cp.slot.confirmed_flush;
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ slot->in_use = true;
+ slot->active = false;
+ restored = true;
+
+ /*
+ * FIXME: Do some validation here.
+ */
+ break;
+ }
+
+ if (!restored)
+ ereport(PANIC,
+ (errmsg("too many logical slots active before shutdown, increase max_logical_slots and try again")));
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+LoadOutputPlugin(OutputPluginCallbacks *callbacks, char *plugin)
+{
+ /* look up symbols in the shared library */
+
+ /* optional */
+ callbacks->init_cb = (LogicalDecodeInitCB)
+ load_external_function(plugin, "pg_decode_init", false, NULL);
+
+ /* required */
+ callbacks->begin_cb = (LogicalDecodeBeginCB)
+ load_external_function(plugin, "pg_decode_begin_txn", true, NULL);
+
+ /* required */
+ callbacks->change_cb = (LogicalDecodeChangeCB)
+ load_external_function(plugin, "pg_decode_change", true, NULL);
+
+ /* required */
+ callbacks->commit_cb = (LogicalDecodeCommitCB)
+ load_external_function(plugin, "pg_decode_commit_txn", true, NULL);
+
+ /* optional */
+ callbacks->cleanup_cb = (LogicalDecodeCleanupCB)
+ load_external_function(plugin, "pg_decode_clean", false, NULL);
+}
+
+/*
+ * Context management functions that coordinate the different logical
+ * decoding pieces.
+ */
+
+/*
+ * Callbacks for ReorderBuffer which add in some more information and then call
+ * output_plugin.h plugins.
+ */
+static void
+begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.begin_cb(ctx, txn);
+}
+
+static void
+commit_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.commit_cb(ctx, txn, commit_lsn);
+}
+
+static void
+change_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.change_cb(ctx, txn, relation, change);
+}
+
+LogicalDecodingContext *
+CreateLogicalDecodingContext(LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write)
+{
+ MemoryContext context;
+ MemoryContext old_context;
+ TransactionId xmin_horizon;
+ LogicalDecodingContext *ctx;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "ReorderBuffer",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(context);
+ ctx = palloc0(sizeof(LogicalDecodingContext));
+
+
+ /* load output plugins first, so we detect a wrong output plugin early */
+ LoadOutputPlugin(&ctx->callbacks, NameStr(slot->plugin));
+
+ if (is_init && start_lsn != InvalidXLogRecPtr)
+ elog(ERROR, "cannot initially start at a specified lsn");
+
+ if (is_init)
+ xmin_horizon = slot->xmin;
+ else
+ xmin_horizon = InvalidTransactionId;
+
+ ctx->slot = slot;
+
+ ctx->reader = XLogReaderAllocate(read_page, ctx);
+ ctx->reader->private_data = ctx;
+
+ ctx->reorder = ReorderBufferAllocate();
+ ctx->snapshot_builder =
+ AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
+
+ ctx->reorder->private_data = ctx;
+
+ ctx->reorder->begin = begin_txn_wrapper;
+ ctx->reorder->apply_change = change_wrapper;
+ ctx->reorder->commit = commit_txn_wrapper;
+
+ ctx->out = makeStringInfo();
+ ctx->prepare_write = prepare_write;
+ ctx->write = do_write;
+
+ ctx->output_plugin_options = output_plugin_options;
+
+ if (is_init)
+ ctx->stop_after_consistent = true;
+ else
+ ctx->stop_after_consistent = false;
+
+ /* call output plugin initialization callback */
+ if (ctx->callbacks.init_cb != NULL)
+ ctx->callbacks.init_cb(ctx, is_init);
+
+ MemoryContextSwitchTo(old_context);
+
+ return ctx;
+}
+
+void
+FreeLogicalDecodingContext(LogicalDecodingContext *ctx)
+{
+ if (ctx->callbacks.cleanup_cb != NULL)
+ ctx->callbacks.cleanup_cb(ctx);
+}
+
+
+/* has the initial snapshot found a consistent state? */
+bool
+LogicalDecodingContextReady(LogicalDecodingContext *ctx)
+{
+ return SnapBuildCurrentState(ctx->snapshot_builder) == SNAPBUILD_CONSISTENT;
+}
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
new file mode 100644
index 0000000..9837a95
--- /dev/null
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -0,0 +1,361 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalfuncs.c
+ *
+ * Support functions for using xlog decoding
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logicalfuncs.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "storage/fd.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+
+Datum init_logical_replication(PG_FUNCTION_ARGS);
+Datum stop_logical_replication(PG_FUNCTION_ARGS);
+Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+/* FIXME: duplicate code with pg_xlogdump, similar to walsender.c */
+static void
+XLogRead(char *buf, XLogRecPtr startptr, Size count)
+{
+ char *p;
+ XLogRecPtr recptr;
+ Size nbytes;
+
+ static int sendFile = -1;
+ static XLogSegNo sendSegNo = 0;
+ static uint32 sendOff = 0;
+
+ p = buf;
+ recptr = startptr;
+ nbytes = count;
+
+ while (nbytes > 0)
+ {
+ uint32 startoff;
+ int segbytes;
+ int readbytes;
+
+ startoff = recptr % XLogSegSize;
+
+ if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+ {
+ char path[MAXPGPATH];
+
+ /* Switch to another logfile segment */
+ if (sendFile >= 0)
+ close(sendFile);
+
+ XLByteToSeg(recptr, sendSegNo);
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
+
+ if (sendFile < 0)
+ {
+ if (errno == ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("requested WAL segment %s has already been removed",
+ path)));
+ else
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m",
+ path)));
+ }
+ sendOff = 0;
+ }
+
+ /* Need to seek in the file? */
+ if (sendOff != startoff)
+ {
+ if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not seek in log segment %s to offset %u: %m",
+ path, startoff)));
+ }
+ sendOff = startoff;
+ }
+
+ /* How many bytes are within this segment? */
+ if (nbytes > (XLogSegSize - startoff))
+ segbytes = XLogSegSize - startoff;
+ else
+ segbytes = nbytes;
+
+ readbytes = read(sendFile, p, segbytes);
+ if (readbytes <= 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from log segment %s, offset %u, length %lu: %m",
+ path, sendOff, (unsigned long) segbytes)));
+ }
+
+ /* Update state for read */
+ recptr += readbytes;
+
+ sendOff += readbytes;
+ nbytes -= readbytes;
+ p += readbytes;
+ }
+}
+
+int
+logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr,
+ loc;
+ int count;
+
+ loc = targetPagePtr + reqLen;
+ while (1)
+ {
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ break;
+ pg_usleep(1000L);
+ }
+
+ /* a full page is available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
+
+static void
+DummyWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ elog(ERROR, "init_logical_replication shouldn't be writing anything");
+}
+
+Datum
+init_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ Name plugin = PG_GETARG_NAME(1);
+
+ char xpos[MAXFNAMELEN];
+
+ TupleDesc tupdesc;
+ HeapTuple tuple;
+ Datum result;
+ Datum values[2];
+ bool nulls[2];
+ LogicalDecodingContext *ctx = NULL;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ /* Acquire a logical replication slot */
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingAcquireFreeSlot(NameStr(*name), NameStr(*plugin));
+
+ /* make sure we don't end up with an unreleased slot */
+ PG_TRY();
+ {
+ XLogRecPtr startptr;
+
+ /*
+ * Use the same initial_snapshot_reader, but with our own read_page
+ * callback that does not depend on walsender.
+ */
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true,
+ InvalidXLogRecPtr, NIL,
+ logical_read_local_xlog_page,
+ DummyWrite, DummyWrite);
+
+ /* setup from where to read xlog */
+ startptr = ctx->slot->restart_decoding;
+
+ /* Wait for a consistent starting point */
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ if (err)
+ elog(ERROR, "%s", err);
+
+ Assert(record);
+
+ startptr = InvalidXLogRecPtr;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue until we have found a consistent point */
+ if (LogicalDecodingContextReady(ctx))
+ break;
+ }
+
+ /* Extract the values we want */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ values[0] = CStringGetTextDatum(NameStr(MyLogicalDecodingSlot->name));
+ values[1] = CStringGetTextDatum(xpos);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ LogicalDecodingReleaseSlot();
+
+ PG_RETURN_DATUM(result);
+}
+
+Datum
+stop_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(NameStr(*name));
+
+ PG_RETURN_INT32(0);
+}
+
+/*
+ * Return one row for each logical replication slot currently in use.
+ */
+
+Datum
+pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS 6
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ int i;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not " \
+ "allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot = &LogicalDecodingCtl->logical_slots[i];
+ Datum values[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ bool nulls[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ char location[MAXFNAMELEN];
+ const char *slot_name;
+ const char *plugin;
+ TransactionId xmin;
+ XLogRecPtr last_req;
+ bool active;
+ Oid database;
+
+ SpinLockAcquire(&slot->mutex);
+ if (!slot->in_use)
+ {
+ SpinLockRelease(&slot->mutex);
+ continue;
+ }
+ else
+ {
+ xmin = slot->xmin;
+ active = slot->active;
+ database = slot->database;
+ last_req = slot->restart_decoding;
+ slot_name = pstrdup(NameStr(slot->name));
+ plugin = pstrdup(NameStr(slot->plugin));
+ }
+ SpinLockRelease(&slot->mutex);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ snprintf(location, sizeof(location), "%X/%X",
+ (uint32) (last_req >> 32), (uint32) last_req);
+
+ values[0] = CStringGetTextDatum(slot_name);
+ values[1] = CStringGetTextDatum(plugin);
+ values[2] = ObjectIdGetDatum(database);
+ values[3] = BoolGetDatum(active);
+ values[4] = TransactionIdGetDatum(xmin);
+ values[5] = CStringGetTextDatum(location);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ return (Datum) 0;
+}
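
Both init_logical_replication() and pg_stat_get_logical_decoding_slots()
above print a WAL position as two 32-bit halves using "%X/%X". The
following is only a minimal standalone sketch of that formatting, not
part of the patch: DemoLsn and demo_format_lsn are hypothetical names
made up for illustration, and uint64_t is assumed as a stand-in for
XLogRecPtr.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit position in the WAL. */
typedef uint64_t DemoLsn;

/*
 * Format an LSN as "HIGH/LOW" in uppercase hex, mirroring the "%X/%X"
 * snprintf() calls in the functions above. The upper 32 bits go before
 * the slash, the lower 32 bits after it; neither half is zero-padded.
 * Returns what snprintf() returns.
 */
static int
demo_format_lsn(char *dst, size_t dstlen, DemoLsn lsn)
{
	assert(dst != NULL && dstlen > 0);

	return snprintf(dst, dstlen, "%" PRIX32 "/%" PRIX32,
					(uint32_t) (lsn >> 32), (uint32_t) lsn);
}
```

For instance, passing 0x0000000116B3F4A8 yields the string "1/16B3F4A8".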
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index 0000000..b6df411
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -0,0 +1,2548 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer.c
+ *
+ * PostgreSQL logical replay buffer management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/reorderbuffer.c
+ *
+ * NOTES
+ * This module gets handed individual pieces of transactions in the order
+ * they are written to the WAL and is responsible for reassembling them into
+ * toplevel-transaction-sized pieces. When a transaction is completely
+ * reassembled - signalled by reading the transaction commit record - it
+ * will then call the output plugin (cf. ReorderBufferCommit()) with the
+ * individual changes. The output plugins rely on snapshots built by
+ * snapbuild.c, which hands them to us.
+ *
+ * Transactions and subtransactions/savepoints in postgres are not
+ * immediately linked to each other from outside the performing
+ * backend. Only at commit/abort (or via special xact_assignment records)
+ * are they linked together, which means that we have to splice together a
+ * toplevel transaction from its subtransactions. To do that efficiently we
+ * build a binary heap indexed by the smallest current lsn of the individual
+ * subtransactions' changestreams. As the individual streams are inherently
+ * ordered by LSN - since that is where we build them from - the transaction
+ * can easily be reassembled by always using the subtransaction with the
+ * smallest current LSN from the heap.
+ *
+ * In order to cope with large transactions - which can be several times as
+ * big as the available memory - this module supports spooling the contents
+ * of large transactions to disk. When the transaction is replayed the
+ * contents of individual (sub-)transactions will be read from disk in
+ * chunks.
+ *
+ * This module also has to deal with reassembling toast records from the
+ * individual chunks stored in WAL. When a new (or initial) version of a
+ * tuple is stored in WAL it will always be preceded by the toast chunks
+ * emitted for the columns stored out of line. Within a single toplevel
+ * transaction there will be no other data carrying records between a row's
+ * toast chunks and the row data itself. See ReorderBufferToast* for
+ * details.
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/transam.h"
+#include "access/xact.h"
+
+#include "catalog/catalog.h"
+
+#include "common/relpath.h"
+
+#include "lib/binaryheap.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h" /* just for SnapBuildSnapDecRefcount */
+#include "replication/logical.h"
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/sinval.h"
+
+#include "utils/builtins.h"
+#include "utils/combocid.h"
+#include "utils/memdebug.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+#include "utils/tqual.h"
+
+/*
+ * For efficiency and simplicity reasons we want to keep Snapshots, CommandIds
+ * and ComboCids in the same list with the user visible INSERT/UPDATE/DELETE
+ * changes. We don't want to leak those internal values to external users
+ * though (they would just use switch()...default:) because that would make it
+ * harder to add new user visible values.
+ *
+ * This needs to be synchronized with ReorderBufferChangeType! Adjust the
+ * StaticAssertExpr's in ReorderBufferAllocate if you add anything!
+ */
+typedef enum
+{
+ REORDER_BUFFER_CHANGE_INTERNAL_INSERT,
+ REORDER_BUFFER_CHANGE_INTERNAL_UPDATE,
+ REORDER_BUFFER_CHANGE_INTERNAL_DELETE,
+ REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT,
+ REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
+ REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID
+} ReorderBufferChangeTypeInternal;
+
+/* entry for a hash table we use to map from xid to our transaction state */
+typedef struct ReorderBufferTXNByIdEnt
+{
+ TransactionId xid;
+ ReorderBufferTXN *txn;
+} ReorderBufferTXNByIdEnt;
+
+/* data structures for (relfilenode, ctid) => (cmin, cmax) mapping */
+typedef struct ReorderBufferTupleCidKey
+{
+ RelFileNode relnode;
+ ItemPointerData tid;
+} ReorderBufferTupleCidKey;
+
+typedef struct ReorderBufferTupleCidEnt
+{
+ ReorderBufferTupleCidKey key;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid; /* just for debugging */
+} ReorderBufferTupleCidEnt;
+
+/* k-way in-order change iteration support structures */
+typedef struct ReorderBufferIterTXNEntry
+{
+ XLogRecPtr lsn;
+ ReorderBufferChange *change;
+ ReorderBufferTXN *txn;
+ int fd;
+ XLogSegNo segno;
+} ReorderBufferIterTXNEntry;
+
+typedef struct ReorderBufferIterTXNState
+{
+ binaryheap *heap;
+ Size nr_txns;
+ dlist_head old_change;
+ ReorderBufferIterTXNEntry entries[FLEXIBLE_ARRAY_MEMBER];
+} ReorderBufferIterTXNState;
+
+/* toast data structures */
+typedef struct ReorderBufferToastEnt
+{
+ Oid chunk_id; /* toast_table.chunk_id */
+ int32 last_chunk_seq; /* toast_table.chunk_seq of the last chunk we
+ * have seen */
+ Size num_chunks; /* number of chunks we've already seen */
+ Size size; /* combined size of chunks seen */
+ dlist_head chunks; /* linked list of chunks */
+ struct varlena *reconstructed; /* reconstructed varlena now pointed
+ * to in main tup */
+} ReorderBufferToastEnt;
+
+
+/* number of changes kept in memory, per transaction */
+const Size max_memtries = 4096;
+
+/* Size of the slab caches used for frequently allocated objects */
+const Size max_cached_changes = 4096 * 2;
+const Size max_cached_tuplebufs = 1024; /* ~8MB */
+const Size max_cached_transactions = 512;
+
+
+/* ---------------------------------------
+ * primary reorderbuffer support routines
+ * ---------------------------------------
+ */
+static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *rb);
+static void ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *rb,
+ TransactionId xid, bool create, bool *is_new,
+ XLogRecPtr lsn, bool create_as_top);
+
+static void AssertTXNLsnOrder(ReorderBuffer *rb);
+
+/* ---------------------------------------
+ * support functions for lsn-order iterating over the ->changes of a
+ * transaction and its subtransactions
+ *
+ * used for iteration over the k-way heap merge of a transaction and its
+ * subtransactions
+ * ---------------------------------------
+ */
+static ReorderBufferIterTXNState *ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static ReorderBufferChange *
+ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state);
+static void ReorderBufferIterTXNFinish(ReorderBuffer *rb,
+ ReorderBufferIterTXNState *state);
+static void ReorderBufferExecuteInvalidations(ReorderBuffer *rb, ReorderBufferTXN *txn);
+
+/*
+ * ---------------------------------------
+ * Disk serialization support functions
+ * ---------------------------------------
+ */
+static void ReorderBufferCheckSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change);
+static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno);
+static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ char *change);
+static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
+
+static void ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap);
+static Snapshot ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid);
+
+/* ---------------------------------------
+ * toast reassembly support
+ * ---------------------------------------
+ */
+/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
+#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
+
+/* Size of an indirect datum that contains a standard TOAST pointer */
+#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect))
+
+static void ReorderBufferToastInitHash(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferToastReset(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+static void ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+
+
+/*
+ * Allocate a new ReorderBuffer
+ */
+ReorderBuffer *
+ReorderBufferAllocate(void)
+{
+ ReorderBuffer *buffer;
+ HASHCTL hash_ctl;
+ MemoryContext new_ctx;
+
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_INSERT == (int) REORDER_BUFFER_CHANGE_INSERT, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_UPDATE == (int) REORDER_BUFFER_CHANGE_UPDATE, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_DELETE == (int) REORDER_BUFFER_CHANGE_DELETE, "out of sync enums");
+
+ new_ctx = AllocSetContextCreate(TopMemoryContext,
+ "ReorderBuffer",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ buffer = (ReorderBuffer *) MemoryContextAlloc(new_ctx, sizeof(ReorderBuffer));
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ buffer->context = new_ctx;
+
+ hash_ctl.keysize = sizeof(TransactionId);
+ hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = buffer->context;
+
+ buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ buffer->by_txn_last_xid = InvalidTransactionId;
+ buffer->by_txn_last_txn = NULL;
+
+ buffer->nr_cached_transactions = 0;
+ buffer->nr_cached_changes = 0;
+ buffer->nr_cached_tuplebufs = 0;
+
+ buffer->outbuf = NULL;
+ buffer->outbufsize = 0;
+
+ buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
+
+ dlist_init(&buffer->toplevel_by_lsn);
+ dlist_init(&buffer->cached_transactions);
+ dlist_init(&buffer->cached_changes);
+ slist_init(&buffer->cached_tuplebufs);
+
+ return buffer;
+}
+
+/*
+ * Free a ReorderBuffer
+ */
+void
+ReorderBufferFree(ReorderBuffer *rb)
+{
+ MemoryContext context = rb->context;
+
+ /*
+ * We free separately allocated data by entirely scrapping our personal
+ * memory context.
+ */
+ MemoryContextDelete(context);
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTXN.
+ */
+static ReorderBufferTXN *
+ReorderBufferGetTXN(ReorderBuffer *rb)
+{
+ ReorderBufferTXN *txn;
+
+ if (rb->nr_cached_transactions > 0)
+ {
+ rb->nr_cached_transactions--;
+ txn = (ReorderBufferTXN *)
+ dlist_container(ReorderBufferTXN, node,
+ dlist_pop_head_node(&rb->cached_transactions));
+ }
+ else
+ {
+ txn = (ReorderBufferTXN *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferTXN));
+ }
+
+ memset(txn, 0, sizeof(ReorderBufferTXN));
+
+ dlist_init(&txn->changes);
+ dlist_init(&txn->tuplecids);
+ dlist_init(&txn->subtxns);
+
+ return txn;
+}
+
+/*
+ * Free a ReorderBufferTXN. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+static void
+ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /* clean the lookup cache if we were cached (quite likely) */
+ if (rb->by_txn_last_xid == txn->xid)
+ {
+ rb->by_txn_last_xid = InvalidTransactionId;
+ rb->by_txn_last_txn = NULL;
+ }
+
+ if (txn->tuplecid_hash != NULL)
+ {
+ hash_destroy(txn->tuplecid_hash);
+ txn->tuplecid_hash = NULL;
+ }
+
+ if (txn->invalidations)
+ {
+ pfree(txn->invalidations);
+ txn->invalidations = NULL;
+ }
+
+ if (rb->nr_cached_transactions < max_cached_transactions)
+ {
+ rb->nr_cached_transactions++;
+ dlist_push_head(&rb->cached_transactions, &txn->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(txn, sizeof(ReorderBufferTXN));
+ VALGRIND_MAKE_MEM_DEFINED(&txn->node, sizeof(txn->node));
+ }
+ else
+ {
+ pfree(txn);
+ }
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferChange.
+ */
+ReorderBufferChange *
+ReorderBufferGetChange(ReorderBuffer *rb)
+{
+ ReorderBufferChange *change;
+
+ if (rb->nr_cached_changes)
+ {
+ rb->nr_cached_changes--;
+ change = (ReorderBufferChange *)
+ dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&rb->cached_changes));
+ }
+ else
+ {
+ change = (ReorderBufferChange *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferChange));
+ }
+
+ memset(change, 0, sizeof(ReorderBufferChange));
+ return change;
+}
+
+/*
+ * Free a ReorderBufferChange. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
+{
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ if (change->newtuple)
+ {
+ ReorderBufferReturnTupleBuf(rb, change->newtuple);
+ change->newtuple = NULL;
+ }
+
+ if (change->oldtuple)
+ {
+ ReorderBufferReturnTupleBuf(rb, change->oldtuple);
+ change->oldtuple = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ if (change->snapshot)
+ {
+ ReorderBufferFreeSnap(rb, change->snapshot);
+ change->snapshot = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ if (rb->nr_cached_changes < max_cached_changes)
+ {
+ rb->nr_cached_changes++;
+ dlist_push_head(&rb->cached_changes, &change->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(change, sizeof(ReorderBufferChange));
+ VALGRIND_MAKE_MEM_DEFINED(&change->node, sizeof(change->node));
+ }
+ else
+ {
+ pfree(change);
+ }
+}
+
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTupleBuf
+ */
+ReorderBufferTupleBuf *
+ReorderBufferGetTupleBuf(ReorderBuffer *rb)
+{
+ ReorderBufferTupleBuf *tuple;
+
+ if (rb->nr_cached_tuplebufs)
+ {
+ rb->nr_cached_tuplebufs--;
+ tuple = slist_container(ReorderBufferTupleBuf, node,
+ slist_pop_head_node(&rb->cached_tuplebufs));
+#ifdef USE_ASSERT_CHECKING
+ memset(tuple, 0xef, sizeof(ReorderBufferTupleBuf)); /* memset() only uses the low byte */
+#endif
+ }
+ else
+ {
+ tuple = (ReorderBufferTupleBuf *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferTupleBuf));
+ }
+
+ return tuple;
+}
+
+/*
+ * Free a ReorderBufferTupleBuf. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple)
+{
+ if (rb->nr_cached_tuplebufs < max_cached_tuplebufs)
+ {
+ rb->nr_cached_tuplebufs++;
+ slist_push_head(&rb->cached_tuplebufs, &tuple->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(tuple, sizeof(ReorderBufferTupleBuf));
+ VALGRIND_MAKE_MEM_DEFINED(&tuple->node, sizeof(tuple->node));
+ }
+ else
+ {
+ pfree(tuple);
+ }
+}
+
+/*
+ * Return the ReorderBufferTXN from the given buffer, specified by Xid.
+ * If create is true, and a transaction doesn't already exist, create it
+ * (with the given LSN, and as top transaction if that's specified);
+ * when this happens, is_new is set to true.
+ */
+static ReorderBufferTXN *
+ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
+ bool *is_new, XLogRecPtr lsn, bool create_as_top)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXNByIdEnt *ent;
+ bool found;
+
+ Assert(TransactionIdIsValid(xid));
+ Assert(!create || lsn != InvalidXLogRecPtr);
+
+ /*
+ * Check the one-entry lookup cache first
+ */
+ if (TransactionIdIsValid(rb->by_txn_last_xid) &&
+ rb->by_txn_last_xid == xid)
+ {
+ txn = rb->by_txn_last_txn;
+
+ if (txn != NULL)
+ {
+ /* found it, and it's valid */
+ if (is_new)
+ *is_new = false;
+ return txn;
+ }
+
+ /*
+ * cached as nonexistent, and asked not to create? Then nothing else
+ * to do.
+ */
+ if (!create)
+ return NULL;
+ /* otherwise fall through to create it */
+ }
+
+ /*
+ * We get here if the cache wasn't hit, or if it yielded a "does-not-exist"
+ * and we want to create an entry.
+ */
+
+ /* search the lookup table */
+ ent = (ReorderBufferTXNByIdEnt *)
+ hash_search(rb->by_txn,
+ (void *) &xid,
+ create ? HASH_ENTER : HASH_FIND,
+ &found);
+ if (found)
+ txn = ent->txn;
+ else if (create)
+ {
+ /* initialize the new entry, if creation was requested */
+ Assert(ent != NULL);
+
+ ent->txn = ReorderBufferGetTXN(rb);
+ ent->txn->xid = xid;
+ txn = ent->txn;
+ txn->first_lsn = lsn;
+ txn->restart_decoding_lsn = rb->current_restart_decoding_lsn;
+
+ if (create_as_top)
+ {
+ dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
+ AssertTXNLsnOrder(rb);
+ }
+ }
+ else
+ txn = NULL; /* not found and not asked to create */
+
+ /* update cache */
+ rb->by_txn_last_xid = xid;
+ rb->by_txn_last_txn = txn;
+
+ if (is_new)
+ *is_new = !found;
+
+ Assert(!create || !!txn);
+ return txn;
+}
+
+/*
+ * Queue a change into a transaction so it can be replayed upon commit.
+ */
+void
+ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
+ ReorderBufferChange *change)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ change->lsn = lsn;
+ Assert(InvalidXLogRecPtr != lsn);
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries++;
+ txn->nentries_mem++;
+
+ ReorderBufferCheckSerializeTXN(rb, txn);
+}
+
+static void
+AssertTXNLsnOrder(ReorderBuffer *rb)
+{
+#ifdef USE_ASSERT_CHECKING
+ dlist_iter iter;
+ XLogRecPtr prev_first_lsn = InvalidXLogRecPtr;
+
+ dlist_foreach(iter, &rb->toplevel_by_lsn)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(cur_txn->first_lsn != InvalidXLogRecPtr);
+
+ if (cur_txn->end_lsn != InvalidXLogRecPtr)
+ Assert(cur_txn->first_lsn <= cur_txn->end_lsn);
+
+ if (prev_first_lsn != InvalidXLogRecPtr)
+ Assert(prev_first_lsn < cur_txn->first_lsn);
+
+ Assert(!cur_txn->is_known_as_subxact);
+ prev_first_lsn = cur_txn->first_lsn;
+ }
+#endif
+}
+
+ReorderBufferTXN *
+ReorderBufferGetOldestTXN(ReorderBuffer *rb)
+{
+ ReorderBufferTXN *txn;
+
+ if (dlist_is_empty(&rb->toplevel_by_lsn))
+ return NULL;
+
+ AssertTXNLsnOrder(rb);
+
+ txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
+
+ Assert(!txn->is_known_as_subxact);
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ return txn;
+}
+
+void
+ReorderBufferSetRestartPoint(ReorderBuffer *rb, XLogRecPtr ptr)
+{
+ rb->current_restart_decoding_lsn = ptr;
+}
+
+void
+ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
+ TransactionId subxid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+ bool new_top;
+ bool new_sub;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true);
+ subtxn = ReorderBufferTXNByXid(rb, subxid, true, &new_sub, lsn, false);
+
+ if (new_sub)
+ {
+ /*
+ * We assign subtransactions to their toplevel transaction even if we
+ * don't have data for them yet; assignment records frequently reference
+ * xids that have not yet produced any records. Knowing those aren't
+ * toplevel xids allows us to make processing cheaper in some places.
+ */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+ else if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+ Assert(subtxn->nsubtxns == 0);
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to toplevel transaction */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+ else if (new_top)
+ {
+ elog(ERROR, "existing subxact assigned to unknown toplevel xact");
+ }
+}
+
+/*
+ * Associate a subtransaction with its toplevel transaction at commit
+ * time. There may be no further changes added after this.
+ */
+void
+ReorderBufferCommitChild(ReorderBuffer *rb, TransactionId xid,
+ TransactionId subxid, XLogRecPtr commit_lsn,
+ XLogRecPtr end_lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+
+ subtxn = ReorderBufferTXNByXid(rb, subxid, false, NULL,
+ InvalidXLogRecPtr, false);
+
+ /*
+ * No need to do anything if that subtxn didn't contain any changes
+ */
+ if (!subtxn)
+ return;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, commit_lsn, true);
+
+ if (txn == NULL)
+ elog(ERROR, "subxact logged without previous toplevel record");
+
+ subtxn->final_lsn = commit_lsn;
+ subtxn->end_lsn = end_lsn;
+
+ if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+ Assert(subtxn->nsubtxns == 0);
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to subtransaction list */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+}
+
+
+/*
+ * Support for efficiently iterating over a transaction's and its
+ * subtransactions' changes.
+ *
+ * We do this via a k-way merge between transactions/subtransactions. For that
+ * we model the current heads of the different transactions as a binary heap so
+ * we easily know which (sub-)transaction has the change with the smallest lsn
+ * next.
+ *
+ * We assume the changes in individual transactions are already sorted by LSN.
+ */
+
+/*
+ * Binary heap comparison function; inverted so the smallest LSN sorts first.
+ */
+static int
+ReorderBufferIterCompare(Datum a, Datum b, void *arg)
+{
+ ReorderBufferIterTXNState *state = (ReorderBufferIterTXNState *) arg;
+ XLogRecPtr pos_a = state->entries[DatumGetInt32(a)].lsn;
+ XLogRecPtr pos_b = state->entries[DatumGetInt32(b)].lsn;
+
+ if (pos_a < pos_b)
+ return 1;
+ else if (pos_a == pos_b)
+ return 0;
+ return -1;
+}
+
+/*
+ * Allocate & initialize an iterator which iterates in lsn order over a
+ * transaction and all its subtransactions.
+ */
+static ReorderBufferIterTXNState *
+ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ Size nr_txns = 0;
+ ReorderBufferIterTXNState *state;
+ dlist_iter cur_txn_i;
+ int32 off;
+
+ /*
+ * Calculate the size of our heap: one element for every transaction that
+ * contains changes. (Besides the transactions already in the reorder
+ * buffer, we count the one we were directly passed.)
+ */
+ if (txn->nentries > 0)
+ nr_txns++;
+
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ nr_txns++;
+ }
+
+ /*
+ * XXX: Add fastpath for the rather common nr_txns=1 case, no need to
+ * allocate/build a heap in that case.
+ */
+
+ /* allocate iteration state */
+ state = (ReorderBufferIterTXNState *)
+ MemoryContextAllocZero(rb->context,
+ sizeof(ReorderBufferIterTXNState) +
+ sizeof(ReorderBufferIterTXNEntry) * nr_txns);
+
+ state->nr_txns = nr_txns;
+ dlist_init(&state->old_change);
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ state->entries[off].fd = -1;
+ state->entries[off].segno = 0;
+ }
+
+ /* allocate heap */
+ state->heap = binaryheap_allocate(state->nr_txns, ReorderBufferIterCompare,
+ state);
+
+ /*
+ * Now insert items into the binary heap, unordered. (We will run a heap
+ * assembly step at the end; this is more efficient.)
+ */
+
+ off = 0;
+
+ /* add toplevel transaction if it contains changes */
+ if (txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreChanges(rb, txn, &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+
+ /* add subtransactions if they contain changes */
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (cur_txn->nentries != cur_txn->nentries_mem)
+ ReorderBufferRestoreChanges(rb, cur_txn,
+ &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &cur_txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = cur_txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+ }
+
+ /* assemble a valid binary heap */
+ binaryheap_build(state->heap);
+
+ return state;
+}
+
+/*
+ * Remove all on-disk spill files of txn. FIXME: better comment and/or name
+ */
+static void
+ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ XLogSegNo first;
+ XLogSegNo cur;
+ XLogSegNo last;
+
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ Assert(txn->final_lsn != InvalidXLogRecPtr);
+
+ XLByteToSeg(txn->first_lsn, first);
+ XLByteToSeg(txn->final_lsn, last);
+
+ for (cur = first; cur <= last; cur++)
+ {
+ char path[MAXPGPATH];
+ XLogRecPtr recptr;
+
+ XLogSegNoOffsetToRecPtr(cur, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+ if (unlink(path) != 0 && errno != ENOENT)
+ elog(FATAL, "could not unlink file \"%s\": %m", path);
+ }
+}
+
+/*
+ * Return the next change when iterating over a transaction and its
+ * subtransactions.
+ *
+ * Returns NULL when no further changes exist.
+ */
+static ReorderBufferChange *
+ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
+{
+ ReorderBufferChange *change;
+ ReorderBufferIterTXNEntry *entry;
+ int32 off;
+
+ /* nothing there anymore */
+ if (state->heap->bh_size == 0)
+ return NULL;
+
+ off = DatumGetInt32(binaryheap_first(state->heap));
+ entry = &state->entries[off];
+
+ if (!dlist_is_empty(&entry->txn->subtxns))
+ elog(DEBUG2, "tx with subtxn %u", entry->txn->xid);
+
+ /* free memory we might have "leaked" in the previous *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(rb, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ change = entry->change;
+
+ /*
+ * update heap with information about which transaction has the next
+ * relevant change in LSN order
+ */
+
+ /* there are in-memory changes */
+ if (dlist_has_next(&entry->txn->changes, &entry->change->node))
+ {
+ dlist_node *next = dlist_next_node(&entry->txn->changes, &change->node);
+ ReorderBufferChange *next_change =
+ dlist_container(ReorderBufferChange, node, next);
+
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+ return change;
+ }
+
+ /* try to load changes from disk */
+ if (entry->txn->nentries != entry->txn->nentries_mem)
+ {
+ /*
+ * Ugly: restoring changes will reuse *Change records, thus delete the
+ * current one from the per-tx list and only free in the next call.
+ */
+ dlist_delete(&change->node);
+ dlist_push_tail(&state->old_change, &change->node);
+
+ if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->fd,
+ &state->entries[off].segno))
+ {
+ /* successfully restored changes from disk */
+ ReorderBufferChange *next_change =
+ dlist_head_element(ReorderBufferChange, node,
+ &entry->txn->changes);
+
+ elog(DEBUG2, "restored %zu/%zu changes from disk",
+ entry->txn->nentries_mem, entry->txn->nentries);
+ Assert(entry->txn->nentries_mem);
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+
+ return change;
+ }
+ }
+
+ /* ok, no changes there anymore, remove */
+ binaryheap_remove_first(state->heap);
+
+ return change;
+}
+
+/*
+ * Deallocate the iterator
+ */
+static void
+ReorderBufferIterTXNFinish(ReorderBuffer *rb,
+ ReorderBufferIterTXNState *state)
+{
+ int32 off;
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ if (state->entries[off].fd != -1)
+ CloseTransientFile(state->entries[off].fd);
+ }
+
+ /* free memory we might have "leaked" in the last *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(rb, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ binaryheap_free(state->heap);
+ pfree(state);
+}
+
+/*
+ * Cleanup the contents of a transaction, usually after the transaction
+ * committed or aborted.
+ */
+static void
+ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ bool found;
+ dlist_mutable_iter iter;
+
+ /* cleanup subtransactions & their changes */
+ dlist_foreach_modify(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(subtxn->is_known_as_subxact);
+ Assert(subtxn->nsubtxns == 0);
+
+ /*
+ * subtransactions are always associated with the toplevel TXN, even if
+ * they originally were happening inside another subtxn, so we won't
+ * ever recurse more than one level here.
+ */
+ ReorderBufferCleanupTXN(rb, subtxn);
+ }
+
+ /* cleanup changes in the toplevel txn */
+ dlist_foreach_modify(iter, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ ReorderBufferReturnChange(rb, change);
+ }
+
+ /*
+ * cleanup the tuplecids we stored for timetravel access. They are always
+ * stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+ ReorderBufferReturnChange(rb, change);
+ }
+
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ txn->base_snapshot = NULL;
+ }
+
+ /* delete from list of known subxacts */
+ if (txn->is_known_as_subxact)
+ {
+ dlist_delete(&txn->node);
+ }
+ /* delete from LSN ordered list of toplevel TXNs */
+ else
+ {
+ /* FIXME: adjust nsubxacts count of parent */
+ dlist_delete(&txn->node);
+ }
+
+ /* now remove reference from buffer */
+ hash_search(rb->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreCleanup(rb, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(rb, txn);
+}
+
+/*
+ * Build a hash with a (relfilenode, ctid) -> (cmin, cmax) mapping for use by
+ * tqual.c's HeapTupleSatisfiesMVCCDuringDecoding.
+ */
+static void
+ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ dlist_iter iter;
+ HASHCTL hash_ctl;
+
+ if (!txn->does_timetravel || dlist_is_empty(&txn->tuplecids))
+ return;
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
+ hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = rb->context;
+
+ /*
+ * create the hash with the exact number of to-be-stored tuplecids from
+ * the start
+ */
+ txn->tuplecid_hash =
+ hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ dlist_foreach(iter, &txn->tuplecids)
+ {
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ bool found;
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(ReorderBufferTupleCidKey));
+
+ key.relnode = change->tuplecid.node;
+
+ ItemPointerCopy(&change->tuplecid.tid,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(txn->tuplecid_hash,
+ (void *) &key,
+ HASH_ENTER,
+ &found);
+ if (!found)
+ {
+ ent->cmin = change->tuplecid.cmin;
+ ent->cmax = change->tuplecid.cmax;
+ ent->combocid = change->tuplecid.combocid;
+ }
+ else
+ {
+ Assert(ent->cmin == change->tuplecid.cmin);
+ Assert(ent->cmax == InvalidCommandId ||
+ ent->cmax == change->tuplecid.cmax);
+
+ /*
+ * if the tuple got valid in this transaction and now got deleted
+ * we already have a valid cmin stored. The cmax will be
+ * InvalidCommandId though.
+ */
+ ent->cmax = change->tuplecid.cmax;
+ }
+ }
+}
+
+/*
+ * Copy a provided snapshot so we can modify it privately. This is needed so
+ * that catalog modifying transactions can look into intermediate catalog
+ * states.
+ */
+static Snapshot
+ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid)
+{
+ Snapshot snap;
+ dlist_iter iter;
+ int i = 0;
+ Size size;
+
+ size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * orig_snap->xcnt +
+ sizeof(TransactionId) * (txn->nsubtxns + 1);
+
+ elog(DEBUG1, "copying a non-transaction-specific snapshot into timetravel tx %u", txn->xid);
+
+ snap = MemoryContextAllocZero(rb->context, size);
+ memcpy(snap, orig_snap, sizeof(SnapshotData));
+
+ snap->copied = true;
+ snap->active_count = 0;
+ snap->regd_count = 0;
+ snap->xip = (TransactionId *) (snap + 1);
+
+ memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+
+ /*
+ * ->subxip contains all txids that belong to our transaction which we
+ * need to check via cmin/cmax. That's why we store the toplevel
+ * transaction in there as well.
+ */
+ snap->subxip = snap->xip + snap->xcnt;
+ snap->subxip[i++] = txn->xid;
+ snap->subxcnt = txn->nsubtxns + 1;
+
+ dlist_foreach(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *sub_txn;
+
+ sub_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ snap->subxip[i++] = sub_txn->xid;
+ }
+
+ /* sort so we can bsearch() later */
+ qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);
+
+ /* store the specified current CommandId */
+ snap->curcid = cid;
+
+ return snap;
+}
+
+/*
+ * Free a previously ReorderBufferCopySnap'ed snapshot
+ */
+static void
+ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
+{
+ if (snap->copied)
+ pfree(snap);
+ else
+ SnapBuildSnapDecRefcount(snap);
+}
+
+/*
+ * Commit a transaction and replay all actions that previously have been
+ * ReorderBufferQueueChange'd in the toplevel TX or any of the subtransactions
+ * assigned via ReorderBufferCommitChild.
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid, XLogRecPtr commit_lsn,
+ XLogRecPtr end_lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferIterTXNState *iterstate = NULL;
+ ReorderBufferChange *change;
+ CommandId command_id = FirstCommandId;
+ volatile Snapshot snapshot_now;
+ Relation relation = NULL;
+ Oid reloid;
+ bool is_transaction_state = IsTransactionOrTransactionBlock();
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* empty transaction */
+ if (txn == NULL)
+ return;
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+
+ /* if parts were spilled to disk already, serialize the rest as well */
+ if (txn->nentries_mem != txn->nentries)
+ ReorderBufferSerializeTXN(rb, txn);
+
+ /*
+ * If this transaction didn't have any real changes in our database, it's
+ * OK not to have a snapshot.
+ */
+ if (txn->base_snapshot == NULL)
+ return;
+
+ snapshot_now = txn->base_snapshot;
+
+ ReorderBufferBuildTupleCidHash(rb, txn);
+
+ /* setup initial snapshot */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+
+ PG_TRY();
+ {
+ /*
+ * Decoding needs access to syscaches et al., which in turn use
+ * heavyweight locks and such. Thus we need to have enough state around
+ * to keep track of those. The easiest way is to simply use a
+ * transaction internally. That also allows us to easily enforce that
+ * nothing writes to the database by checking for xid assignments.
+ *
+ * When we're called via the SQL SRF there's already a transaction
+ * started, so start an explicit subtransaction there.
+ */
+ if (is_transaction_state)
+ BeginInternalSubTransaction("replay");
+ else
+ StartTransactionCommand();
+
+ rb->begin(rb, txn);
+
+ iterstate = ReorderBufferIterTXNInit(rb, txn);
+ while ((change = ReorderBufferIterTXNNext(rb, iterstate)))
+ {
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ Assert(snapshot_now);
+
+ reloid = RelidByRelfilenode(change->relnode.spcNode,
+ change->relnode.relNode);
+
+ /*
+ * catalog tuple without data, while catalog has been
+ * rewritten
+ */
+ if (reloid == InvalidOid &&
+ change->newtuple == NULL && change->oldtuple == NULL)
+ continue;
+ else if (reloid == InvalidOid)
+ elog(ERROR, "could not look up relation %s",
+ relpathperm(change->relnode, MAIN_FORKNUM));
+
+ relation = RelationIdGetRelation(reloid);
+
+ if (relation == NULL)
+ elog(ERROR, "could not open relation descriptor %s",
+ relpathperm(change->relnode, MAIN_FORKNUM));
+
+ if (RelationIsLogicallyLogged(relation))
+ {
+ /* user-triggered change */
+ if (relation->rd_rel->relkind == RELKIND_SEQUENCE)
+ {
+ }
+ else if (!IsToastRelation(relation))
+ {
+ ReorderBufferToastReplace(rb, txn, relation, change);
+ rb->apply_change(rb, txn, relation, change);
+ ReorderBufferToastReset(rb, txn);
+ }
+ /* we're not interested in toast deletions */
+ else if (change->action == REORDER_BUFFER_CHANGE_INSERT)
+ {
+ /*
+ * need to reassemble change in memory, ensure it
+ * doesn't get reused till we're done.
+ */
+ dlist_delete(&change->node);
+ ReorderBufferToastAppendChunk(rb, txn, relation,
+ change);
+ }
+
+ }
+ RelationClose(relation);
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ /* XXX: we could skip snapshots in non toplevel txns */
+
+ /* get rid of the old */
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ {
+ ReorderBufferFreeSnap(rb, snapshot_now);
+ snapshot_now =
+ ReorderBufferCopySnap(rb, change->snapshot,
+ txn, command_id);
+ }
+
+ /*
+ * restored from disk, we need to be careful not to double
+ * free. We could introduce refcounting for that, but for
+ * now this seems infrequent enough not to care.
+ */
+ else if (change->snapshot->copied)
+ {
+ snapshot_now =
+ ReorderBufferCopySnap(rb, change->snapshot,
+ txn, command_id);
+ }
+ else
+ {
+ snapshot_now = change->snapshot;
+ }
+
+
+ /* and start with the new one */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ if (!snapshot_now->copied)
+ {
+ /* we don't use the global one anymore */
+ snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
+ txn, command_id);
+ }
+
+ command_id = Max(command_id, change->command_id);
+
+ if (command_id != InvalidCommandId)
+ {
+ snapshot_now->curcid = command_id;
+
+ RevertFromDecodingSnapshots();
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ }
+
+ /*
+ * every time the CommandId is incremented, we could see
+ * new catalog contents
+ */
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ elog(ERROR, "tuplecid value in normal queue");
+ break;
+ }
+ }
+
+ ReorderBufferIterTXNFinish(rb, iterstate);
+
+ /* call commit callback */
+ rb->commit(rb, txn, commit_lsn);
+
+ /* make sure nothing has written anything */
+ if (GetTopTransactionIdIfAny() != InvalidTransactionId)
+ elog(ERROR, "cannot write during replay");
+
+ /*
+ * Aborting the subtransaction or the transaction as a whole has the
+ * right semantics. We want all locks acquired in here to be released,
+ * not reassigned to the parent, and we do not want any database access
+ * to have persistent effects.
+ */
+ if (is_transaction_state)
+ RollbackAndReleaseCurrentSubTransaction();
+ else
+ AbortCurrentTransaction();
+
+ /* make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ /* cleanup */
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(rb, snapshot_now);
+
+ ReorderBufferCleanupTXN(rb, txn);
+ }
+ PG_CATCH();
+ {
+ /* TODO: Encapsulate cleanup from the PG_TRY and PG_CATCH blocks */
+ if (iterstate)
+ ReorderBufferIterTXNFinish(rb, iterstate);
+
+ if (is_transaction_state)
+ RollbackAndReleaseCurrentSubTransaction();
+ else
+ AbortCurrentTransaction();
+
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(rb, snapshot_now);
+
+ /*
+ * don't do a ReorderBufferCleanupTXN here, with the vague idea of
+ * allowing to retry decoding.
+ */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Abort a transaction that possibly has previous changes. Needs to be done
+ * independently for toplevel and subtransactions.
+ */
+void
+ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* no changes in this transaction */
+ if (txn == NULL)
+ return;
+
+ txn->final_lsn = lsn;
+
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
+ * Check whether a transaction is already known in this module
+ */
+bool
+ReorderBufferIsXidKnown(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ return txn != NULL;
+}
+
+/*
+ * Add a new snapshot to this transaction that is only used after lsn 'lsn'.
+ */
+void
+ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+
+ change->snapshot = snap;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT;
+
+ ReorderBufferQueueChange(rb, xid, lsn, change);
+}
+
+/*
+ * Setup the base snapshot of a transaction. That is the snapshot that is used
+ * to decode all changes until either this transaction modifies the catalog or
+ * another catalog modifying transaction commits.
+ */
+void
+ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferTXN *txn;
+ bool is_new;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
+ Assert(txn->base_snapshot == NULL);
+
+ txn->base_snapshot = snap;
+}
+
+/*
+ * Access the catalog with this CommandId at this point in the changestream.
+ *
+ * May only be called for command ids > 1
+ */
+void
+ReorderBufferAddNewCommandId(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, CommandId cid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+
+ change->command_id = cid;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID;
+
+ ReorderBufferQueueChange(rb, xid, lsn, change);
+}
+
+
+/*
+ * Add new (relfilenode, tid) -> (cmin, cmax) mappings.
+ */
+void
+ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, RelFileNode node,
+ ItemPointerData tid, CommandId cmin,
+ CommandId cmax, CommandId combocid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ change->tuplecid.node = node;
+ change->tuplecid.tid = tid;
+ change->tuplecid.cmin = cmin;
+ change->tuplecid.cmax = cmax;
+ change->tuplecid.combocid = combocid;
+ change->lsn = lsn;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
+
+ dlist_push_tail(&txn->tuplecids, &change->node);
+ txn->ntuplecids++;
+}
+
+/*
+ * Setup the invalidation of the toplevel transaction.
+ *
+ * This needs to be done before ReorderBufferCommit is called!
+ */
+void
+ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Size nmsgs,
+ SharedInvalidationMessage *msgs)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ if (txn->ninvalidations != 0)
+ elog(ERROR, "only ever add one set of invalidations");
+
+ Assert(nmsgs > 0);
+
+ txn->ninvalidations = nmsgs;
+ txn->invalidations = (SharedInvalidationMessage *)
+ MemoryContextAlloc(rb->context,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+ memcpy(txn->invalidations, msgs, sizeof(SharedInvalidationMessage) * nmsgs);
+}
+
+/*
+ * Apply all invalidations we know. Possibly we only need parts at this point
+ * in the changestream but we don't know which those are.
+ */
+static void
+ReorderBufferExecuteInvalidations(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ int i;
+
+ for (i = 0; i < txn->ninvalidations; i++)
+ LocalExecuteInvalidationMessage(&txn->invalidations[i]);
+}
+
+/*
+ * Mark a transaction as doing timetravel.
+ */
+void
+ReorderBufferXidSetTimetravel(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ txn->does_timetravel = true;
+}
+
+/*
+ * Query whether a transaction is already *known* to be doing timetravel. This
+ * can be wrong until directly before the commit!
+ */
+bool
+ReorderBufferXidDoesTimetravel(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ if (txn == NULL)
+ return false;
+
+ return txn->does_timetravel;
+}
+
+/*
+ * Have we already added the first snapshot?
+ */
+bool
+ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* transaction isn't known yet, ergo no snapshot */
+ if (txn == NULL)
+ return false;
+
+ return txn->base_snapshot != NULL;
+}
+
+static void
+ReorderBufferSerializeReserve(ReorderBuffer *rb, Size sz)
+{
+ if (!rb->outbufsize)
+ {
+ rb->outbuf = MemoryContextAlloc(rb->context, sz);
+ rb->outbufsize = sz;
+ }
+ else if (rb->outbufsize < sz)
+ {
+ rb->outbuf = repalloc(rb->outbuf, sz);
+ rb->outbufsize = sz;
+ }
+}
+
+typedef struct ReorderBufferDiskChange
+{
+ Size size;
+ ReorderBufferChange change;
+ /* data follows */
+} ReorderBufferDiskChange;
+
+/*
+ * Persistency support
+ */
+static void
+ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change)
+{
+ ReorderBufferDiskChange *ondisk;
+ Size sz = sizeof(ReorderBufferDiskChange);
+
+ ReorderBufferSerializeReserve(rb, sz);
+
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+ memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ {
+ char *data;
+ Size oldlen = 0;
+ Size newlen = 0;
+
+ if (change->oldtuple)
+ oldlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->oldtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ if (change->newtuple)
+ newlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->newtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ sz += oldlen;
+ sz += newlen;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(rb, sz);
+
+ data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ if (oldlen)
+ {
+ memcpy(data, change->oldtuple, oldlen);
+ data += oldlen;
+ Assert(&change->oldtuple->header == change->oldtuple->tuple.t_data);
+ }
+
+ if (newlen)
+ {
+ memcpy(data, change->newtuple, newlen);
+ data += newlen;
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ char *data;
+
+ sz += sizeof(SnapshotData) +
+ sizeof(TransactionId) * change->snapshot->xcnt +
+ sizeof(TransactionId) * change->snapshot->subxcnt
+ ;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(rb, sz);
+ data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ memcpy(data, change->snapshot, sizeof(SnapshotData));
+ data += sizeof(SnapshotData);
+
+ if (change->snapshot->xcnt)
+ {
+ memcpy(data, change->snapshot->xip,
+ sizeof(TransactionId) * change->snapshot->xcnt);
+ data += sizeof(TransactionId) * change->snapshot->xcnt;
+ }
+
+ if (change->snapshot->subxcnt)
+ {
+ memcpy(data, change->snapshot->subxip,
+ sizeof(TransactionId) * change->snapshot->subxcnt);
+ data += sizeof(TransactionId) * change->snapshot->subxcnt;
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ }
+
+ ondisk->size = sz;
+
+ if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to xid data file \"%u\": %m",
+ txn->xid)));
+ }
+
+ Assert(ondisk->change.action_internal == change->action_internal);
+}
+
+static void
+ReorderBufferCheckSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /* FIXME subtxn handling? */
+ if (txn->nentries_mem >= max_memtries)
+ {
+ ReorderBufferSerializeTXN(rb, txn);
+ Assert(txn->nentries_mem == 0);
+ }
+}
+
+static void
+ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ dlist_iter subtxn_i;
+ dlist_mutable_iter change_i;
+ int fd = -1;
+ XLogSegNo curOpenSegNo = 0;
+ Size spilled = 0;
+ char path[MAXPGPATH];
+
+ elog(DEBUG2, "spill %zu changes in tx %u to disk",
+ txn->nentries_mem, txn->xid);
+
+ /* do the same to all child TXs */
+ dlist_foreach(subtxn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, subtxn_i.cur);
+ ReorderBufferSerializeTXN(rb, subtxn);
+ }
+
+ /* serialize changestream */
+ dlist_foreach_modify(change_i, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, change_i.cur);
+
+ /*
+ * store in the segment the change belongs to by start lsn, don't split
+ * over multiple segments though
+ */
+ if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))
+ {
+ XLogRecPtr recptr;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ XLByteToSeg(change->lsn, curOpenSegNo);
+ XLogSegNoOffsetToRecPtr(curOpenSegNo, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ /* open segment, create it if necessary */
+ fd = OpenTransientFile(path,
+ O_CREAT | O_WRONLY | O_APPEND | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+
+ if (fd < 0)
+ ereport(ERROR, (errmsg("could not open reorderbuffer file %s for writing: %m", path)));
+ }
+
+ ReorderBufferSerializeChange(rb, txn, fd, change);
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(rb, change);
+
+ spilled++;
+ }
+
+ Assert(spilled == txn->nentries_mem);
+ Assert(dlist_is_empty(&txn->changes));
+ txn->nentries_mem = 0;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ /* issue write barrier */
+ /* serialize main transaction state */
+}
+
+static Size
+ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno)
+{
+ Size restored = 0;
+ XLogSegNo last_segno;
+ dlist_mutable_iter cleanup_iter;
+
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ Assert(txn->final_lsn != InvalidXLogRecPtr);
+
+ /* free current entries, so we have memory for more */
+ dlist_foreach_modify(cleanup_iter, &txn->changes)
+ {
+ ReorderBufferChange *cleanup =
+ dlist_container(ReorderBufferChange, node, cleanup_iter.cur);
+
+ dlist_delete(&cleanup->node);
+ ReorderBufferReturnChange(rb, cleanup);
+ }
+ txn->nentries_mem = 0;
+ Assert(dlist_is_empty(&txn->changes));
+
+ XLByteToSeg(txn->final_lsn, last_segno);
+
+ while (restored < max_memtries && *segno <= last_segno)
+ {
+ int readBytes;
+ ReorderBufferDiskChange *ondisk;
+
+ if (*fd == -1)
+ {
+ XLogRecPtr recptr;
+ char path[MAXPGPATH];
+
+ /* first time in */
+ if (*segno == 0)
+ {
+ XLByteToSeg(txn->first_lsn, *segno);
+ elog(LOG, "initial restoring from %zu to %zu",
+ *segno, last_segno);
+ }
+
+ Assert(*segno != 0 || dlist_is_empty(&txn->changes));
+ XLogSegNoOffsetToRecPtr(*segno, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ elog(LOG, "opening file %s", path);
+
+ *fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+ if (*fd < 0 && errno == ENOENT)
+ {
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (*fd < 0)
+ ereport(ERROR, (errmsg("could not open reorderbuffer file %s for reading: %m", path)));
+
+ }
+
+ ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
+
+
+ /*
+ * read the statically sized part of a change which has information
+ * about the total size. If we couldn't read a record, we're at the
+ * end of this file.
+ */
+
+ readBytes = read(*fd, rb->outbuf, sizeof(ReorderBufferDiskChange));
+
+ /* eof */
+ if (readBytes == 0)
+ {
+ CloseTransientFile(*fd);
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (readBytes < 0)
+ elog(ERROR, "read failed: %m");
+ else if (readBytes != sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "incomplete read, read %d instead of %zu",
+ readBytes, sizeof(ReorderBufferDiskChange));
+
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ ReorderBufferSerializeReserve(rb,
+ sizeof(ReorderBufferDiskChange) + ondisk->size);
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ readBytes = read(*fd, rb->outbuf + sizeof(ReorderBufferDiskChange),
+ ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ if (readBytes < 0)
+ elog(ERROR, "read2 failed: %m");
+ else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "incomplete read2, read %d instead of %zu",
+ readBytes, ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ /*
+ * ok, read a full change from disk, now restore it into proper
+ * in-memory format
+ */
+ ReorderBufferRestoreChange(rb, txn, rb->outbuf);
+ restored++;
+ }
+
+ return restored;
+}
+
+/*
+ * Convert change from its on-disk format to in-memory format and queue it onto
+ * the TXN's ->changes list.
+ */
+static void
+ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ char *data)
+{
+ ReorderBufferDiskChange *ondisk;
+ ReorderBufferChange *change;
+
+ ondisk = (ReorderBufferDiskChange *) data;
+
+ change = ReorderBufferGetChange(rb);
+
+ /* copy static part */
+ memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+
+ data += sizeof(ReorderBufferDiskChange);
+
+ /* restore individual stuff */
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ /*
+ * restore in the order ReorderBufferSerializeChange wrote the
+ * tuples: old tuple first, then new tuple
+ */
+ if (change->oldtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->oldtuple = ReorderBufferGetTupleBuf(rb);
+ memcpy(change->oldtuple, data, len);
+ change->oldtuple->tuple.t_data = &change->oldtuple->header;
+
+ data += len;
+ }
+
+ if (change->newtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->newtuple = ReorderBufferGetTupleBuf(rb);
+ memcpy(change->newtuple, data, len);
+ change->newtuple->tuple.t_data = &change->newtuple->header;
+ data += len;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ Snapshot oldsnap = (Snapshot) data;
+ Size size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * oldsnap->xcnt +
+ sizeof(TransactionId) * (oldsnap->subxcnt + 0)
+ ;
+
+ Assert(change->snapshot != NULL);
+
+ change->snapshot = MemoryContextAllocZero(rb->context, size);
+
+ memcpy(change->snapshot, data, size);
+ change->snapshot->xip = (TransactionId *)
+ (((char *) change->snapshot) + sizeof(SnapshotData));
+ change->snapshot->subxip =
+ change->snapshot->xip + change->snapshot->xcnt + 0;
+ change->snapshot->copied = true;
+ break;
+ }
+ /* nothing needs to be done */
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries_mem++;
+}
+
+/*
+ * Delete all data spilled to disk after we've restarted/crashed. It will be
+ * recreated when the respective slots are reused.
+ */
+void
+ReorderBufferStartup(void)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ DIR *spill_dir;
+ struct dirent *spill_de;
+
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ char path[MAXPGPATH];
+
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /*
+ * ok, has to be a surviving logical slot, iterate and delete
+ * everything starting with xid-*
+ */
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ spill_dir = AllocateDir(path);
+ while ((spill_de = ReadDir(spill_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(spill_de->d_name, ".") == 0 ||
+ strcmp(spill_de->d_name, "..") == 0)
+ continue;
+
+ if (strncmp(spill_de->d_name, "xid", 3) == 0)
+ {
+ sprintf(path, "pg_llog/%s/%s", logical_de->d_name,
+ spill_de->d_name);
+
+ if (unlink(path) != 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove xid data file \"%s\": %m",
+ path)));
+ }
+ /* XXX: WARN? */
+ }
+ FreeDir(spill_dir);
+ }
+ FreeDir(logical_dir);
+}
+
+/*
+ * toast support
+ */
+
+/*
+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?
+ */
+#define VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr) \
+do { \
+ varattrib_1b_e *attre = (varattrib_1b_e *) (attr); \
+ Assert(VARATT_IS_EXTERNAL(attre)); \
+ Assert(VARSIZE_EXTERNAL(attre) == sizeof(toast_pointer) + VARHDRSZ_EXTERNAL); \
+ memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
+} while (0)
+
+#define VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) \
+ ((toast_pointer).va_extsize < (toast_pointer).va_rawsize - VARHDRSZ)
+
+/*
+ * Initialize per tuple toast reconstruction support.
+ */
+static void
+ReorderBufferToastInitHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ HASHCTL hash_ctl;
+
+ Assert(txn->toast_hash == NULL);
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+ hash_ctl.keysize = sizeof(Oid);
+ hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = rb->context;
+ txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+}
+
+/*
+ * Per toast-chunk handling for toast reconstruction
+ *
+ * Appends a toast chunk so we can reconstruct it when the tuple "owning" the
+ * toasted Datum comes along.
+ */
+static void
+ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ ReorderBufferToastEnt *ent;
+ bool found;
+ int32 chunksize;
+ bool isnull;
+ Pointer chunk;
+ TupleDesc desc = RelationGetDescr(relation);
+ Oid chunk_id;
+ int32 chunk_seq;
+
+ if (txn->toast_hash == NULL)
+ ReorderBufferToastInitHash(rb, txn);
+
+ Assert(IsToastRelation(relation));
+
+ chunk_id = DatumGetObjectId(fastgetattr(&change->newtuple->tuple, 1, desc, &isnull));
+ Assert(!isnull);
+ chunk_seq = DatumGetInt32(fastgetattr(&change->newtuple->tuple, 2, desc, &isnull));
+ Assert(!isnull);
+
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &chunk_id,
+ HASH_ENTER,
+ &found);
+
+ if (!found)
+ {
+ Assert(ent->chunk_id == chunk_id);
+ ent->num_chunks = 0;
+ ent->last_chunk_seq = 0;
+ ent->size = 0;
+ ent->reconstructed = NULL;
+ dlist_init(&ent->chunks);
+
+ if (chunk_seq != 0)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq 0",
+ chunk_seq, chunk_id);
+ }
+ else if (chunk_seq != ent->last_chunk_seq + 1)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq %d",
+ chunk_seq, chunk_id, ent->last_chunk_seq + 1);
+
+ chunk = DatumGetPointer(fastgetattr(&change->newtuple->tuple, 3, desc, &isnull));
+ Assert(!isnull);
+
+ /* calculate size so we can allocate the right size at once later */
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ /* could happen due to heap_form_tuple doing its thing */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ else
+ elog(ERROR, "unexpected type of toast chunk");
+
+ ent->size += chunksize;
+ ent->last_chunk_seq = chunk_seq;
+ ent->num_chunks++;
+ dlist_push_tail(&ent->chunks, &change->node);
+}
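
The chunk bookkeeping above (sequence check on append, one-shot reassembly later) can be illustrated outside the backend. This is only a minimal sketch under stated assumptions: `Chunk`, `ToastEntry`, `toast_append` and `toast_stitch` are hypothetical stand-ins invented for this example, a plain linked list replaces the dlist, and malloc replaces palloc. It is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one buffered toast chunk. */
typedef struct Chunk
{
	int			seq;		/* chunk sequence number, starting at 0 */
	size_t		len;		/* number of payload bytes */
	const char *data;		/* payload */
	struct Chunk *next;
} Chunk;

/* Hypothetical stand-in for ReorderBufferToastEnt. */
typedef struct ToastEntry
{
	int			num_chunks;
	int			last_seq;
	size_t		size;		/* sum of all payload lengths seen so far */
	Chunk	   *head;
	Chunk	   *tail;
} ToastEntry;

/*
 * Append a chunk. Returns 0 on success, -1 if the sequence number is not
 * the one expected next (0 for the first chunk, last_seq + 1 afterwards).
 */
static int
toast_append(ToastEntry *ent, Chunk *c)
{
	int			expected = (ent->num_chunks == 0) ? 0 : ent->last_seq + 1;

	if (c->seq != expected)
		return -1;

	c->next = NULL;
	if (ent->tail != NULL)
		ent->tail->next = c;
	else
		ent->head = c;
	ent->tail = c;

	/* track the total size so the result can be allocated at once later */
	ent->size += c->len;
	ent->last_seq = c->seq;
	ent->num_chunks++;
	return 0;
}

/*
 * Stitch all chunks back together into one NUL-terminated buffer.
 * The caller frees the result. Returns NULL if allocation fails.
 */
static char *
toast_stitch(const ToastEntry *ent)
{
	char	   *out = malloc(ent->size + 1);
	size_t		done = 0;
	const Chunk *c;

	if (out == NULL)
		return NULL;

	for (c = ent->head; c != NULL; c = c->next)
	{
		memcpy(out + done, c->data, c->len);
		done += c->len;
	}
	assert(done == ent->size);
	out[done] = '\0';
	return out;
}
```

Summing the sizes while appending is what lets the reassembly step allocate the final buffer in one go, which is the same reason the patch maintains ent->size.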
+
+/*
+ * Rejigger change->newtuple to point to in-memory toast tuples instead of
+ * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM).
+ *
+ * We cannot replace unchanged toast tuples though, so those will still point
+ * to on-disk toast data.
+ */
+static void
+ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TupleDesc desc;
+ int natt;
+ Datum *attrs;
+ bool *isnull;
+ bool *free;
+ HeapTuple newtup;
+ Relation toast_rel;
+ TupleDesc toast_desc;
+ MemoryContext oldcontext;
+
+ /* no toast tuples changed */
+ if (txn->toast_hash == NULL)
+ return;
+
+ oldcontext = MemoryContextSwitchTo(rb->context);
+
+ /* we should only have toast tuples in an INSERT or UPDATE */
+ Assert(change->newtuple);
+
+ desc = RelationGetDescr(relation);
+
+ toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+ toast_desc = RelationGetDescr(toast_rel);
+
+ /* should we allocate from stack instead? */
+ attrs = palloc0(sizeof(Datum) * desc->natts);
+ isnull = palloc0(sizeof(bool) * desc->natts);
+ free = palloc0(sizeof(bool) * desc->natts);
+
+ heap_deform_tuple(&change->newtuple->tuple, desc,
+ attrs, isnull);
+
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ Form_pg_attribute attr = desc->attrs[natt];
+ ReorderBufferToastEnt *ent;
+ struct varlena *varlena;
+
+ /* va_rawsize is the size of the original datum -- including header */
+ struct varatt_external toast_pointer;
+ struct varatt_indirect redirect_pointer;
+ struct varlena *new_datum = NULL;
+ struct varlena *reconstructed;
+ dlist_iter it;
+ Size data_done = 0;
+
+ /* system columns aren't toasted */
+ if (attr->attnum < 0)
+ continue;
+
+ if (attr->attisdropped)
+ continue;
+
+ /* not a varlena datatype */
+ if (attr->attlen != -1)
+ continue;
+
+ /* no data */
+ if (isnull[natt])
+ continue;
+
+ /* ok, we know we have a toast datum */
+ varlena = (struct varlena *) DatumGetPointer(attrs[natt]);
+
+ /* no need to do anything if the tuple isn't external */
+ if (!VARATT_IS_EXTERNAL(varlena))
+ continue;
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, varlena);
+
+ /*
+ * check whether the toast tuple changed, replace if so.
+ */
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &toast_pointer.va_valueid,
+ HASH_FIND,
+ NULL);
+ if (ent == NULL)
+ continue;
+
+ new_datum =
+ (struct varlena *) palloc0(INDIRECT_POINTER_SIZE);
+
+ free[natt] = true;
+
+ reconstructed = palloc0(toast_pointer.va_rawsize);
+
+ ent->reconstructed = reconstructed;
+
+ /* stitch toast tuple back together from its parts */
+ dlist_foreach(it, &ent->chunks)
+ {
+ bool isnull;
+ ReorderBufferTupleBuf *tup =
+ dlist_container(ReorderBufferChange, node, it.cur)->newtuple;
+ Pointer chunk =
+ DatumGetPointer(fastgetattr(&tup->tuple, 3, toast_desc, &isnull));
+
+ Assert(!isnull);
+ Assert(!VARATT_IS_EXTERNAL(chunk));
+ Assert(!VARATT_IS_SHORT(chunk));
+
+ memcpy(VARDATA(reconstructed) + data_done,
+ VARDATA(chunk),
+ VARSIZE(chunk) - VARHDRSZ);
+ data_done += VARSIZE(chunk) - VARHDRSZ;
+ }
+ Assert(data_done == toast_pointer.va_extsize);
+
+ /* make sure it's marked as compressed or not */
+ if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+ SET_VARSIZE_COMPRESSED(reconstructed, data_done + VARHDRSZ);
+ else
+ SET_VARSIZE(reconstructed, data_done + VARHDRSZ);
+
+ memset(&redirect_pointer, 0, sizeof(redirect_pointer));
+ redirect_pointer.pointer = reconstructed;
+
+ SET_VARTAG_EXTERNAL(new_datum, VARTAG_INDIRECT);
+ memcpy(VARDATA_EXTERNAL(new_datum), &redirect_pointer,
+ sizeof(redirect_pointer));
+
+ attrs[natt] = PointerGetDatum(new_datum);
+ }
+
+ /*
+ * Build tuple in separate memory & copy tuple back into the tuplebuf
+ * passed to the output plugin. We can't directly heap_fill_tuple() into
+ * the tuplebuf because attrs[] will point back into the current content.
+ */
+ newtup = heap_form_tuple(desc, attrs, isnull);
+ Assert(newtup->t_len <= MaxHeapTupleSize);
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+
+ memcpy(change->newtuple->tuple.t_data,
+ newtup->t_data,
+ newtup->t_len);
+ change->newtuple->tuple.t_len = newtup->t_len;
+
+ /*
+ * free resources we won't further need, more persistent stuff will be
+ * free'd in ReorderBufferToastReset().
+ */
+ RelationClose(toast_rel);
+ pfree(newtup);
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ if (free[natt])
+ pfree(DatumGetPointer(attrs[natt]));
+ }
+ pfree(attrs);
+ pfree(free);
+ pfree(isnull);
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Free all resources allocated for toast reconstruction.
+ */
+static void
+ReorderBufferToastReset(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ HASH_SEQ_STATUS hstat;
+ ReorderBufferToastEnt *ent;
+
+ if (txn->toast_hash == NULL)
+ return;
+
+ /* sequentially walk over the hash and free everything */
+ hash_seq_init(&hstat, txn->toast_hash);
+ while ((ent = (ReorderBufferToastEnt *) hash_seq_search(&hstat)) != NULL)
+ {
+ dlist_mutable_iter it;
+
+ if (ent->reconstructed != NULL)
+ pfree(ent->reconstructed);
+
+ dlist_foreach_modify(it, &ent->chunks)
+ {
+ ReorderBufferChange *change =
+ dlist_container(ReorderBufferChange, node, it.cur);
+
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(rb, change);
+ }
+ }
+
+ hash_destroy(txn->toast_hash);
+ txn->toast_hash = NULL;
+}
+
+
+/*
+ * Visibility support routines
+ */
+
+/*-------------------------------------------------------------------------
+ * Lookup actual cmin/cmax values during timetravel access. We can't always
+ * rely on stored cmin/cmax values because of two scenarios:
+ *
+ * * A tuple got changed multiple times during a single transaction and thus
+ * has got a combocid. Combocids are only valid for the duration of a single
+ * transaction.
+ * * A tuple with a cmin but no cmax (and thus no combocid) got deleted/updated
+ * in another transaction than the one which created it which we are looking
+ * at right now. As only one of cmin, cmax or combocid is actually stored in
+ * the heap we don't have access to the value we need anymore.
+ *
+ * To resolve those problems we have a per-transaction hash of (cmin, cmax)
+ * tuples keyed by (relfilenode, ctid) which contains the actual (cmin, cmax)
+ * values. That also takes care of combocids by simply not caring about them at
+ * all. As we have the real cmin/cmax values that's enough.
+ *
+ * As we only care about catalog tuples here the overhead of this hashtable
+ * should be acceptable.
+ * -------------------------------------------------------------------------
+ */
+extern bool
+ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
+ HeapTuple htup, Buffer buffer,
+ CommandId *cmin, CommandId *cmax)
+{
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ ForkNumber forkno;
+ BlockNumber blockno;
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(key));
+
+ Assert(!BufferIsLocal(buffer));
+
+ /*
+ * get relfilenode from the buffer, no convenient way to access it other
+ * than that.
+ */
+ BufferGetTag(buffer, &key.relnode, &forkno, &blockno);
+
+ /* tuples can only be in the main fork */
+ Assert(forkno == MAIN_FORKNUM);
+ Assert(blockno == ItemPointerGetBlockNumber(&htup->t_self));
+
+ ItemPointerCopy(&htup->t_self,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(tuplecid_data,
+ (void *) &key,
+ HASH_FIND,
+ NULL);
+
+ if (ent == NULL)
+ return false;
+
+ if (cmin)
+ *cmin = ent->cmin;
+ if (cmax)
+ *cmax = ent->cmax;
+ return true;
+}
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
new file mode 100644
index 0000000..6547e3f
--- /dev/null
+++ b/src/backend/replication/logical/snapbuild.c
@@ -0,0 +1,1581 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.c
+ *
+ * Support for building timetravel snapshots based on the contents of the
+ * WAL which then can be used to decode the contents of the WAL.
+ *
+ * NOTES:
+ *
+ * We build snapshots which can *only* be used to read catalog contents by
+ * reading and interpreting the WAL stream. The aim is to build a snapshot that
+ * behaves the same as a freshly taken MVCC snapshot would have at the time the
+ * XLogRecord was generated.
+ *
+ * To build the snapshots we reuse the infrastructure built for hot
+ * standby. The snapshots we build look different than HS' because we have
+ * different needs. To successfully decode data from the WAL we only need to
+ * access catalogs/(sys|rel|cat)cache, not the actual user tables since the
+ * data we decode is contained in the WAL records. Also, our snapshots need to
+ * be different in comparison to normal MVCC ones because in contrast to those
+ * we cannot fully rely on the clog and pg_subtrans for information about
+ * committed transactions because they might commit in the future from the POV
+ * of the wal entry we're currently decoding.
+ *
+ * As the percentage of transactions modifying the catalog normally is fairly
+ * small in comparison to ones only manipulating user data we keep track of
+ * the committed catalog modifying ones inside (xmin, xmax) instead of keeping
+ * track of all running transactions like it's done in a normal snapshot. Note
+ * that we're generally only looking at transactions that have acquired an
+ * xid. That is we keep a list of transactions between snapshot->(xmin, xmax)
+ * that we consider committed, everything else is considered aborted/in
+ * progress. That also allows us not to care about subtransactions before they
+ * have committed which means this module, in contrast to HS, doesn't have to
+ * care about suboverflowed subtransactions and similar.
+ *
+ * One complexity of doing this is that to e.g. handle mixed DDL/DML
+ * transactions we need Snapshots that see intermediate versions of the catalog
+ * in a transaction. During normal operation this is achieved by using
+ * CommandIds/cmin/cmax. The problem with that however is that for space
+ * efficiency reasons only one value of that is stored (c.f. combocid.c). Since
+ * Combocids are only available in memory we log additional information which
+ * allows us to get the original (cmin, cmax) pair during visibility
+ * checks. Check the reorderbuffer.c's comment above
+ * ResolveCminCmaxDuringDecoding() for details.
+ *
+ * To facilitate all this we need our own visibility routine, as the normal
+ * ones are optimized for different usecases. To make sure no unexpected
+ * database access bypassing our special snapshot is possible - which would
+ * possibly load invalid data into caches - we temporarily overload the
+ * .satisfies methods of the usual snapshots while doing timetravel.
+ *
+ * To replace the normal catalog snapshots with timetravel ones use the
+ * SetupDecodingSnapshots and RevertFromDecodingSnapshots functions.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/snapbuild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h" /* FIXME: Use */
+#include "utils/memutils.h"
+#include "utils/snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/tqual.h"
+
+#include "storage/block.h" /* debugging output */
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/standby.h"
+
+typedef struct SnapBuild
+{
+ /* how far are we along building our first full snapshot */
+ SnapBuildState state;
+
+ /* private memory context used to allocate memory for this module. */
+ MemoryContext context;
+
+ /* all transactions < this have committed/aborted */
+ TransactionId xmin;
+
+ /* all transactions >= this are uncommitted */
+ TransactionId xmax;
+
+ /*
+ * Don't replay commits from an LSN <= this LSN. This can be set
+ * externally but it will also be advanced (never retreat) from within
+ * snapbuild.c.
+ */
+ XLogRecPtr transactions_after;
+
+ /*
+ * Don't start decoding WAL until the "xl_running_xacts" information
+ * indicates there are no running xids with a xid smaller than this.
+ */
+ TransactionId initial_xmin_horizon;
+
+ /*
+ * Snapshot that's valid to see all currently committed transactions that
+ * see catalog modifications.
+ */
+ Snapshot snapshot;
+
+ /*
+ * LSN of the last location we are sure a snapshot has been serialized to.
+ */
+ XLogRecPtr last_serialized_snapshot;
+
+ ReorderBuffer *reorder;
+
+ /*
+ * Information about initially running transactions
+ *
+ * When we start building a snapshot there already may be transactions in
+ * progress. Those are stored in running.xip. We don't have enough
+ * information about those to decode their contents, so until they are
+ * finished (xcnt=0) we cannot switch to a CONSISTENT state.
+ */
+ struct
+ {
+ /*
+ * As long as running.xcnt is nonzero, all XIDs between running.xmin
+ * and running.xmax have to be checked whether they still are running.
+ */
+ TransactionId xmin;
+ TransactionId xmax;
+
+ size_t xcnt; /* number of used xip entries */
+ size_t xcnt_space; /* allocated size of xip */
+ TransactionId *xip; /* running xacts array, xidComparator-sorted */
+ } running;
+
+ /*
+ * Array of transactions which could have catalog changes that committed
+ * between xmin and xmax
+ */
+ struct
+ {
+ /* number of committed transactions */
+ size_t xcnt;
+
+ /* available space for committed transactions */
+ size_t xcnt_space;
+
+ /*
+ * Until we reach a CONSISTENT state, we record commits of all
+ * transactions, not just the catalog changing ones. Record when that
+ * changes so we know we cannot export a snapshot safely anymore.
+ */
+ bool includes_all_transactions;
+
+ /*
+ * Array of committed transactions that have modified the catalog.
+ *
+ * As this array is frequently modified we do *not* keep it in
+ * xidComparator order. Instead we sort the array when building &
+ * distributing a snapshot.
+ *
+ * XXX: That doesn't seem to be good reasoning anymore. Every time we
+ * add something here after becoming consistent will also require
+ * distributing a snapshot. Storing them sorted would potentially make
+ * it easier to purge as well (but more complicated wrt wraparound?).
+ */
+ TransactionId *xip;
+ } committed;
+} SnapBuild;
+
+/*
+ * Starting a transaction -- which we need to do while exporting a snapshot --
+ * removes knowledge about the previously used resowner, so we save it here.
+ */
+ResourceOwner SavedResourceOwnerDuringExport = NULL;
+
+/* transaction state manipulation functions */
+static void SnapBuildEndTxn(SnapBuild *builder, TransactionId xid);
+
+/* ->running manipulation */
+static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid);
+
+/* ->committed manipulation */
+static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
+
+/* snapshot building/manipulation/distribution functions */
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildFreeSnapshot(Snapshot snap);
+
+static void SnapBuildSnapIncRefcount(Snapshot snap);
+
+static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn);
+
+/* xlog reading helper functions for SnapBuildProcessRecord */
+static bool SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running);
+
+/* serialization functions */
+static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
+static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
+
+
+/*
+ * Allocate a new snapshot builder.
+ */
+SnapBuild *
+AllocateSnapshotBuilder(ReorderBuffer *reorder,
+ TransactionId xmin_horizon,
+ XLogRecPtr start_lsn)
+{
+ MemoryContext context;
+ MemoryContext oldcontext;
+ SnapBuild *builder;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "snapshot builder context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldcontext = MemoryContextSwitchTo(context);
+
+ builder = palloc0(sizeof(SnapBuild));
+
+ builder->state = SNAPBUILD_START;
+ builder->context = context;
+ builder->reorder = reorder;
+ /* Other struct members initialized by zeroing, above */
+
+ /* builder->running is initialized by zeroing, above */
+
+ builder->committed.xcnt = 0;
+ builder->committed.xcnt_space = 128; /* arbitrary number */
+ builder->committed.xip =
+ palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
+ builder->committed.includes_all_transactions = true;
+ builder->initial_xmin_horizon = xmin_horizon;
+ builder->transactions_after = start_lsn;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return builder;
+}
+
+/*
+ * Free a snapshot builder.
+ */
+void
+FreeSnapshotBuilder(SnapBuild *builder)
+{
+ MemoryContext context = builder->context;
+
+ if (builder->snapshot)
+ SnapBuildFreeSnapshot(builder->snapshot);
+
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+
+ if (builder->committed.xip)
+ pfree(builder->committed.xip);
+
+ pfree(builder);
+
+ MemoryContextDelete(context);
+}
+
+/*
+ * Free an unreferenced snapshot that has previously been built by us.
+ */
+static void
+SnapBuildFreeSnapshot(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ /* slightly more likely, so it's checked even without c-asserts */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ if (snap->active_count)
+ elog(ERROR, "can't free an active snapshot");
+
+ pfree(snap);
+}
+
+/*
+ * In which state of snapshot building are we?
+ */
+SnapBuildState
+SnapBuildCurrentState(SnapBuild *builder)
+{
+ return builder->state;
+}
+
+/*
+ * Should the contents of the transaction ending at 'ptr' be skipped?
+ */
+bool
+SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
+{
+ return ptr <= builder->transactions_after;
+}
+
+/*
+ * Increase refcount of a snapshot.
+ *
+ * This is used when handing out a snapshot to some external resource or when
+ * adding a Snapshot as builder->snapshot.
+ */
+static void
+SnapBuildSnapIncRefcount(Snapshot snap)
+{
+ snap->active_count++;
+}
+
+/*
+ * Decrease refcount of a snapshot and free if the refcount reaches zero.
+ *
+ * Externally visible so external resources that have been handed an IncRef'ed
+ * Snapshot can free it easily.
+ */
+void
+SnapBuildSnapDecRefcount(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ Assert(snap->active_count);
+
+ /* slightly more likely, so it's checked even without c-asserts */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ snap->active_count--;
+ if (!snap->active_count)
+ SnapBuildFreeSnapshot(snap);
+}
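
The reference counting used here (one reference held by the builder, one more per transaction that was handed the snapshot, free on the last release) can be sketched in isolation. A minimal sketch, assuming hypothetical names (`RcSnapshot`, `rc_snapshot_create`, `rc_snapshot_incref`, `rc_snapshot_decref`) and a `rc_snapshots_freed` counter that exists only so the example can observe the free. None of these are part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a snapshot carrying only its refcount. */
typedef struct RcSnapshot
{
	int			active_count;
} RcSnapshot;

/* Instrumentation for the example only: counts how many snapshots were freed. */
static int	rc_snapshots_freed = 0;

/* Create a snapshot with no references yet; returns NULL on allocation failure. */
static RcSnapshot *
rc_snapshot_create(void)
{
	return calloc(1, sizeof(RcSnapshot));
}

/* Take one more reference, e.g. for the builder or for a transaction. */
static void
rc_snapshot_incref(RcSnapshot *snap)
{
	snap->active_count++;
}

/* Drop one reference; the snapshot is freed when the last one goes away. */
static void
rc_snapshot_decref(RcSnapshot *snap)
{
	assert(snap->active_count > 0);

	snap->active_count--;
	if (snap->active_count == 0)
	{
		free(snap);
		rc_snapshots_freed++;
	}
}
```

With this scheme a snapshot shared by the builder and one transaction survives until both have released it, in whichever order that happens.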
+
+/*
+ * Build a new snapshot, based on currently committed catalog-modifying
+ * transactions.
+ *
+ * In-progress transactions with catalog access are *not* allowed to modify
+ * these snapshots; they have to copy them and fill in appropriate ->curcid and
+ * ->subxip/subxcnt values.
+ */
+static Snapshot
+SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+{
+ Snapshot snapshot;
+ Size ssize;
+
+ Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
+
+ ssize = sizeof(SnapshotData)
+ + sizeof(TransactionId) * builder->committed.xcnt
+ + sizeof(TransactionId) * 1 /* toplevel xid */ ;
+
+ snapshot = MemoryContextAllocZero(builder->context, ssize);
+
+ snapshot->satisfies = HeapTupleSatisfiesMVCCDuringDecoding;
+
+ /*
+ * We misuse the original meaning of SnapshotData's xip and subxip fields
+ * to make them more fitting for our needs.
+ *
+ * In the 'xip' array we store transactions that have to be treated as
+ * committed. Since we will only ever look at tuples from transactions
+ * that have modified the catalog it's more efficient to store those few
+ * that exist between xmin and xmax (frequently there are none).
+ *
+ * Snapshots that are used in transactions that have modified the catalog
+ * also use the 'subxip' array to store their toplevel xid and all the
+ * subtransaction xids so we can recognize when we need to treat rows as
+ * visible that are not in xip but still need to be visible. Subxip only
+ * gets filled when the snapshot is copied into the context of a
+ * catalog modifying transaction since we otherwise share a snapshot
+ * between transactions. As long as a txn hasn't modified the catalog it
+ * doesn't need to treat any uncommitted rows as visible, so there is no
+ * need for those xids.
+ *
+ * Both arrays are qsort'ed so that we can use bsearch() on them.
+ *
+ * XXX: Do we want extra fields instead of misusing existing ones?
+ */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ snapshot->xmin = builder->xmin;
+ snapshot->xmax = builder->xmax;
+
+ /* store all transactions to be treated as committed by this snapshot */
+ snapshot->xip =
+ (TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+ snapshot->xcnt = builder->committed.xcnt;
+ memcpy(snapshot->xip,
+ builder->committed.xip,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* sort so we can bsearch() */
+ qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+
+ /*
+ * Initially, subxip is empty, i.e. it's a snapshot to be used by
+ * transactions that don't modify the catalog. Will be filled by
+ * ReorderBufferCopySnap() if necessary.
+ */
+ snapshot->subxcnt = 0;
+ snapshot->subxip = NULL;
+
+ snapshot->suboverflowed = false;
+ snapshot->takenDuringRecovery = false;
+ snapshot->copied = false;
+ snapshot->curcid = FirstCommandId;
+ snapshot->active_count = 0;
+ snapshot->regd_count = 0;
+
+ return snapshot;
+}
+
+/*
+ * Export a snapshot so it can be set in another session with SET TRANSACTION
+ * SNAPSHOT.
+ *
+ * For that we need to start a transaction in the current backend as the
+ * importing side checks whether the source transaction is still open to make
+ * sure the xmin horizon hasn't advanced since then.
+ *
+ * After that we convert a locally built snapshot into the normal variant
+ * understood by HeapTupleSatisfiesMVCC et al.
+ */
+const char *
+SnapBuildExportSnapshot(SnapBuild *builder)
+{
+ Snapshot snap;
+ char *snapname;
+ TransactionId xid;
+ TransactionId *newxip;
+ int newxcnt = 0;
+
+ elog(LOG, "building snapshot");
+
+ if (builder->state != SNAPBUILD_CONSISTENT)
+ elog(ERROR, "cannot export a snapshot before reaching a consistent state");
+
+ if (!builder->committed.includes_all_transactions)
+ elog(ERROR, "cannot export a snapshot, not all transactions are monitored anymore");
+
+ /* so we don't overwrite the existing value */
+ if (TransactionIdIsValid(MyPgXact->xmin))
+ elog(ERROR, "cannot export a snapshot when MyPgXact->xmin already is valid");
+
+ if (IsTransactionOrTransactionBlock())
+ elog(ERROR, "cannot export a snapshot from within a transaction");
+
+ if (SavedResourceOwnerDuringExport)
+ elog(ERROR, "can only export one snapshot at a time");
+
+ SavedResourceOwnerDuringExport = CurrentResourceOwner;
+
+ StartTransactionCommand();
+
+ Assert(!FirstSnapshotSet);
+
+ /* There doesn't seem to be a nice API to set these */
+ XactIsoLevel = XACT_REPEATABLE_READ;
+ XactReadOnly = true;
+
+ snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+
+ /*
+ * We know that snap->xmin is alive, enforced by the logical xmin
+ * mechanism. Due to that we can do this without locks, we're only
+ * changing our own value.
+ */
+ MyPgXact->xmin = snap->xmin;
+
+ /* allocate in transaction context */
+ newxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+
+ /*
+ * snapbuild.c builds snapshots in an "inverted" manner, which means it
+ * stores committed transactions in ->xip, not ones in progress. Build a
+ * classical snapshot by marking all non-committed transactions as
+ * in-progress. This can be expensive.
+ */
+ for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+ {
+ void *test;
+
+ /*
+ * check whether transaction committed using the timetravel meaning of
+ * ->xip
+ */
+ test = bsearch(&xid, snap->xip, snap->xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ elog(DEBUG2, "checking xid %u.. %d (xmin %u, xmax %u)",
+ xid, test == NULL, snap->xmin, snap->xmax);
+
+ if (test == NULL)
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ elog(ERROR, "snapshot too large");
+
+ newxip[newxcnt++] = xid;
+
+ elog(DEBUG2, "treat %u as in-progress", xid);
+ }
+
+ TransactionIdAdvance(xid);
+ }
+
+ snap->xcnt = newxcnt;
+ snap->xip = newxip;
+
+ /*
+ * now that we've built a plain snapshot, use the normal mechanisms for
+ * exporting it
+ */
+ snapname = ExportSnapshot(snap);
+
+ elog(LOG, "exported snapbuild snapshot: %s xcnt %u", snapname, snap->xcnt);
+ return snapname;
+}
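
The conversion from the "inverted" representation (sorted list of committed xids) to a classical one (list of in-progress xids) can be sketched standalone. This is a simplified illustration under explicit assumptions: `Xid`, `xid_cmp` and `invert_committed` are hypothetical names, xids are compared as plain unsigned integers, and wraparound and special xids are ignored, whereas the patch uses NormalTransactionIdPrecedes() and TransactionIdAdvance() for that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */

/* Plain numeric comparator; assumes no xid wraparound in this sketch. */
static int
xid_cmp(const void *a, const void *b)
{
	Xid			xa = *(const Xid *) a;
	Xid			xb = *(const Xid *) b;

	return (xa > xb) - (xa < xb);
}

/*
 * Store every xid in [xmin, xmax) that is NOT in the sorted 'committed'
 * array into 'running'. Returns the number of entries stored, or -1 if
 * more than 'max_running' entries would be needed.
 */
static int
invert_committed(Xid xmin, Xid xmax,
				 const Xid *committed, size_t ncommitted,
				 Xid *running, size_t max_running)
{
	size_t		n = 0;
	Xid			xid;

	for (xid = xmin; xid < xmax; xid++)
	{
		/* committed according to the inverted snapshot: not in progress */
		if (ncommitted > 0 &&
			bsearch(&xid, committed, ncommitted, sizeof(Xid), xid_cmp) != NULL)
			continue;

		if (n >= max_running)
			return -1;			/* snapshot too large */

		running[n++] = xid;
	}
	return (int) n;
}
```

The loop visits every xid between xmin and xmax, so its cost grows with the width of that range rather than with the number of committed transactions, which is why the comment above calls the conversion potentially expensive.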
+
+/*
+ * Reset a previously SnapBuildExportSnapshot()'ed snapshot if there is
+ * any. Aborts the previously started transaction and resets the resource owner
+ * back to its original value.
+ */
+void
+SnapBuildClearExportedSnapshot(void)
+{
+ /* nothing exported, that's the usual case */
+ if (SavedResourceOwnerDuringExport == NULL)
+ return;
+
+ Assert(IsTransactionState());
+
+ /* make sure nothing could have ever happened */
+ AbortCurrentTransaction();
+
+ CurrentResourceOwner = SavedResourceOwnerDuringExport;
+ SavedResourceOwnerDuringExport = NULL;
+}
+
+/*
+ * Handle the effects of a single heap change, appropriate to the current state
+ * of the snapshot builder and returns whether changes made at (xid, lsn) may
+ * be decoded.
+ */
+bool
+SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
+{
+ bool is_old_tx;
+
+ /*
+ * We can't handle data in transactions if we haven't built a snapshot
+ * yet, so don't store them.
+ */
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ return false;
+
+ /*
+ * No point in keeping track of changes in transactions that we don't have
+ * enough information about to decode.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT &&
+ SnapBuildTxnIsRunning(builder, xid))
+ return false;
+
+ is_old_tx = ReorderBufferIsXidKnown(builder->reorder, xid);
+
+ if (!is_old_tx || !ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
+ {
+ /* only build a new snapshot if we don't have a prebuilt one */
+ if (builder->snapshot == NULL)
+ {
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+ /* increase refcount for the snapshot builder */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ }
+
+ /* increase refcount for the transaction */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
+ builder->snapshot);
+ }
+
+ return true;
+}
+
+/*
+ * Do CommandId/ComboCid handling after reading a xl_heap_new_cid record. This
+ * implies that a transaction has done some form of write to system catalogs.
+ */
+void
+SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn, xl_heap_new_cid *xlrec)
+{
+ CommandId cid;
+
+ /*
+ * we only log new_cid's if a catalog tuple was modified, so
+ * set transaction to timetravelling.
+ */
+ ReorderBufferXidSetTimetravel(builder->reorder, xid, lsn);
+
+ ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ xlrec->target.node, xlrec->target.tid,
+ xlrec->cmin, xlrec->cmax,
+ xlrec->combocid);
+
+ /* figure out new command id */
+ if (xlrec->cmin != InvalidCommandId &&
+ xlrec->cmax != InvalidCommandId)
+ cid = Max(xlrec->cmin, xlrec->cmax);
+ else if (xlrec->cmax != InvalidCommandId)
+ cid = xlrec->cmax;
+ else if (xlrec->cmin != InvalidCommandId)
+ cid = xlrec->cmin;
+ else
+ {
+ cid = InvalidCommandId; /* silence compiler */
+ elog(ERROR, "broken arrow, no cid?");
+ }
+
+ /*
+ * FIXME: potential race condition here: if multiple snapshots were running
+ * & generating changes in the same transaction on the source side this
+ * could be problematic. But this cannot happen for system catalogs, right?
+ */
+ ReorderBufferAddNewCommandId(builder->reorder, xid, lsn, cid + 1);
+}
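
The "figure out new command id" step above picks the larger of cmin/cmax when both are valid, otherwise whichever one is valid, and then queues that value plus one. A minimal standalone sketch of just that selection, assuming hypothetical names (`Cid`, `INVALID_CID`, `next_command_id`) and an error return in place of elog(ERROR):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Cid;			/* hypothetical stand-in for CommandId */
#define INVALID_CID UINT32_MAX	/* assumption: all bits set means "invalid" */

/*
 * Compute the command id that follows the one recorded in a new_cid
 * record. Returns 0 and sets *next on success, or -1 if neither cmin
 * nor cmax is valid.
 */
static int
next_command_id(Cid cmin, Cid cmax, Cid *next)
{
	Cid			cid;

	if (cmin != INVALID_CID && cmax != INVALID_CID)
		cid = (cmin > cmax) ? cmin : cmax;
	else if (cmax != INVALID_CID)
		cid = cmax;
	else if (cmin != INVALID_CID)
		cid = cmin;
	else
		return -1;				/* no usable command id in the record */

	*next = cid + 1;
	return 0;
}
```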
+
+/*
+ * Check whether `xid` is currently 'running'. Running transactions in our
+ * parlance are transactions which we didn't observe from the start so we can't
+ * properly decode them. They only exist after we freshly started from an
+ * < CONSISTENT snapshot.
+ */
+static bool
+SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
+{
+ Assert(builder->state < SNAPBUILD_CONSISTENT);
+ Assert(TransactionIdIsValid(builder->running.xmin));
+ Assert(TransactionIdIsValid(builder->running.xmax));
+
+ if (builder->running.xcnt &&
+ NormalTransactionIdFollows(xid, builder->running.xmin) &&
+ NormalTransactionIdPrecedes(xid, builder->running.xmax))
+ {
+ TransactionId *search =
+ bsearch(&xid, builder->running.xip, builder->running.xcnt_space,
+ sizeof(TransactionId), xidComparator);
+
+ if (search != NULL)
+ {
+ Assert(*search == xid);
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/*
+ * Add a new Snapshot to all transactions we're decoding that currently are
+ * in-progress so they can see new catalog contents made by the transaction
+ * that just committed. This is necessary because those in-progress
+ * transactions will use the new catalog's contents from here on (at the very
+ * least everything they do needs to be compatible with newer catalog contents).
+ */
+static void
+SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn)
+{
+ dlist_iter txn_i;
+ ReorderBufferTXN *txn;
+
+ /*
+ * Iterate through all toplevel transactions. This can include
+ * subtransactions which we just don't yet know to be that, but that's
+ * fine, they will just get an unnecessary snapshot queued.
+ */
+ dlist_foreach(txn_i, &builder->reorder->toplevel_by_lsn)
+ {
+ txn = dlist_container(ReorderBufferTXN, node, txn_i.cur);
+
+ Assert(TransactionIdIsValid(txn->xid));
+
+ /*
+ * If we don't have a base snapshot yet, there are no changes in this
+ * transaction which in turn implies we don't yet need a snapshot at
+ * all. We'll add a snapshot when the first change gets queued.
+ *
+ * XXX: is that fine if only a subtransaction has a base snapshot so
+ * far?
+ */
+ if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
+ continue;
+
+ elog(DEBUG2, "adding a new snapshot to %u at %X/%X",
+ txn->xid, (uint32) (lsn >> 32), (uint32) lsn);
+
+ /* increase refcount for the transaction */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn,
+ builder->snapshot);
+ }
+}
+
+/*
+ * Keep track of a new catalog changing transaction that has committed.
+ */
+static void
+SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+{
+ Assert(TransactionIdIsValid(xid));
+
+ if (builder->committed.xcnt == builder->committed.xcnt_space)
+ {
+ builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
+ /* XXX: put in a limit here as a defense against bugs? */
+
+ elog(DEBUG1, "increasing space for committed transactions to %zu",
+ builder->committed.xcnt_space);
+
+ builder->committed.xip = repalloc(builder->committed.xip,
+ builder->committed.xcnt_space * sizeof(TransactionId));
+ }
+
+ /*
+ * XXX: It might make sense to keep the array sorted here instead of doing
+ * it every time we build a new snapshot. On the other hand this gets called
+ * repeatedly when a transaction with subtransactions commits.
+ */
+ builder->committed.xip[builder->committed.xcnt++] = xid;
+}
+
+/*
+ * Remove knowledge about transactions we treat as committed that are smaller
+ * than ->xmin. Those won't ever get checked via the ->committed array but via
+ * the clog machinery, so we don't need to waste memory on them.
+ */
+static void
+SnapBuildPurgeCommittedTxn(SnapBuild *builder)
+{
+ int off;
+ TransactionId *workspace;
+ int surviving_xids = 0;
+
+ /* not ready yet */
+ if (!TransactionIdIsNormal(builder->xmin))
+ return;
+
+ /* XXX: Neater algorithm? */
+ workspace =
+ MemoryContextAlloc(builder->context,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* copy xids that still are interesting to workspace */
+ for (off = 0; off < builder->committed.xcnt; off++)
+ {
+ if (NormalTransactionIdPrecedes(builder->committed.xip[off],
+ builder->xmin))
+ ; /* remove */
+ else
+ workspace[surviving_xids++] = builder->committed.xip[off];
+ }
+
+ /* copy workspace back to persistent state */
+ memcpy(builder->committed.xip, workspace,
+ surviving_xids * sizeof(TransactionId));
+
+ elog(DEBUG1, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+ (uint32) builder->committed.xcnt, (uint32) surviving_xids,
+ builder->xmin, builder->xmax);
+ builder->committed.xcnt = surviving_xids;
+
+ pfree(workspace);
+}
+
+/*
+ * Common logic for SnapBuildAbortTxn and SnapBuildCommitTxn dealing with
+ * keeping track of the number of running transactions.
+ */
+static void
+SnapBuildEndTxn(SnapBuild *builder, TransactionId xid)
+{
+ if (builder->state == SNAPBUILD_CONSISTENT)
+ return;
+
+ if (SnapBuildTxnIsRunning(builder, xid))
+ {
+ Assert(builder->running.xcnt > 0);
+
+ if (!--builder->running.xcnt)
+ {
+ /*
+ * None of the originally running transactions is running anymore.
+ * Because of that, our incrementally built snapshot is now complete.
+ */
+ elog(LOG, "found consistent point due to SnapBuildEndTxn + running: %u", xid);
+ builder->state = SNAPBUILD_CONSISTENT;
+ }
+ }
+}
+
+/*
+ * Abort a transaction, throw away all state we kept
+ */
+void
+SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int i;
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ TransactionId subxid = subxacts[i];
+
+ SnapBuildEndTxn(builder, subxid);
+ }
+
+ SnapBuildEndTxn(builder, xid);
+}
+
+/*
+ * Handle everything that needs to be done when a transaction commits
+ */
+void
+SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int nxact;
+
+ bool forced_timetravel = false;
+ bool sub_does_timetravel = false;
+ bool top_does_timetravel = false;
+
+ TransactionId xmax = xid;
+
+ /*
+ * If we couldn't observe every change of a transaction because it was
+ * already running at the point we started to observe we have to assume it
+ * made catalog changes.
+ *
+ * This has the positive benefit that we afterwards have enough
+ * information to build an exportable snapshot that's usable by pg_dump et
+ * al.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /* ensure that only commits after this are getting replayed */
+ if (builder->transactions_after < lsn)
+ builder->transactions_after = lsn;
+
+ /*
+ * we could avoid treating !SnapBuildTxnIsRunning transactions as
+ * timetravel ones, but we want to be able to export a snapshot when
+ * we reached consistency.
+ */
+ forced_timetravel = true;
+ elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+ }
+
+ for (nxact = 0; nxact < nsubxacts; nxact++)
+ {
+ TransactionId subxid = subxacts[nxact];
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, subxid);
+
+ /*
+ * If we're forcing timetravel we also need accurate subtransaction
+ * status.
+ */
+ if (forced_timetravel)
+ {
+ SnapBuildAddCommittedTxn(builder, subxid);
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+
+ /*
+ * add subtransaction to base snapshot, we don't distinguish it from
+ * toplevel transactions there.
+ */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, subxid))
+ {
+ sub_does_timetravel = true;
+
+ elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
+ xid, subxid);
+
+ SnapBuildAddCommittedTxn(builder, subxid);
+
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+ }
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, xid);
+
+ if (forced_timetravel)
+ {
+ elog(DEBUG1, "forced transaction %u to do timetravel.", xid);
+
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ /* add toplevel transaction to base snapshot */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, xid))
+ {
+ elog(DEBUG1, "found top level transaction %u, with catalog changes!",
+ xid);
+
+ top_does_timetravel = true;
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ else if (sub_does_timetravel)
+ {
+ /* mark toplevel txn as timetravel as well */
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+
+ if (forced_timetravel || top_does_timetravel || sub_does_timetravel)
+ {
+ if (!TransactionIdIsValid(builder->xmax) ||
+ TransactionIdFollowsOrEquals(xmax, builder->xmax))
+ {
+ builder->xmax = xmax;
+ TransactionIdAdvance(builder->xmax);
+ }
+
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ /* decrease the snapshot builder's refcount of the old snapshot */
+ if (builder->snapshot)
+ SnapBuildSnapDecRefcount(builder->snapshot);
+
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+ /* refcount of the snapshot builder for the new snapshot */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ /* add a new SnapshotNow to all currently running transactions */
+ SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
+ }
+ else
+ {
+ /* record that we cannot export a general snapshot anymore */
+ builder->committed.includes_all_transactions = false;
+ }
+}
+
+
+/* -----------------------------------
+ * Snapshot building functions dealing with xlog records
+ * -----------------------------------
+ */
+void
+SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running)
+{
+ ReorderBufferTXN *txn;
+
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /* returns false if there's no point in performing cleanup just yet */
+ if (!SnapBuildFindSnapshot(builder, lsn, running))
+ return;
+ }
+ else
+ {
+ SnapBuildSerialize(builder, lsn);
+ }
+
+ /*
+ * update range of interesting xids. We don't increase ->xmax because once
+ * we are in a consistent state we can do that ourselves and much more
+ * efficiently so because we only need to do it for catalog transactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+
+ /*
+ * xmax can be lower than xmin here because we only increase xmax when we
+ * hit a transaction with catalog changes. While odd looking, it's correct
+ * and actually more efficient this way since we hit fast paths in tqual.c.
+ */
+
+ /* Remove transactions we don't need to keep track of anymore */
+ SnapBuildPurgeCommittedTxn(builder);
+
+ elog(DEBUG1, "xmin: %u, xmax: %u, oldestrunning: %u",
+ builder->xmin, builder->xmax,
+ running->oldestRunningXid);
+
+ /*
+ * increase shared memory state, so vacuum can work on tuples we prevented
+ * from being pruned until now.
+ */
+ IncreaseLogicalXminForSlot(lsn, running->oldestRunningXid);
+
+ /*
+ * Also tell the slot where we can restart decoding from. We don't want to
+ * do that after every commit because changing that implies an fsync of the
+ * logical slot's state file, so we only do it every time we see a running
+ * xacts record.
+ *
+ * Do so by looking for the oldest in progress transaction (determined by
+ * the first LSN of any of its relevant records). Every transaction
+ * remembers the last location we stored the snapshot to disk before its
+ * beginning. That point is where we can restart from.
+ */
+
+ /*
+ * Can't know about a serialized snapshot's location if we're not
+ * consistent
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ return;
+
+ txn = ReorderBufferGetOldestTXN(builder->reorder);
+
+ /*
+ * oldest ongoing txn might have started when we didn't yet serialize
+ * anything because we hadn't reached a consistent state yet.
+ */
+ if (txn != NULL && txn->restart_decoding_lsn != InvalidXLogRecPtr)
+ IncreaseRestartDecodingForSlot(lsn, txn->restart_decoding_lsn);
+
+ /*
+ * No in-progress transaction, can reuse the last serialized snapshot if we
+ * have one.
+ */
+ else if (txn == NULL &&
+ builder->reorder->current_restart_decoding_lsn != InvalidXLogRecPtr &&
+ builder->last_serialized_snapshot != InvalidXLogRecPtr)
+ IncreaseRestartDecodingForSlot(lsn, builder->last_serialized_snapshot);
+}
+
+
+/*
+ * Build the start of a snapshot that's capable of decoding the catalog. Helper
+ * function for SnapBuildProcessRunningXacts() while we're not yet consistent.
+ *
+ * Returns true if there is a point in performing internal maintenance/cleanup
+ * using the xl_running_xacts record.
+ */
+static bool
+SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running)
+{
+ /* ---
+ * Build catalog decoding snapshot incrementally using information about
+ * the currently running transactions. There are several ways to do that:
+
+ * a) There were no running transactions when the xl_running_xacts record
+ * was inserted, jump to CONSISTENT immediately. We might find such a
+ * state while we were waiting for b) or c).
+
+ * b) Wait for all toplevel transactions that were running to end. We
+ * simply track the number of in-progress toplevel transactions and
+ * lower it whenever one commits or aborts. When that number
+ * (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT to
+ * CONSISTENT.
+ * NB: We need to search running.xip when seeing a transaction's end to
+ * make sure it's a toplevel transaction and it's been one of the
+ * initially running ones.
+ * Interestingly, in contrast to HS this allows us not to care about
+ * subtransactions - and by extension suboverflowed xl_running_xacts -
+ * at all.
+ *
+ * c) This (in a previous run) or another decoding slot serialized a
+ * snapshot to disk that we can use.
+ * ---
+ */
+
+ /*
+ * xl_running_xact record is older than what we can use, we might not have
+ * all necessary catalog rows anymore.
+ */
+ if (TransactionIdIsNormal(builder->initial_xmin_horizon) &&
+ NormalTransactionIdPrecedes(running->oldestRunningXid,
+ builder->initial_xmin_horizon))
+ {
+ elog(LOG, "skipping snapshot at %X/%X due to initial xmin horizon of %u vs the snapshot's %u",
+ (uint32) (lsn >> 32), (uint32) lsn,
+ builder->initial_xmin_horizon, running->oldestRunningXid);
+ return true;
+ }
+
+ /*
+ * a) No transactions were running, we can jump to consistent.
+ *
+ * NB: We might have already started to incrementally assemble a snapshot,
+ * so we need to be careful to deal with that.
+ */
+ if (running->xcnt == 0)
+ {
+ if (builder->transactions_after == InvalidXLogRecPtr ||
+ builder->transactions_after < lsn)
+ builder->transactions_after = lsn;
+
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ /* no transactions running now */
+ builder->running.xcnt = 0;
+ builder->running.xmin = InvalidTransactionId;
+ builder->running.xmax = InvalidTransactionId;
+
+ /*
+ * FIXME: abort everything we have stored about running transactions,
+ * relevant e.g. after a crash.
+ */
+ builder->state = SNAPBUILD_CONSISTENT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts with xcnt == 0",
+ builder->xmin);
+
+ return false;
+ }
+ /* c) valid on disk state */
+ else if (SnapBuildRestore(builder, lsn))
+ {
+ /* there won't be any state to cleanup */
+ return false;
+ }
+
+ /*
+ * b) first encounter of a useable xl_running_xacts record. If we had found
+ * one earlier we would either track running transactions
+ * (i.e. builder->running.xcnt != 0) or be consistent (this function
+ * wouldn't get called).
+ */
+ else if (!builder->running.xcnt)
+ {
+ /*
+ * We only care about toplevel xids as those are the ones we definitely
+ * see in the wal stream. As snapbuild.c tracks committed instead of
+ * running transactions we don't need to know anything about
+ * uncommitted subtransactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ /* so we can safely use the faster comparisons */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ builder->running.xcnt = running->xcnt;
+ builder->running.xcnt_space = running->xcnt;
+ builder->running.xip =
+ MemoryContextAlloc(builder->context,
+ builder->running.xcnt * sizeof(TransactionId));
+ memcpy(builder->running.xip, running->xids,
+ builder->running.xcnt * sizeof(TransactionId));
+
+ /* sort so we can do a binary search */
+ qsort(builder->running.xip, builder->running.xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ builder->running.xmin = builder->running.xip[0];
+ builder->running.xmax = builder->running.xip[running->xcnt - 1];
+
+ /* makes comparisons cheaper later */
+ TransactionIdRetreat(builder->running.xmin);
+ TransactionIdAdvance(builder->running.xmax);
+
+ builder->state = SNAPBUILD_FULL_SNAPSHOT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts, %u xacts need to finish",
+ builder->xmin, (uint32) builder->running.xcnt);
+
+ /* nothing could have built up so far */
+ return false;
+ }
+
+ /*
+ * We already started to track running xacts and need to wait for all
+ * in-progress ones to finish. We fall through to the normal processing of
+ * records so incremental cleanup can be performed.
+ */
+ return true;
+}
+
+
+/* -----------------------------------
+ * Snapshot serialization support
+ * -----------------------------------
+ */
+
+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuildOnDisk;
+ * TransactionId * running.xcnt_space;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ *
+ */
+typedef struct SnapBuildOnDisk
+{
+ uint32 magic;
+ /* how large is the SnapBuildOnDisk including all data in state */
+ Size size;
+ SnapBuild builder;
+
+ /* XXX: Should we store a CRC32? */
+
+ /* variable amount of TransactionId's */
+} SnapBuildOnDisk;
+
+#define SNAPBUILD_MAGIC 0x51A1E001
+
+/*
+ * Store/Load a snapshot from disk, depending on the snapshot builder's state.
+ *
+ * Supposed to be used by external (i.e. not snapbuild.c) code that just read a
+ * record that's a potential location for a serialized snapshot.
+ */
+void
+SnapBuildSerializationPoint(SnapBuild *builder, XLogRecPtr lsn)
+{
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ SnapBuildRestore(builder, lsn);
+ else
+ SnapBuildSerialize(builder, lsn);
+}
+
+/*
+ * Serialize the snapshot 'builder' at the location 'lsn' if it hasn't already
+ * been done by another decoding process.
+ */
+static void
+SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
+{
+ Size needed_size;
+ SnapBuildOnDisk *ondisk;
+ char *ondisk_c;
+ int fd;
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int ret;
+ struct stat stat_buf;
+
+ needed_size = sizeof(SnapBuildOnDisk) +
+ sizeof(TransactionId) * builder->running.xcnt_space +
+ sizeof(TransactionId) * builder->committed.xcnt;
+
+ Assert(lsn != InvalidXLogRecPtr);
+ Assert(builder->last_serialized_snapshot == InvalidXLogRecPtr ||
+ builder->last_serialized_snapshot <= lsn);
+
+ /*
+ * no point in serializing if we cannot continue to work immediately after
+ * restoring the snapshot
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ return;
+
+ /*
+ * FIXME: Timeline handling/naming.
+ */
+
+ /*
+ * first check whether some other backend already has written the snapshot
+ * for this LSN. It's perfectly fine if there's none, so we accept ENOENT
+ * as a valid state. Everything else is an unexpected error.
+ */
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ ret = stat(path, &stat_buf);
+
+ if (ret != 0 && errno != ENOENT)
+ ereport(ERROR, (errmsg("could not stat snapbuild state file %s", path)));
+ else if (ret == 0)
+ {
+ /*
+ * somebody else has already serialized to this point, don't overwrite
+ * but remember location, so we don't need to read old data again.
+ *
+ * FIXME: Is it safe to set this as restartpoint below? While we can
+ * see the file it's not guaranteed to persist after a crash...
+ */
+ builder->last_serialized_snapshot = lsn;
+ goto out;
+ }
+
+ /*
+ * there is an obvious race condition here between the time we stat(2) the
+ * file and us writing the file. But we rename the file into place
+ * atomically and all files created need to contain the same data anyway,
+ * so this is perfectly fine, although a bit of a resource waste. Locking
+ * seems like pointless complication.
+ */
+ elog(DEBUG1, "serializing snapshot to %s", path);
+
+ /* to make sure only we will write to this tempfile, include pid */
+ sprintf(tmppath, "pg_llog/snapshots/%X-%X.snap.%u.tmp",
+ (uint32) (lsn >> 32), (uint32) lsn, MyProcPid);
+
+ /*
+ * Unlink temporary file if it already exists. It must be left over from a
+ * crash/error since we won't enter this function twice from within a
+ * single decoding slot/backend and the temporary file contains the pid of
+ * the current process.
+ */
+ if (unlink(tmppath) != 0 && errno != ENOENT)
+ ereport(ERROR, (errmsg("could not unlink old snapbuild state file %s", tmppath)));
+
+ ondisk = MemoryContextAllocZero(builder->context, needed_size);
+ ondisk_c = ((char *) ondisk) + sizeof(SnapBuildOnDisk);
+ ondisk->magic = SNAPBUILD_MAGIC;
+ ondisk->size = needed_size;
+
+ /* copy state per struct assignment, lalala lazy. */
+ ondisk->builder = *builder;
+
+ /* NULL-ify memory-only data */
+ ondisk->builder.context = NULL;
+ ondisk->builder.snapshot = NULL;
+ ondisk->builder.reorder = NULL;
+
+ /* copy running xacts */
+ memcpy(ondisk_c, builder->running.xip,
+ sizeof(TransactionId) * builder->running.xcnt_space);
+ ondisk_c += sizeof(TransactionId) * builder->running.xcnt_space;
+
+ /* copy committed xacts */
+ memcpy(ondisk_c, builder->committed.xip,
+ sizeof(TransactionId) * builder->committed.xcnt);
+ ondisk_c += sizeof(TransactionId) * builder->committed.xcnt;
+
+ /* we have valid data now, open tempfile and write it there */
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(ERROR, (errmsg("could not open snapbuild state file %s for writing: %m", tmppath)));
+
+ if ((write(fd, ondisk, needed_size)) != needed_size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ /*
+ * fsync the file before renaming so that even if we crash after this we
+ * have either a fully valid file or nothing.
+ *
+ * TODO: Do the fsync() via checkpoints/restartpoints, doing it here has
+ * some noticeable overhead since it's performed synchronously during
+ * decoding?
+ */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * We may overwrite the work from some other backend, but that's ok, our
+ * snapshot is valid as well.
+ */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename snapbuild state file from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ /* make sure we persist */
+ fsync_fname(path, false);
+ fsync_fname("pg_llog/snapshots", true);
+
+ /*
+ * now there's no way we lose the dumped state anymore, remember
+ * serialization point.
+ */
+ builder->last_serialized_snapshot = lsn;
+
+out:
+ ReorderBufferSetRestartPoint(builder->reorder,
+ builder->last_serialized_snapshot);
+}
+
+/*
+ * Restore a snapshot into 'builder' if previously one has been stored at the
+ * location indicated by 'lsn'. Returns true if successful, false otherwise.
+ */
+static bool
+SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
+{
+ SnapBuildOnDisk ondisk;
+ int fd;
+ char path[MAXPGPATH];
+ Size sz;
+
+ /* no point in loading a snapshot if we're already there */
+ if (builder->state == SNAPBUILD_CONSISTENT)
+ return false;
+
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ /* log before opening the file so elog() cannot clobber errno */
+ elog(LOG, "restoring snapbuild state from %s", path);
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ if (fd < 0 && errno == ENOENT)
+ return false;
+ else if (fd < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open snapbuild state file %s", path)));
+
+ elog(LOG, "really restoring from %s", path);
+
+ /* read statically sized portion of snapshot */
+ if (read(fd, &ondisk, sizeof(ondisk)) != sizeof(ondisk))
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ if (ondisk.magic != SNAPBUILD_MAGIC)
+ ereport(ERROR, (errmsg("snapbuild state file has wrong magic %u instead of %u",
+ ondisk.magic, SNAPBUILD_MAGIC)));
+
+ /* restore running xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
+ ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.running.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read running xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ /* restore committed xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
+ ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.committed.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read committed xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * ok, we now have a sensible snapshot here, figure out if it has more
+ * information than we have.
+ */
+
+ /*
+ * We are only interested in consistent snapshots for now, comparing
+ * whether one incomplete snapshot is more "advanced" seems to be
+ * unnecessarily complex.
+ */
+ if (ondisk.builder.state < SNAPBUILD_CONSISTENT)
+ goto snapshot_not_interesting;
+
+ /*
+ * Don't use a snapshot that requires an xmin that we cannot guarantee to
+ * be available.
+ */
+ if (TransactionIdPrecedes(ondisk.builder.xmin, builder->initial_xmin_horizon))
+ goto snapshot_not_interesting;
+
+ /*
+ * XXX: transactions_after needs to be updated differently, to be checked
+ * here
+ */
+
+ /* ok, we think the snapshot is sensible, copy over everything important */
+ builder->xmin = ondisk.builder.xmin;
+ builder->xmax = ondisk.builder.xmax;
+ builder->state = ondisk.builder.state;
+
+ builder->committed.xcnt = ondisk.builder.committed.xcnt;
+ /* We only allocated/stored xcnt, not xcnt_space xids ! */
+ /* don't overwrite preallocated xip, if we don't have anything here */
+ if (builder->committed.xcnt > 0)
+ {
+ pfree(builder->committed.xip);
+ builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
+ builder->committed.xip = ondisk.builder.committed.xip;
+ }
+ ondisk.builder.committed.xip = NULL;
+
+ builder->running.xcnt = ondisk.builder.running.xcnt;
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+ builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
+ builder->running.xip = ondisk.builder.running.xip;
+
+ /* our snapshot is not interesting anymore, build a new one */
+ if (builder->snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(builder->snapshot);
+ }
+ builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ ReorderBufferSetRestartPoint(builder->reorder, lsn);
+
+ Assert(builder->state == SNAPBUILD_CONSISTENT);
+ elog(LOG, "recovered initial snapshot (xmin %u) from disk", builder->xmin);
+
+ return true;
+
+snapshot_not_interesting:
+ if (ondisk.builder.running.xip != NULL)
+ pfree(ondisk.builder.running.xip);
+ if (ondisk.builder.committed.xip != NULL)
+ pfree(ondisk.builder.committed.xip);
+ return false;
+}
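For readers following the serialization path: SnapBuildSerialize() above relies on the usual sequence of writing to a temporary file, fsyncing it, renaming it into place and fsyncing the directory, so that the snapshot file shows up either complete or not at all. Below is a minimal standalone sketch of that sequence in plain POSIX C. The helper name write_file_atomically and its parameters are made up for illustration; the patch itself uses OpenTransientFile(), pg_fsync() and fsync_fname() and reports failures via ereport() instead of return codes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): durably write 'len' bytes to
 * 'path' by going through 'tmppath', both located in 'dirpath'.
 * Returns 0 on success, -1 on failure.
 */
static int
write_file_atomically(const char *tmppath, const char *path,
                      const char *dirpath, const void *data, size_t len)
{
    int fd;

    assert(tmppath != NULL && path != NULL && dirpath != NULL);

    /* a leftover temp file can only stem from an earlier failed attempt */
    if (unlink(tmppath) != 0 && errno != ENOENT)
        return -1;

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* write and flush the contents before the final name becomes visible */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* rename() atomically replaces whatever currently exists at 'path' */
    if (rename(tmppath, path) != 0)
        return -1;

    /* fsync the directory so the rename itself survives a crash */
    fd = open(dirpath, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As in the patch, a short write is simply treated as an error here. Two backends racing to create the same snapshot file are harmless with this scheme as long as both would write identical contents, which is the argument made in the comment above the stat() call.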
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 8c83780..0d64156 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -65,7 +65,7 @@ Node *replication_parse_result;
}
/* Non-keyword tokens */
-%token <str> SCONST
+%token <str> SCONST IDENT
%token <uintval> UCONST
%token <recptr> RECPTR
@@ -73,6 +73,9 @@ Node *replication_parse_result;
%token K_BASE_BACKUP
%token K_IDENTIFY_SYSTEM
%token K_START_REPLICATION
+%token K_INIT_LOGICAL_REPLICATION
+%token K_START_LOGICAL_REPLICATION
+%token K_FREE_LOGICAL_REPLICATION
%token K_TIMELINE_HISTORY
%token K_LABEL
%token K_PROGRESS
@@ -82,10 +85,13 @@ Node *replication_parse_result;
%token K_TIMELINE
%type <node> command
-%type <node> base_backup start_replication identify_system timeline_history
+%type <node> base_backup start_replication start_logical_replication init_logical_replication free_logical_replication identify_system timeline_history
%type <list> base_backup_opt_list
%type <defelt> base_backup_opt
%type <uintval> opt_timeline
+%type <list> plugin_options plugin_opt_list
+%type <defelt> plugin_opt_elem
+%type <node> plugin_opt_arg
%%
firstcmd: command opt_semicolon
@@ -102,6 +108,9 @@ command:
identify_system
| base_backup
| start_replication
+ | init_logical_replication
+ | start_logical_replication
+ | free_logical_replication
| timeline_history
;
@@ -186,6 +195,67 @@ opt_timeline:
| /* nothing */ { $$ = 0; }
;
+init_logical_replication:
+ K_INIT_LOGICAL_REPLICATION IDENT IDENT
+ {
+ InitLogicalReplicationCmd *cmd;
+ cmd = makeNode(InitLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->plugin = $3;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+start_logical_replication:
+ K_START_LOGICAL_REPLICATION IDENT RECPTR plugin_options
+ {
+ StartLogicalReplicationCmd *cmd;
+ cmd = makeNode(StartLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->startpoint = $3;
+ cmd->options = $4;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+plugin_options:
+ '(' plugin_opt_list ')' { $$ = $2; }
+ | /* EMPTY */ { $$ = NIL; }
+ ;
+
+plugin_opt_list:
+ plugin_opt_elem
+ {
+ $$ = list_make1($1);
+ }
+ | plugin_opt_list ',' plugin_opt_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+plugin_opt_elem:
+ IDENT plugin_opt_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+plugin_opt_arg:
+ SCONST { $$ = (Node *) makeString($1); }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
+free_logical_replication:
+ K_FREE_LOGICAL_REPLICATION IDENT
+ {
+ FreeLogicalReplicationCmd *cmd;
+ cmd = makeNode(FreeLogicalReplicationCmd);
+ cmd->name = $2;
+ $$ = (Node *) cmd;
+ }
+ ;
+
/*
* TIMELINE_HISTORY %d
*/
@@ -205,6 +275,7 @@ timeline_history:
$$ = (Node *) cmd;
}
;
+
%%
#include "repl_scanner.c"
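To make the new grammar a bit more concrete: START_LOGICAL_REPLICATION takes a slot name (IDENT), a start position (RECPTR) and an optional parenthesized list of plugin options, each an IDENT optionally followed by a single-quoted value. The following is a rough client-side sketch that formats such a command string. The function format_start_logical, the slot name and the option name are hypothetical and only serve as illustration; the sketch handles at most one option and does not escape quotes, it merely refuses slot names containing a double quote.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): format a
 * START_LOGICAL_REPLICATION command with at most one plugin option.
 * Returns the snprintf() result.
 */
static int
format_start_logical(char *buf, size_t buflen, const char *slot,
                     uint64_t startpoint,
                     const char *optname, const char *optvalue)
{
    /* RECPTR is written as two hex halves, like the %X/%X elog output */
    unsigned int hi = (unsigned int) (startpoint >> 32);
    unsigned int lo = (unsigned int) startpoint;

    assert(buf != NULL && slot != NULL);
    /* no escaping is attempted in this sketch */
    assert(strchr(slot, '"') == NULL);

    /* plugin_options may be empty: no option list at all */
    if (optname == NULL)
        return snprintf(buf, buflen,
                        "START_LOGICAL_REPLICATION \"%s\" %X/%X",
                        slot, hi, lo);

    /* plugin_opt_arg may be empty: option name without a value */
    if (optvalue == NULL)
        return snprintf(buf, buflen,
                        "START_LOGICAL_REPLICATION \"%s\" %X/%X (%s)",
                        slot, hi, lo, optname);

    return snprintf(buf, buflen,
                    "START_LOGICAL_REPLICATION \"%s\" %X/%X (%s '%s')",
                    slot, hi, lo, optname, optvalue);
}
```

The slot name is always double-quoted here because, per the scanner changes in this patch, unquoted identifiers are downcased while quoted ones are only truncated.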
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 3d930f1..2b0f2ff 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "utils/builtins.h"
+#include "parser/scansup.h"
/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
#undef fprintf
@@ -48,7 +49,7 @@ static void addlitchar(unsigned char ychar);
%option warn
%option prefix="replication_yy"
-%x xq
+%x xq xd
/* Extended quote
* xqdouble implements embedded quote, ''''
@@ -57,12 +58,26 @@ xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+
+/* Double quote
+ * Allows embedded spaces and other special characters in identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
digit [0-9]+
hexdigit [0-9A-Za-z]+
quote '
quotestop {quote}
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
%%
BASE_BACKUP { return K_BASE_BACKUP; }
@@ -74,9 +89,14 @@ PROGRESS { return K_PROGRESS; }
WAL { return K_WAL; }
TIMELINE { return K_TIMELINE; }
START_REPLICATION { return K_START_REPLICATION; }
+INIT_LOGICAL_REPLICATION { return K_INIT_LOGICAL_REPLICATION; }
+START_LOGICAL_REPLICATION { return K_START_LOGICAL_REPLICATION; }
+FREE_LOGICAL_REPLICATION { return K_FREE_LOGICAL_REPLICATION; }
TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
"," { return ','; }
";" { return ';'; }
+"(" { return '('; }
+")" { return ')'; }
[\n] ;
[\t] ;
@@ -100,20 +120,49 @@ TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
BEGIN(xq);
startlit();
}
+
<xq>{quotestop} {
yyless(1);
BEGIN(INITIAL);
yylval.str = litbufdup();
return SCONST;
}
-<xq>{xqdouble} {
+
+<xq>{xqdouble} {
addlitchar('\'');
}
+
<xq>{xqinside} {
addlit(yytext, yyleng);
}
-<xq><<EOF>> { yyerror("unterminated quoted string"); }
+{xdstart} {
+ BEGIN(xd);
+ startlit();
+ }
+
+<xd>{xdstop} {
+ int len;
+ yyless(1);
+ BEGIN(INITIAL);
+ yylval.str = litbufdup();
+ len = strlen(yylval.str);
+ truncate_identifier(yylval.str, len, true);
+ return IDENT;
+ }
+
+<xd>{xdinside} {
+ addlit(yytext, yyleng);
+ }
+
+{identifier} {
+ int len = strlen(yytext);
+
+ yylval.str = downcase_truncate_identifier(yytext, len, true);
+ return IDENT;
+ }
+
+<xq,xd><<EOF>> { yyerror("unterminated quoted string"); }
<<EOF>> {
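The two identifier rules added above treat names differently: an unquoted identifier goes through downcase_truncate_identifier(), while a double-quoted one only goes through truncate_identifier() and so keeps its case. A minimal standalone sketch of that difference follows. It is not the backend code: the sketch_* names and SKETCH_NAMEDATALEN are hypothetical, it folds ASCII only, and it truncates by bytes without the multibyte handling the real functions perform.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed to mirror the usual NAMEDATALEN of 64; hypothetical name. */
#define SKETCH_NAMEDATALEN 64

/*
 * Unquoted identifier: fold letters to lower case and truncate to fit dst.
 * Simplified stand-in for downcase_truncate_identifier().
 */
void
sketch_fold_unquoted(const char *src, char *dst, size_t dstlen)
{
	size_t		i;

	assert(dstlen > 0);
	for (i = 0; src[i] != '\0' && i < dstlen - 1; i++)
		dst[i] = (char) tolower((unsigned char) src[i]);
	dst[i] = '\0';
}

/*
 * Double-quoted identifier: keep the case as written, only truncate.
 * Simplified stand-in for truncate_identifier().
 */
void
sketch_keep_quoted(const char *src, char *dst, size_t dstlen)
{
	size_t		n;

	assert(dstlen > 0);
	n = strlen(src);
	if (n > dstlen - 1)
		n = dstlen - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}
```

Under these assumptions a slot named My_Slot without quotes would be looked up as my_slot, while "My Slot" in double quotes would keep both its case and the embedded space.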
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 413f0b9..e73f566 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
* everything else has been checked.
*/
if (hot_standby_feedback)
- xmin = GetOldestXmin(true, false);
+ xmin = GetOldestXmin(true, true, false, false);
else
xmin = InvalidTransactionId;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b00a91a..2187d96 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -45,9 +45,8 @@
#include "access/timeline.h"
#include "access/transam.h"
-#include "access/xlog_internal.h"
#include "access/xact.h"
-
+#include "access/xlog_internal.h"
#include "catalog/pg_type.h"
#include "commands/dbcommands.h"
#include "funcapi.h"
@@ -56,6 +55,10 @@
#include "miscadmin.h"
#include "nodes/replnodes.h"
#include "replication/basebackup.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "replication/snapbuild.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -157,6 +160,9 @@ static bool ping_sent = false;
static bool streamingDoneSending;
static bool streamingDoneReceiving;
+/* Are we there yet? */
+static bool WalSndCaughtUp = false;
+
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t walsender_ready_to_stop = false;
@@ -169,24 +175,42 @@ static volatile sig_atomic_t walsender_ready_to_stop = false;
*/
static volatile sig_atomic_t replication_active = false;
+/* XXX reader */
+static MemoryContext decoding_ctx = NULL;
+static MemoryContext old_decoding_ctx = NULL;
+
+static LogicalDecodingContext *logical_decoding_ctx = NULL;
+static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
+
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
-static void WalSndLoop(void);
+typedef void (*WalSndSendData)(void);
+static void WalSndLoop(WalSndSendData send_data);
static void InitWalSenderSlot(void);
static void WalSndKill(int code, Datum arg);
-static void XLogSend(bool *caughtup);
+static void XLogSendPhysical(void);
+static void XLogSendLogical(void);
+static void WalSndDone(WalSndSendData send_data);
static XLogRecPtr GetStandbyFlushRecPtr(void);
static void IdentifySystem(void);
static void StartReplication(StartReplicationCmd *cmd);
+static void InitLogicalReplication(InitLogicalReplicationCmd *cmd);
+static void StartLogicalReplication(StartLogicalReplicationCmd *cmd);
+static void FreeLogicalReplication(FreeLogicalReplicationCmd *cmd);
static void ProcessStandbyMessage(void);
static void ProcessStandbyReplyMessage(void);
static void ProcessStandbyHSFeedbackMessage(void);
static void ProcessRepliesIfAny(void);
static void WalSndKeepalive(bool requestReply);
+static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+
+
/* Initialize walsender process before entering the main command loop */
@@ -247,14 +271,13 @@ IdentifySystem(void)
char tli[11];
char xpos[MAXFNAMELEN];
XLogRecPtr logptr;
- char* dbname = NULL;
+ char *dbname = NULL;
/*
* Reply with a result set with one row, four columns. First col is system
* ID, second is timeline ID, third is current xlog location and the fourth
* contains the database name if we are connected to one.
*/
-
snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
GetSystemIdentifier());
@@ -308,22 +331,22 @@ IdentifySystem(void)
pq_sendint(&buf, 0, 2); /* format code */
/* third field */
- pq_sendstring(&buf, "xlogpos");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "xlogpos"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
/* fourth field */
- pq_sendstring(&buf, "dbname");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "dbname"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
pq_endmessage(&buf);
/* Send a DataRow message */
@@ -335,9 +358,16 @@ IdentifySystem(void)
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
- pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
- pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
-
+ /* send NULL if not connected to a database */
+ if (dbname)
+ {
+ pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
+ pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
+ }
+ else
+ {
+ pq_sendint(&buf, -1, 4); /* col4 len */
+ }
pq_endmessage(&buf);
}
@@ -586,7 +616,7 @@ StartReplication(StartReplicationCmd *cmd)
/* Main loop of walsender */
replication_active = true;
- WalSndLoop();
+ WalSndLoop(XLogSendPhysical);
replication_active = false;
if (walsender_ready_to_stop)
@@ -653,6 +683,497 @@ StartReplication(StartReplicationCmd *cmd)
pq_puttextmessage('C', "START_STREAMING");
}
+static int
+replay_read_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr;
+ int count;
+
+ flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+ /* more than one block available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
+
+/*
+ * Initialize logical replication and wait for an initial consistent point to
+ * start sending changes from.
+ */
+static void
+InitLogicalReplication(InitLogicalReplicationCmd *cmd)
+{
+ const char *slot_name;
+ StringInfoData buf;
+ char xpos[MAXFNAMELEN];
+ const char *snapshot_name = NULL;
+ LogicalDecodingContext *ctx;
+ XLogRecPtr startptr;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ /* XXX apply sanity checking to slot name? */
+ LogicalDecodingAcquireFreeSlot(cmd->name, cmd->plugin);
+
+ Assert(MyLogicalDecodingSlot);
+
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ initStringInfo(&output_message);
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+ NIL, replay_read_page,
+ WalSndPrepareWrite, WalSndWriteData);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(WARNING, "Initiating logical rep from %X/%X",
+ (uint32)(startptr >> 32), (uint32)startptr);
+
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ /* xlog record was invalid */
+ if (err)
+ elog(ERROR, "%s", err);
+
+ /* continue reading from the last position next time round */
+ startptr = InvalidXLogRecPtr;
+
+ Assert(record);
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.endptr = ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue until we have found a consistent point */
+ if (LogicalDecodingContextReady(ctx))
+ {
+ /* export a plain, importable snapshot to the user */
+ snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder);
+ break;
+ }
+ }
+
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ slot_name = NameStr(MyLogicalDecodingSlot->name);
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+
+ pq_beginmessage(&buf, 'T');
+ pq_sendint(&buf, 4, 2); /* 4 fields */
+
+ /* first field */
+ pq_sendstring(&buf, "replication_id"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "consistent_point"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "snapshot_name"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "plugin"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_endmessage(&buf);
+
+ /* Send a DataRow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint(&buf, 4, 2); /* # of columns */
+
+ /* replication_id */
+ pq_sendint(&buf, strlen(slot_name), 4); /* col1 len */
+ pq_sendbytes(&buf, slot_name, strlen(slot_name));
+
+ /* consistent wal location */
+ pq_sendint(&buf, strlen(xpos), 4); /* col2 len */
+ pq_sendbytes(&buf, xpos, strlen(xpos));
+
+ /* snapshot name */
+ pq_sendint(&buf, strlen(snapshot_name), 4); /* col3 len */
+ pq_sendbytes(&buf, snapshot_name, strlen(snapshot_name));
+
+ /* plugin */
+ pq_sendint(&buf, strlen(cmd->plugin), 4); /* col4 len */
+ pq_sendbytes(&buf, cmd->plugin, strlen(cmd->plugin));
+
+ pq_endmessage(&buf);
+
+ /*
+ * release active status again, START_LOGICAL_REPLICATION will reacquire it
+ */
+ LogicalDecodingReleaseSlot();
+}
+
+/*
+ * Load previously initiated logical slot and prepare for sending data (via
+ * WalSndLoop).
+ */
+static void
+StartLogicalReplication(StartLogicalReplicationCmd *cmd)
+{
+ StringInfoData buf;
+ XLogRecPtr confirmed_flush;
+
+ elog(WARNING, "Starting logical replication from %X/%X",
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+ /* make sure that our requirements are still fulfilled */
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ LogicalDecodingReAcquireSlot(cmd->name);
+
+ if (am_cascading_walsender && !RecoveryInProgress())
+ {
+ ereport(LOG,
+ (errmsg("terminating walsender process to force cascaded standby to update timeline and reconnect")));
+ walsender_ready_to_stop = true;
+ }
+
+ WalSndSetState(WALSNDSTATE_CATCHUP);
+
+ /* Send a CopyBothResponse message, and start streaming */
+ pq_beginmessage(&buf, 'W');
+ pq_sendbyte(&buf, 0);
+ pq_sendint(&buf, 0, 2);
+ pq_endmessage(&buf);
+ pq_flush();
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ confirmed_flush = MyLogicalDecodingSlot->confirmed_flush;
+
+ Assert(confirmed_flush != InvalidXLogRecPtr);
+
+ /* continue from last position */
+ if (cmd->startpoint == InvalidXLogRecPtr)
+ cmd->startpoint = MyLogicalDecodingSlot->confirmed_flush;
+ else if (cmd->startpoint > MyLogicalDecodingSlot->confirmed_flush)
+ elog(ERROR, "cannot stream from %X/%X, minimum is %X/%X",
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint,
+ (uint32)(confirmed_flush >> 32), (uint32)confirmed_flush);
+
+ /*
+ * Initialize position to the last ack'ed one, then the xlog records begin
+ * to be shipped from that position.
+ */
+ logical_decoding_ctx = CreateLogicalDecodingContext(
+ MyLogicalDecodingSlot, false, cmd->startpoint, cmd->options,
+ replay_read_page, WalSndPrepareWrite, WalSndWriteData);
+
+ /*
+ * XXX: For feedback purposes it would be nicer to set sentPtr to
+ * cmd->startpoint, but we use it to know where to read xlog in the main
+ * loop...
+ */
+ sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ logical_startptr = sentPtr;
+
+ /* Also update the start position status in shared memory */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ SpinLockRelease(&walsnd->mutex);
+ }
+
+ elog(LOG, "starting to decode from %X/%X, replay %X/%X",
+ (uint32)(MyWalSnd->sentPtr >> 32), (uint32)MyWalSnd->sentPtr,
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+ replication_active = true;
+
+ SyncRepInitConfig();
+
+ /* Main loop of walsender */
+ WalSndLoop(XLogSendLogical);
+
+ LogicalDecodingReleaseSlot();
+
+ replication_active = false;
+ if (walsender_ready_to_stop)
+ proc_exit(0);
+ WalSndSetState(WALSNDSTATE_STARTUP);
+
+ /* Get out of COPY mode (CommandComplete). */
+ EndCommand("COPY 0", DestRemote);
+}
+
+/*
+ * Free the permanent state held by a now inactive but still defined logical slot.
+ */
+static void
+FreeLogicalReplication(FreeLogicalReplicationCmd *cmd)
+{
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(cmd->name);
+ EndCommand("FREE_LOGICAL_REPLICATION", DestRemote);
+}
+
+/*
+ * LogicalDecodingContext 'prepare_write' callback.
+ *
+ * Prepare a write into a StringInfo.
+ *
+ * Don't do anything lasting in here; it's quite possible that nothing will be
+ * done with the data.
+ */
+static void
+WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndPrepareWrite, LogicalOutputPluginWriterPrepareWrite);
+
+ resetStringInfo(ctx->out);
+
+ pq_sendbyte(ctx->out, 'w');
+ pq_sendint64(ctx->out, lsn); /* dataStart */
+ /* XXX: overwrite when data is assembled */
+ pq_sendint64(ctx->out, lsn); /* walEnd */
+ /* XXX: gather that value later just as it's done in XLogSendPhysical */
+ pq_sendint64(ctx->out, 0 /*GetCurrentIntegerTimestamp() */);/* sendtime */
+}
+
+/*
+ * LogicalDecodingContext 'write' callback.
+ *
+ * Actually write the data previously prepared by WalSndPrepareWrite out to the
+ * network. Take as long as needed, but process replies from the other side
+ * while doing so.
+ */
+static void
+WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndWriteData, LogicalOutputPluginWriterWrite);
+
+ /* output previously gathered data in a CopyData packet */
+ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
+
+ /* fast path */
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ return;
+
+ if (!pq_is_send_pending())
+ return;
+
+ for (;;)
+ {
+ int wakeEvents;
+ long sleeptime = 10000; /* 10s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+
+ /* If we finished clearing the buffered data, we're done here. */
+ if (!pq_is_send_pending())
+ break;
+
+ /*
+ * Note we only use a fixed sleep time here and don't check
+ * wal_sender_timeout. That would be pointless, because if the socket
+ * is not writable there's not much we can do elsewhere anyway.
+ */
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+}
+
+/*
+ * Wait till WAL < loc is flushed to disk so it can be safely read.
+ */
+XLogRecPtr
+WalSndWaitForWal(XLogRecPtr loc)
+{
+ int wakeEvents;
+ XLogRecPtr flushptr;
+
+ /* fast path if everything is there already */
+ /*
+ * XXX: introduce RecentFlushPtr to avoid acquiring the spinlock in the
+ * fast path case where we already know we have enough WAL available.
+ */
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ return flushptr;
+
+ for (;;)
+ {
+ long sleeptime = 10000; /* 10 s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Update our idea of flushed position. */
+ flushptr = GetFlushRecPtr();
+
+ /* If postmaster asked us to stop, don't wait here anymore */
+ if (walsender_ready_to_stop)
+ break;
+
+ /* check whether we're done */
+ if (loc <= flushptr)
+ break;
+
+ /* Determine time until replication timeout */
+ if (wal_sender_timeout > 0)
+ {
+ if (!ping_sent)
+ {
+ TimestampTz timeout;
+
+ /*
+ * If half of wal_sender_timeout has lapsed without receiving
+ * any reply from standby, send a keep-alive message to standby
+ * requesting an immediate reply.
+ */
+ timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout / 2);
+ if (GetCurrentTimestamp() >= timeout)
+ {
+ WalSndKeepalive(true);
+ ping_sent = true;
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+ }
+ }
+
+ sleeptime = 1 + (wal_sender_timeout / 10);
+ }
+
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+
+ /*
+ * The equivalent code in WalSndLoop checks here that replication
+ * timeout hasn't been exceeded. We don't do that here. XXX explain
+ * why.
+ */
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+ return flushptr;
+}
+
/*
* Execute an incoming replication command.
*/
@@ -664,6 +1185,12 @@ exec_replication_command(const char *cmd_string)
MemoryContext cmd_context;
MemoryContext old_context;
+ /*
+ * INIT_LOGICAL_REPLICATION exports a snapshot that stays valid until the
+ * next command arrives. Clean up any such snapshot left over from before.
+ */
+ SnapBuildClearExportedSnapshot();
+
elog(DEBUG1, "received replication command: %s", cmd_string);
CHECK_FOR_INTERRUPTS();
@@ -695,6 +1222,18 @@ exec_replication_command(const char *cmd_string)
StartReplication((StartReplicationCmd *) cmd_node);
break;
+ case T_InitLogicalReplicationCmd:
+ InitLogicalReplication((InitLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_StartLogicalReplicationCmd:
+ StartLogicalReplication((StartLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_FreeLogicalReplicationCmd:
+ FreeLogicalReplication((FreeLogicalReplicationCmd *) cmd_node);
+ break;
+
case T_BaseBackupCmd:
SendBaseBackup((BaseBackupCmd *) cmd_node);
break;
@@ -904,6 +1443,12 @@ ProcessStandbyReplyMessage(void)
SpinLockRelease(&walsnd->mutex);
}
+ /*
+ * Advance our local xmin horizon when the client confirmed a flush.
+ */
+ if (MyLogicalDecodingSlot && flushPtr != InvalidXLogRecPtr)
+ LogicalConfirmReceivedLocation(flushPtr);
+
if (!am_cascading_walsender)
SyncRepReleaseWaiters();
}
@@ -988,10 +1533,8 @@ ProcessStandbyHSFeedbackMessage(void)
/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
-WalSndLoop(void)
+WalSndLoop(WalSndSendData send_data)
{
- bool caughtup = false;
-
/*
* Allocate buffers that will be used for each outgoing and incoming
* message. We do this just once to reduce palloc overhead.
@@ -1043,21 +1586,21 @@ WalSndLoop(void)
/*
* If we don't have any pending data in the output buffer, try to send
- * some more. If there is some, we don't bother to call XLogSend
+ * some more. If there is some, we don't bother to call send_data
* again until we've flushed it ... but we'd better assume we are not
* caught up.
*/
if (!pq_is_send_pending())
- XLogSend(&caughtup);
+ send_data();
else
- caughtup = false;
+ WalSndCaughtUp = false;
/* Try to flush pending output to the client */
if (pq_flush_if_writable() != 0)
goto send_failure;
/* If nothing remains to be sent right now ... */
- if (caughtup && !pq_is_send_pending())
+ if (WalSndCaughtUp && !pq_is_send_pending())
{
/*
* If we're in catchup state, move to streaming. This is an
@@ -1083,29 +1626,17 @@ WalSndLoop(void)
* the walsender is not sure which.
*/
if (walsender_ready_to_stop)
- {
- /* ... let's just be real sure we're caught up ... */
- XLogSend(&caughtup);
- if (caughtup && sentPtr == MyWalSnd->flush &&
- !pq_is_send_pending())
- {
- /* Inform the standby that XLOG streaming is done */
- EndCommand("COPY 0", DestRemote);
- pq_flush();
-
- proc_exit(0);
- }
- }
+ WalSndDone(send_data);
}
/*
* We don't block if not caught up, unless there is unsent data
* pending in which case we'd better block until the socket is
- * write-ready. This test is only needed for the case where XLogSend
+ * write-ready. This test is only needed for the case where send_data
* loaded a subset of the available data but then pq_flush_if_writable
* flushed it all --- we should immediately try to send more.
*/
- if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
+ if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
{
TimestampTz timeout = 0;
long sleeptime = 10000; /* 10 s */
@@ -1434,15 +1965,17 @@ retry:
}
/*
+ * Send out the WAL in its normal physical/stored form.
+ *
* Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
* but not yet sent to the client, and buffer it in the libpq output
* buffer.
*
- * If there is no unsent WAL remaining, *caughtup is set to true, otherwise
- * *caughtup is set to false.
+ * If there is no unsent WAL remaining, WalSndCaughtUp is set to true,
+ * otherwise WalSndCaughtUp is set to false.
*/
static void
-XLogSend(bool *caughtup)
+XLogSendPhysical(void)
{
XLogRecPtr SendRqstPtr;
XLogRecPtr startptr;
@@ -1451,7 +1984,7 @@ XLogSend(bool *caughtup)
if (streamingDoneSending)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1568,7 +2101,7 @@ XLogSend(bool *caughtup)
pq_putmessage_noblock('c', NULL, 0);
streamingDoneSending = true;
- *caughtup = true;
+ WalSndCaughtUp = true;
elog(DEBUG1, "walsender reached end of timeline at %X/%X (sent up to %X/%X)",
(uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
@@ -1580,7 +2113,7 @@ XLogSend(bool *caughtup)
Assert(sentPtr <= SendRqstPtr);
if (SendRqstPtr <= sentPtr)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1604,15 +2137,15 @@ XLogSend(bool *caughtup)
{
endptr = SendRqstPtr;
if (sendTimeLineIsHistoric)
- *caughtup = false;
+ WalSndCaughtUp = false;
else
- *caughtup = true;
+ WalSndCaughtUp = true;
}
else
{
/* round down to page boundary. */
endptr -= (endptr % XLOG_BLCKSZ);
- *caughtup = false;
+ WalSndCaughtUp = false;
}
nbytes = endptr - startptr;
@@ -1673,6 +2206,96 @@ XLogSend(bool *caughtup)
}
/*
+ * Send out the WAL after it has been decoded into a logical format by the
+ * output plugin specified in INIT_LOGICAL_REPLICATION.
+ */
+static void
+XLogSendLogical(void)
+{
+ XLogRecord *record;
+ char *errm;
+
+ if (decoding_ctx == NULL)
+ {
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ }
+
+ record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
+ logical_startptr = InvalidXLogRecPtr;
+
+ /* xlog record was invalid */
+ if (errm != NULL)
+ elog(ERROR, "%s", errm);
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = logical_decoding_ctx->reader->ReadRecPtr;
+ buf.endptr = logical_decoding_ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+
+ DecodeRecordIntoReorderBuffer(logical_decoding_ctx, &buf);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+
+ /*
+ * If the record we just read is at or beyond the flushed point, then
+ * we're caught up.
+ */
+ WalSndCaughtUp =
+ logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr();
+ }
+ else
+ /*
+ * xlogreader failed, and no error was reported? we must be caught up.
+ */
+ WalSndCaughtUp = true;
+
+ /* Update shared memory status */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = logical_decoding_ctx->reader->ReadRecPtr;
+ SpinLockRelease(&walsnd->mutex);
+ }
+}
+
+/*
+ * The sender is caught up, so we can go away for shutdown processing
+ * to finish normally. (This should only be called when the shutdown
+ * signal has been received from postmaster.)
+ *
+ * Note that if while doing this we determine that there's still more
+ * data to send, this function will return control to the caller.
+ */
+static void
+WalSndDone(WalSndSendData send_data)
+{
+ /* ... let's just be real sure we're caught up ... */
+ send_data();
+
+ if (WalSndCaughtUp && sentPtr == MyWalSnd->flush &&
+ !pq_is_send_pending())
+ {
+ /* Inform the standby that XLOG streaming is done */
+ EndCommand("COPY 0", DestRemote);
+ pq_flush();
+
+ proc_exit(0);
+ }
+}
+
+/*
* Returns the latest point in WAL that has been safely flushed to disk, and
* can be sent to the standby. This should only be called when in recovery,
* ie. we're streaming to a cascaded standby.
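The walsender.c changes above replace the "bool *caughtup" out-parameter with a file-level WalSndCaughtUp flag, so that WalSndLoop() can drive either XLogSendPhysical() or XLogSendLogical() through one WalSndSendData callback. A minimal standalone sketch of that dispatch pattern follows. All sketch_* names are hypothetical, and the loop is reduced to "call the callback until it reports being caught up"; the real loop also handles latches, timeouts and client replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback type, analogous in shape to WalSndSendData. */
typedef void (*SketchSendData) (void);

/* Shared flag the callback sets, analogous in role to WalSndCaughtUp. */
static bool sketch_caught_up = false;

/* Pretend there are three units of data waiting to be sent. */
static int	sketch_pending = 3;
static int	sketch_sent = 0;

/* One possible callback: send a single unit, then report whether we are done. */
void
sketch_send_one(void)
{
	if (sketch_pending > 0)
	{
		sketch_pending--;
		sketch_sent++;
	}
	sketch_caught_up = (sketch_pending == 0);
}

/* Accessor so callers can inspect progress. */
int
sketch_units_sent(void)
{
	return sketch_sent;
}

/*
 * Generic loop: knows nothing about what is being sent, only that the
 * callback updates sketch_caught_up. Returns the number of iterations run.
 */
int
sketch_loop(SketchSendData send_data, int max_iterations)
{
	int			iterations = 0;

	assert(send_data != NULL);
	sketch_caught_up = false;
	while (!sketch_caught_up && iterations < max_iterations)
	{
		send_data();
		iterations++;
	}
	return iterations;
}
```

The design point the sketch tries to show is that the loop stays identical for both senders; only the callback passed in changes, at the cost of the caught-up state becoming process-global rather than local to the loop.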
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index a0b741b..71d8f04 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
#include "postmaster/bgworker_internals.h"
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
+#include "replication/logical.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
@@ -124,6 +125,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, ProcSignalShmemSize());
size = add_size(size, CheckpointerShmemSize());
size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, LogicalDecodingShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
size = add_size(size, BTreeShmemSize());
@@ -230,6 +232,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
ProcSignalShmemInit();
CheckpointerShmemInit();
AutoVacuumShmemInit();
+ LogicalDecodingShmemInit();
WalSndShmemInit();
WalRcvShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..11aa1f5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -51,6 +51,9 @@
#include "access/xact.h"
#include "access/twophase.h"
#include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/walsender.h"
+#include "replication/walsender_private.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/spin.h"
@@ -1141,16 +1144,18 @@ TransactionIdIsActive(TransactionId xid)
* GetOldestXmin() move backwards, with no consequences for data integrity.
*/
TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked)
{
ProcArrayStruct *arrayP = procArray;
TransactionId result;
int index;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
/* Cannot look for individual databases during recovery */
Assert(allDbs || !RecoveryInProgress());
- LWLockAcquire(ProcArrayLock, LW_SHARED);
+ if (!alreadyLocked)
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
* We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1197,6 +1202,10 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
}
}
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (RecoveryInProgress())
{
/*
@@ -1205,7 +1214,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
*/
TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
if (TransactionIdIsNormal(kaxmin) &&
TransactionIdPrecedes(kaxmin, result))
@@ -1213,10 +1223,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
}
else
{
- /*
- * No other information needed, so release the lock immediately.
- */
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
/*
* Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1237,6 +1245,15 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
result = FirstNormalTransactionId;
}
+ /*
+ * After locks are released and vacuum_defer_cleanup_age has been applied,
+ * check whether we need to back up further to make logical decoding possible.
+ */
+ if (systable &&
+ TransactionIdIsValid(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, result))
+ result = logical_xmin;
+
return result;
}
@@ -1290,7 +1307,9 @@ GetMaxSnapshotSubxidCount(void)
* older than this are known not running any more.
* RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
* running transactions, except those running LAZY VACUUM). This is
- * the same computation done by GetOldestXmin(true, true).
+ * the same computation done by GetOldestXmin(true, true, ...).
+ * RecentGlobalDataXmin: the global xmin for non-catalog tables
+ * >= RecentGlobalXmin
*
* Note: this function should probably not be called with an argument that's
* not statically allocated (see xip allocation below).
@@ -1306,6 +1325,7 @@ GetSnapshotData(Snapshot snapshot)
int count = 0;
int subcount = 0;
bool suboverflowed = false;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
Assert(snapshot != NULL);
@@ -1483,8 +1503,14 @@ GetSnapshotData(Snapshot snapshot)
suboverflowed = true;
}
+
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
+
LWLockRelease(ProcArrayLock);
/*
@@ -1499,6 +1525,17 @@ GetSnapshotData(Snapshot snapshot)
RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
if (!TransactionIdIsNormal(RecentGlobalXmin))
RecentGlobalXmin = FirstNormalTransactionId;
+
+ /* Non-catalog tables can be vacuumed if older than this xid */
+ RecentGlobalDataXmin = RecentGlobalXmin;
+
+ /*
+ * Peg the global xmin to the one required for logical decoding, if any.
+ */
+ if (TransactionIdIsNormal(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, RecentGlobalXmin))
+ RecentGlobalXmin = logical_xmin;
+
RecentXmin = xmin;
snapshot->xmin = xmin;
@@ -1599,9 +1636,11 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
* Similar to GetSnapshotData but returns more information. We include
* all PGXACTs with an assigned TransactionId, even VACUUM processes.
*
- * We acquire XidGenLock, but the caller is responsible for releasing it.
- * This ensures that no new XIDs enter the proc array until the caller has
- * WAL-logged this snapshot, and releases the lock.
+ * We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
+ * releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
+ * array until the caller has WAL-logged this snapshot, and releases the
+ * lock. Acquiring ProcArrayLock ensures that no transactions commit until the
+ * lock is released.
*
* The returned data structure is statically allocated; caller should not
* modify it, and must not assume it is valid past the next call.
@@ -1736,6 +1775,12 @@ GetRunningTransactionData(void)
}
}
+ /*
+ * It's important *not* to track decoding tasks here because snapbuild.c
+ * uses ->oldestRunningXid to manage its xmin. If it were to be included
+ * here the initial value could never increase.
+ */
+
CurrentRunningXacts->xcnt = count - subcount;
CurrentRunningXacts->subxcnt = subcount;
CurrentRunningXacts->subxid_overflow = suboverflowed;
@@ -1743,13 +1788,12 @@ GetRunningTransactionData(void)
CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
- /* We don't release XidGenLock here, the caller is responsible for that */
- LWLockRelease(ProcArrayLock);
-
Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
+ /* We don't release the locks here, the caller is responsible for that */
+
return CurrentRunningXacts;
}
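To make the horizon split in GetSnapshotData() above easier to review, here
is a minimal standalone sketch of the arithmetic. It is not code from the
patch: the Horizons struct, compute_horizons() and the helper names are
hypothetical, and the TransactionId stand-ins only mimic transam.h. The
int32_t cast assumes the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumed, mimicking transam.h). */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical container for the two horizons the patch maintains. */
typedef struct Horizons
{
	TransactionId catalog_xmin;	/* plays the role of RecentGlobalXmin */
	TransactionId data_xmin;	/* plays the role of RecentGlobalDataXmin */
} Horizons;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b"; both XIDs must be normal. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

static Horizons
compute_horizons(TransactionId globalxmin, uint32_t defer_age,
				 TransactionId logical_xmin)
{
	Horizons	h;

	/* Apply the cleanup deferral, clamping to the first normal XID. */
	h.catalog_xmin = globalxmin - defer_age;
	if (!xid_is_normal(h.catalog_xmin))
		h.catalog_xmin = FirstNormalTransactionId;

	/* Non-catalog tables are not held back by logical decoding. */
	h.data_xmin = h.catalog_xmin;

	/* Catalog tables must keep rows that decoding may still need. */
	if (xid_is_normal(logical_xmin) &&
		xid_precedes(logical_xmin, h.catalog_xmin))
		h.catalog_xmin = logical_xmin;

	return h;
}
```

With a slot holding xmin 900 and a global xmin of 1000, the catalog horizon
stays at 900 while the data horizon advances to 1000, which is the
"RecentGlobalDataXmin >= RecentGlobalXmin" relation stated in the comment.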
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 97da1a0..5f74c3e 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -879,8 +879,23 @@ LogStandbySnapshot(void)
* record we write, because standby will open up when it sees this.
*/
running = GetRunningTransactionData();
+
+ /*
+ * GetRunningTransactionData() acquired ProcArrayLock, we must release
+ * it. We can do that before inserting the WAL record because
+ * ProcArrayApplyRecoveryInfo can recheck the commit status using the
+ * clog. If we're doing logical replication we can't do that though, so
+ * hold the lock for a moment longer.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
+
recptr = LogCurrentRunningXacts(running);
+ /* Release lock if we kept it longer ... */
+ if (wal_level >= WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
+
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index bfe7d78..015970a 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -512,7 +512,7 @@ RegisterSnapshotInvalidation(Oid dbId, Oid relId)
* Only the local caches are flushed; this does not transmit the message
* to other backends.
*/
-static void
+void
LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
if (msg->id >= 0)
@@ -596,7 +596,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
* since that tells us we've lost some shared-inval messages and hence
* don't know what needs to be invalidated.
*/
-static void
+void
InvalidateSystemCaches(void)
{
int i;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 44dd0d2..5d304ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1601,6 +1601,10 @@ RelationIdGetRelation(Oid relationId)
return rd;
}
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelationId(relationId))
+ SuspendDecodingSnapshots();
+
/*
* no reldesc in the cache, so have RelationBuildDesc() build one and add
* it.
@@ -1608,6 +1612,10 @@ RelationIdGetRelation(Oid relationId)
rd = RelationBuildDesc(relationId, true);
if (RelationIsValid(rd))
RelationIncrementReferenceCount(rd);
+
+ if (IsSystemRelationId(relationId))
+ UnSuspendDecodingSnapshots();
+
return rd;
}
@@ -1729,6 +1737,10 @@ RelationReloadIndexInfo(Relation relation)
return;
}
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/*
* Read the pg_class row
*
@@ -1796,6 +1808,9 @@ RelationReloadIndexInfo(Relation relation)
/* Okay, now it's valid again */
relation->rd_isvalid = true;
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
/*
@@ -1977,6 +1992,10 @@ RelationClearRelation(Relation relation, bool rebuild)
bool keep_tupdesc;
bool keep_rules;
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/* Build temporary entry, but don't link it into hashtable */
newrel = RelationBuildDesc(save_relid, false);
if (newrel == NULL)
@@ -2046,6 +2065,9 @@ RelationClearRelation(Relation relation, bool rebuild)
/* And now we can throw away the temporary entry */
RelationDestroyRelation(newrel);
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
}
@@ -3551,7 +3573,10 @@ RelationGetIndexList(Relation relation)
Form_pg_attribute attr;
/* internal column, like oid */
if (attno <= 0)
- continue;
+ {
+ found = false;
+ break;
+ }
attr = relation->rd_att->attrs[attno - 1];
if (!attr->attnotnull)
@@ -3839,17 +3864,26 @@ RelationGetIndexPredicate(Relation relation)
* be bms_free'd when not needed anymore.
*/
Bitmapset *
-RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
+RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
{
Bitmapset *indexattrs;
- Bitmapset *uindexattrs;
+ Bitmapset *uindexattrs; /* unique keys */
+ Bitmapset *cindexattrs; /* best candidate key */
List *indexoidlist;
ListCell *l;
MemoryContext oldcxt;
/* Quick exit if we already computed the result. */
if (relation->rd_indexattr != NULL)
- return bms_copy(keyAttrs ? relation->rd_keyattr : relation->rd_indexattr);
+ switch(attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return bms_copy(relation->rd_ckeyattr);
+ case INDEX_ATTR_BITMAP_KEY:
+ return bms_copy(relation->rd_keyattr);
+ case INDEX_ATTR_BITMAP_ALL:
+ return bms_copy(relation->rd_indexattr);
+ }
/* Fast path if definitely no indexes */
if (!RelationGetForm(relation)->relhasindex)
@@ -3876,13 +3910,16 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
*/
indexattrs = NULL;
uindexattrs = NULL;
+ cindexattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
Relation indexDesc;
IndexInfo *indexInfo;
int i;
- bool isKey;
+ bool isCKey; /* candidate or primary key */
+ bool isKey; /* key member */
+
indexDesc = index_open(indexOid, AccessShareLock);
@@ -3894,6 +3931,8 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
indexInfo->ii_Expressions == NIL &&
indexInfo->ii_Predicate == NIL;
+ isCKey = indexOid == relation->rd_primary;
+
/* Collect simple attribute references */
for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
{
@@ -3903,6 +3942,11 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
{
indexattrs = bms_add_member(indexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
+
+ if (isCKey)
+ cindexattrs = bms_add_member(cindexattrs,
+ attrnum - FirstLowInvalidHeapAttributeNumber);
+
if (isKey)
uindexattrs = bms_add_member(uindexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
@@ -3924,10 +3968,21 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
relation->rd_indexattr = bms_copy(indexattrs);
relation->rd_keyattr = bms_copy(uindexattrs);
+ relation->rd_ckeyattr = bms_copy(cindexattrs);
MemoryContextSwitchTo(oldcxt);
/* We return our original working copy for caller to play with */
- return keyAttrs ? uindexattrs : indexattrs;
+ switch(attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return cindexattrs;
+ case INDEX_ATTR_BITMAP_KEY:
+ return uindexattrs;
+ case INDEX_ATTR_BITMAP_ALL:
+ return indexattrs;
+ default:
+ elog(ERROR, "unknown attrKind %u", attrKind);
+ }
}
/*
@@ -4902,3 +4957,49 @@ unlink_initfile(const char *initfilename)
elog(LOG, "could not remove cache file \"%s\": %m", initfilename);
}
}
+
+bool
+RelationIsDoingTimetravelInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+
+ if (!RelationNeedsWAL(relation))
+ return false;
+
+ /*
+ * XXX: Doing this test instead of using IsSystemNamespace has the
+ * advantage of classifying a catalog relation's toast tables as a
+ * timetravel relation as well. This is safe since even an OID wraparound
+ * will preserve this property (c.f. GetNewObjectId()).
+ */
+ if (IsSystemRelation(relation))
+ return true;
+
+ /*
+ * Also log relevant data if we want the table to behave as a catalog
+ * table, although it's not a system-provided one.
+ * XXX: we need to make sure both the relation and its toast relation have
+ * the flag set!
+ */
+ if (RelationIsTreatedAsCatalogTable(relation))
+ return true;
+
+ return false;
+}
+
+bool
+RelationIsLogicallyLoggedInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+ if (!RelationNeedsWAL(relation))
+ return false;
+ /*
+ * XXX: In addition to the above comment, we could decide to always log
+ * data even for real system catalogs, although the benefits of that seem
+ * unclear.
+ */
+ if (IsSystemRelation(relation))
+ return false;
+
+ return true;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7d297bc..ced36f6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -57,6 +57,7 @@
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
#include "postmaster/walwriter.h"
+#include "replication/logical.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -2060,6 +2061,17 @@ static struct config_int ConfigureNamesInt[] =
},
{
+ /* see max_connections */
+ {"max_logical_slots", PGC_POSTMASTER, REPLICATION_SENDING,
+ gettext_noop("Sets the maximum number of simultaneously defined WAL decoding slots."),
+ NULL
+ },
+ &max_logical_slots,
+ 0, 0, MAX_BACKENDS /*?*/,
+ NULL, NULL, NULL
+ },
+
+ {
{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
gettext_noop("Sets the maximum time to wait for WAL replication."),
NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d69a02b..b04291c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -161,7 +161,7 @@
# - Settings -
-#wal_level = minimal # minimal, archive, or hot_standby
+#wal_level = minimal # minimal, archive, hot_standby, or logical
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level;
@@ -208,11 +208,18 @@
# Set these on the master and on any standby that will send replication data.
-#max_wal_senders = 0 # max number of walsender processes
+#max_wal_senders = 0 # max number of walsender processes, including
+ # both physical and logical replication senders.
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
+#max_logical_slots = 0 # max number of logical replication sender
+ # and receiver processes. Logical senders
+ # (but not receivers) also consume a
+ # max_wal_senders slot.
+ # (change requires restart)
+
# - Master Server -
# These settings are ignored on a standby server.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 584d70c..f63bafa 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -69,7 +69,7 @@
*/
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
-static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
+SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
@@ -86,13 +86,14 @@ static bool CatalogSnapshotStale = true;
* for the convenience of TransactionIdIsInProgress: even in bootstrap
* mode, we don't want it to say that BootstrapTransactionId is in progress.
*
- * RecentGlobalXmin is initialized to InvalidTransactionId, to ensure that no
+ * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
* one tries to use a stale value. Readers should ensure that it has been set
* to something else before using it.
*/
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
+TransactionId RecentGlobalDataXmin = InvalidTransactionId;
/*
* Elements of the active snapshot stack.
@@ -796,7 +797,7 @@ AtEOXact_Snapshot(bool isCommit)
* Returns the token (the file name) that can be used to import this
* snapshot.
*/
-static char *
+char *
ExportSnapshot(Snapshot snapshot)
{
TransactionId topXid;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ed66c49..28ce805 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -62,6 +62,8 @@
#include "access/xact.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/combocid.h"
#include "utils/tqual.h"
@@ -70,9 +72,17 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
+static Snapshot TimetravelSnapshot;
+/* (table, ctid) => (cmin, cmax) mapping during timetravel */
+static HTAB *tuplecid_data = NULL;
+static int timetravel_suspended = 0;
+
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-
+static bool FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer);
+static bool RedirectSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer);
/*
* SetHintBits()
@@ -1490,3 +1500,261 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple)
*/
return true;
}
+
+/*
+ * Check whether the transaction id 'xid' is in the pre-sorted array 'xip'.
+ */
+static bool
+TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
+{
+ return bsearch(&xid, xip, num,
+ sizeof(TransactionId), xidComparator) != NULL;
+}
+
+/*
+ * See the comments for HeapTupleSatisfiesMVCC for the semantics this function
+ * obeys.
+ *
+ * Only usable on tuples from catalog tables!
+ *
+ * We don't need to support HEAP_MOVED_(IN|OFF) for now because we only support
+ * reading catalog pages which couldn't have been created in an older version.
+ *
+ * We don't set any hint bits in here as it seems unlikely to be beneficial as
+ * those should already be set by normal access and it seems to be too
+ * dangerous to do so as the semantics of doing so during timetravel are more
+ * complicated than when dealing "only" with the present.
+ */
+bool
+HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer)
+{
+ HeapTupleHeader tuple = htup->t_data;
+ TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
+ TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
+
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
+ /* inserting transaction aborted */
+ if (tuple->t_infomask & HEAP_XMIN_INVALID)
+ {
+ Assert(!TransactionIdDidCommit(xmin));
+ return false;
+ }
+ /* check if it's one of our txids, toplevel is also in there */
+ else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple);
+ CommandId cmax = InvalidCommandId;
+
+ /*
+ * If another transaction deleted this tuple or if cmin/cmax is stored
+ * in a combocid we need to look up the actual values externally. We
+ * need to do so in the deleted case because the deletion will have
+ * overwritten the cmin value when setting cmax (c.f. combocid.c).
+ */
+ if ((!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
+ !TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt)) ||
+ tuple->t_infomask & HEAP_COMBOCID
+ )
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve cmin/cmax of catalog tuple");
+ }
+
+ Assert(cmin != InvalidCommandId);
+
+ if (cmin >= snapshot->curcid)
+ return false; /* inserted after scan started */
+ }
+ /* committed before our xmin horizon. Do a normal visibility check. */
+ else if (TransactionIdPrecedes(xmin, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMIN_COMMITTED &&
+ !TransactionIdDidCommit(xmin)));
+
+ /* check for hint bit first, consult clog afterwards */
+ if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
+ !TransactionIdDidCommit(xmin))
+ return false;
+ }
+ /* beyond our xmax horizon, i.e. invisible */
+ else if (TransactionIdFollowsOrEquals(xmin, snapshot->xmax))
+ {
+ return false;
+ }
+ /* check if it's a committed transaction in [xmin, xmax) */
+ else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+ {
+ }
+ /*
+ * none of the above, i.e. between [xmin, xmax) but hasn't
+ * committed. I.e. invisible.
+ */
+ else
+ {
+ return false;
+ }
+
+ /* at this point we know xmin is visible, go on to check xmax */
+
+ /* why should those be in catalog tables? */
+ Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
+
+ /* xid invalid or aborted */
+ if (tuple->t_infomask & HEAP_XMAX_INVALID)
+ return true;
+ /* locked tuples are always visible */
+ else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+ return true;
+ /* check if it's one of our txids, toplevel is also in there */
+ else if (TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin;
+ CommandId cmax = HeapTupleHeaderGetRawCommandId(tuple);
+
+ /* Lookup actual cmin/cmax values */
+ if (tuple->t_infomask & HEAP_COMBOCID)
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve combocid to cmax");
+ }
+
+ Assert(cmax != InvalidCommandId);
+
+ if (cmax >= snapshot->curcid)
+ return true; /* deleted after scan started */
+ else
+ return false; /* deleted before scan started */
+ }
+ /* below xmin horizon, normal transaction state is valid */
+ else if (TransactionIdPrecedes(xmax, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED &&
+ !TransactionIdDidCommit(xmax)));
+
+ /* check hint bit first */
+ if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
+ return false;
+
+ /* check clog */
+ return !TransactionIdDidCommit(xmax);
+ }
+ /* above xmax horizon, we cannot possibly see the deleting transaction */
+ else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
+ return true;
+ /* xmax is between [xmin, xmax), check known committed array */
+ else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+ return false;
+ /* xmax is between [xmin, xmax), but known not to have committed yet */
+ else
+ return true;
+}
+
+/*
+ * Set up a snapshot that replaces normal catalog snapshots and allows catalog
+ * access to behave just like it did at a certain point in the past.
+ *
+ * Needed for after-the-fact WAL decoding.
+ */
+void
+SetupDecodingSnapshots(Snapshot timetravel_snapshot, HTAB *tuplecids)
+{
+ /* prevent recursively setting up decoding snapshots */
+ Assert(CatalogSnapshotData.satisfies != RedirectSatisfiesMVCC);
+
+ CatalogSnapshotData.satisfies = RedirectSatisfiesMVCC;
+ /* make sure normal snapshots aren't used */
+ SnapshotSelfData.satisfies = FailsSatisfies;
+ SnapshotAnyData.satisfies = FailsSatisfies;
+ SnapshotToastData.satisfies = FailsSatisfies;
+ /* don't overwrite SnapshotToastData, we want that to behave normally */
+
+ /* setup the timetravel snapshot */
+ TimetravelSnapshot = timetravel_snapshot;
+
+ /* setup (cmin, cmax) lookup hash */
+ tuplecid_data = tuplecids;
+
+ timetravel_suspended = 0;
+}
+
+
+/*
+ * Make catalog snapshots behave normally again.
+ */
+void
+RevertFromDecodingSnapshots(void)
+{
+ Assert(timetravel_suspended == 0);
+
+ TimetravelSnapshot = NULL;
+ tuplecid_data = NULL;
+
+ /* rally to restore sanity and/or boredom */
+ CatalogSnapshotData.satisfies = HeapTupleSatisfiesMVCC;
+ SnapshotSelfData.satisfies = HeapTupleSatisfiesSelf;
+ SnapshotAnyData.satisfies = HeapTupleSatisfiesAny;
+ SnapshotToastData.satisfies = HeapTupleSatisfiesToast;
+ timetravel_suspended = 0;
+}
+
+/*
+ * Disable catalog snapshot timetravel and perform old-fashioned access but
+ * make re-enabling cheap. This is useful for accessing catalog entries which
+ * must stay up to date, like the pg_class entries of system relations.
+ *
+ * Can be called several times in a nested fashion since several of its
+ * callers suspend timetravel access on several code levels.
+ */
+void
+SuspendDecodingSnapshots(void)
+{
+ timetravel_suspended++;
+}
+
+/*
+ * Enable timetravel again after SuspendDecodingSnapshots() disabled it.
+ */
+void
+UnSuspendDecodingSnapshots(void)
+{
+ Assert(timetravel_suspended > 0);
+ timetravel_suspended--;
+}
+
+/*
+ * Error out if a normal snapshot is used. That is neither legal nor expected
+ * during timetravel, so this is just extra assurance.
+ */
+static bool
+FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ elog(ERROR, "normal snapshots cannot be used during timetravel access");
+ return false;
+}
+
+
+/*
+ * Call the replacement SatisfiesMVCC with the required Snapshot data.
+ */
+static bool
+RedirectSatisfiesMVCC(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ Assert(TimetravelSnapshot != NULL);
+ if (timetravel_suspended > 0)
+ return HeapTupleSatisfiesMVCC(htup, snapshot, buffer);
+ return HeapTupleSatisfiesMVCCDuringDecoding(htup, TimetravelSnapshot,
+ buffer);
+}
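The setup/suspend/revert machinery above boils down to a swapped function
pointer plus a nesting counter. Below is a rough standalone model of that
control flow, not the patch's code: tuples are plain ints, the two
visibility routines are toy stand-ins, and every name is made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*satisfies_fn) (int tuple);

/* Toy stand-in for HeapTupleSatisfiesMVCC: non-negative tuples are visible. */
static bool
normal_visible(int tuple)
{
	return tuple >= 0;
}

/* Toy stand-in for the decoding variant: only even tuples are visible. */
static bool
decoding_visible(int tuple)
{
	return tuple % 2 == 0;
}

static int	suspended = 0;		/* nesting depth of suspend calls */

/* Mirrors the redirect: fall back to normal rules while suspended. */
static bool
redirect_visible(int tuple)
{
	if (suspended > 0)
		return normal_visible(tuple);
	return decoding_visible(tuple);
}

/* Plays the role of CatalogSnapshotData.satisfies. */
static satisfies_fn catalog_satisfies = normal_visible;

static void
setup_decoding(void)
{
	/* prevent recursively setting up decoding snapshots */
	assert(catalog_satisfies != redirect_visible);
	catalog_satisfies = redirect_visible;
	suspended = 0;
}

static void
revert_decoding(void)
{
	assert(suspended == 0);
	catalog_satisfies = normal_visible;
}

static void
suspend_decoding(void)
{
	suspended++;
}

static void
unsuspend_decoding(void)
{
	assert(suspended > 0);
	suspended--;
}
```

The counter rather than a boolean is what lets the relcache callers nest:
timetravel only resumes once the outermost caller has unsuspended.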
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f66f530..a887035 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -193,7 +193,9 @@ const char *subdirs[] = {
"base/1",
"pg_tblspc",
"pg_stat",
- "pg_stat_tmp"
+ "pg_stat_tmp",
+ "pg_llog",
+ "pg_llog/snapshots"
};
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index fde483a..8c6cf24 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -77,6 +77,8 @@ wal_level_str(WalLevel wal_level)
return "archive";
case WAL_LEVEL_HOT_STANDBY:
return "hot_standby";
+ case WAL_LEVEL_LOGICAL:
+ return "logical";
}
return _("unrecognized wal_level");
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 4381778..42f3e6b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -55,6 +55,18 @@
#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
+#define XLOG_HEAP2_NEW_CID 0x70
+
+/*
+ * xl_heap_* ->flag values
+ */
+/* PD_ALL_VISIBLE was cleared */
+#define XLOG_HEAP_ALL_VISIBLE_CLEARED (1<<0)
+/* PD_ALL_VISIBLE was cleared in the 2nd page */
+#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED (1<<1)
+#define XLOG_HEAP_CONTAINS_OLD_TUPLE (1<<2)
+#define XLOG_HEAP_CONTAINS_OLD_KEY (1<<3)
+#define XLOG_HEAP_CONTAINS_NEW_TUPLE (1<<4)
/*
* All what we need to find changed tuple
@@ -78,10 +90,10 @@ typedef struct xl_heap_delete
xl_heaptid target; /* deleted tuple id */
TransactionId xmax; /* xmax of the deleted tuple */
uint8 infobits_set; /* infomask bits */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
} xl_heap_delete;
-#define SizeOfHeapDelete (offsetof(xl_heap_delete, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapDelete (offsetof(xl_heap_delete, flags) + sizeof(uint8))
/*
* We don't store the whole fixed part (HeapTupleHeaderData) of an inserted
@@ -100,15 +112,23 @@ typedef struct xl_heap_header
#define SizeOfHeapHeader (offsetof(xl_heap_header, t_hoff) + sizeof(uint8))
+typedef struct xl_heap_header_len
+{
+ uint16 t_len;
+ xl_heap_header header;
+} xl_heap_header_len;
+
+#define SizeOfHeapHeaderLen (offsetof(xl_heap_header_len, header) + SizeOfHeapHeader)
+
/* This is what we need to know about insert */
typedef struct xl_heap_insert
{
xl_heaptid target; /* inserted tuple id */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
/* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_insert;
-#define SizeOfHeapInsert (offsetof(xl_heap_insert, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8))
/*
* This is what we need to know about a multi-insert. The record consists of
@@ -120,7 +140,7 @@ typedef struct xl_heap_multi_insert
{
RelFileNode node;
BlockNumber blkno;
- bool all_visible_cleared;
+ uint8 flags;
uint16 ntuples;
OffsetNumber offsets[1];
@@ -147,13 +167,12 @@ typedef struct xl_heap_update
TransactionId old_xmax; /* xmax of the old tuple */
TransactionId new_xmax; /* xmax of the new tuple */
ItemPointerData newtid; /* new inserted tuple id */
- uint8 old_infobits_set; /* infomask bits to set on old tuple */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
- bool new_all_visible_cleared; /* same for the page of newtid */
+ uint8 old_infobits_set; /* infomask bits to set on old tuple */
+ uint8 flags;
/* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_update;
-#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapUpdate (offsetof(xl_heap_update, flags) + sizeof(uint8))
/*
* This is what we need to know about vacuum page cleanup/redirect
@@ -261,6 +280,28 @@ typedef struct xl_heap_visible
#define SizeOfHeapVisible (offsetof(xl_heap_visible, cutoff_xid) + sizeof(TransactionId))
+typedef struct xl_heap_new_cid
+{
+ /*
+ * store toplevel xid so we don't have to merge cids from different
+ * transactions
+ */
+ TransactionId top_xid;
+ CommandId cmin;
+ CommandId cmax;
+ /*
+ * don't really need the combocid but the padding makes it free and it's
+ * useful for debugging.
+ */
+ CommandId combocid;
+ /*
+ * Store the relfilenode/ctid pair to facilitate lookups.
+ */
+ xl_heaptid target;
+} xl_heap_new_cid;
+
+#define SizeOfHeapNewCid (offsetof(xl_heap_new_cid, target) + SizeOfHeapTid)
+
extern void HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
TransactionId *latestRemovedXid);
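For readers skimming the header change: the separate bool fields are folded
into one uint8 of bit flags. A small sketch of how such a field could be
built and tested follows. The flag values are copied from the hunk above,
but make_update_flags() and flag_is_set() are hypothetical helpers, not
functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the heapam_xlog.h hunk. */
#define XLOG_HEAP_ALL_VISIBLE_CLEARED		(1<<0)
#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED	(1<<1)
#define XLOG_HEAP_CONTAINS_OLD_TUPLE		(1<<2)
#define XLOG_HEAP_CONTAINS_OLD_KEY			(1<<3)
#define XLOG_HEAP_CONTAINS_NEW_TUPLE		(1<<4)

/* Hypothetical helper: pack per-record facts into the flags byte. */
static uint8_t
make_update_flags(bool old_page_cleared, bool new_page_cleared,
				  bool has_old_key)
{
	uint8_t		flags = 0;

	if (old_page_cleared)
		flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
	if (new_page_cleared)
		flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
	if (has_old_key)
		flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
	return flags;
}

/* Hypothetical helper: test a single bit during redo or decoding. */
static bool
flag_is_set(uint8_t flags, uint8_t bit)
{
	return (flags & bit) != 0;
}
```

The record stays the same size or shrinks (xl_heap_update loses one byte),
and further flags can be added without touching the struct layout again.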
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 23a41fd..8452ec5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -63,6 +63,11 @@
(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
(int32) ((id1) - (id2)) < 0)
+/* compare two XIDs already known to be normal; this is a macro for speed */
+#define NormalTransactionIdFollows(id1, id2) \
+ (AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
+ (int32) ((id1) - (id2)) > 0)
+
/* ----------
* Object ID (OID) zero is InvalidOid.
*
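The new NormalTransactionIdFollows() macro relies on the same modulo-2^32
trick as NormalTransactionIdPrecedes(). As a quick illustration of why the
signed cast matters, here is a standalone sketch. The function names are
mine and the typedef is only a stand-in for the real one; the cast assumes
two's complement conversion, as the backend does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumed, mimicking transam.h). */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/* True if id1 is logically newer than id2; both must be normal XIDs. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	assert(id1 >= FirstNormalTransactionId && id2 >= FirstNormalTransactionId);
	/* The unsigned difference, reinterpreted as signed, gives the direction. */
	return (int32_t) (id1 - id2) > 0;
}

/* True if id1 is logically older than id2; both must be normal XIDs. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	assert(id1 >= FirstNormalTransactionId && id2 >= FirstNormalTransactionId);
	return (int32_t) (id1 - id2) < 0;
}
```

A plain "id1 > id2" would claim that XID 5 is older than XID 4294967290,
although 5 was assigned after the counter wrapped around; the signed
difference gets that case right.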
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 835f6ac..96502ce 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -215,6 +215,7 @@ extern TransactionId GetCurrentTransactionId(void);
extern TransactionId GetCurrentTransactionIdIfAny(void);
extern TransactionId GetStableLatestTransactionId(void);
extern SubTransactionId GetCurrentSubTransactionId(void);
+extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 002862c..7415a26 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -197,7 +197,8 @@ typedef enum WalLevel
{
WAL_LEVEL_MINIMAL = 0,
WAL_LEVEL_ARCHIVE,
- WAL_LEVEL_HOT_STANDBY
+ WAL_LEVEL_HOT_STANDBY,
+ WAL_LEVEL_LOGICAL
} WalLevel;
extern int wal_level;
@@ -210,9 +211,12 @@ extern int wal_level;
*/
#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
-/* Do we need to WAL-log information required only for Hot Standby? */
+/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
+/* Do we need to WAL-log information required only for logical replication? */
+#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
+
#ifdef WAL_DEBUG
extern bool XLOG_DEBUG;
#endif
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3829ce2..fdc8cc2 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -19,6 +19,7 @@
#ifndef XLOGREADER_H
#define XLOGREADER_H
+#include "access/xlog.h"
#include "access/xlog_internal.h"
typedef struct XLogReaderState XLogReaderState;
@@ -108,10 +109,20 @@ struct XLogReaderState
char *errormsg_buf;
};
-/* Get a new XLogReader */
+
extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
void *private_data);
+
+typedef struct XLogRecordBuffer
+{
+ XLogRecPtr origptr;
+ XLogRecPtr endptr;
+ XLogRecord record;
+ char *record_data;
+} XLogRecordBuffer;
+
+
/* Free an XLogReader */
extern void XLogReaderFree(XLogReaderState *state);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 44b6f38..a96ed69 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -23,6 +23,7 @@ extern ForkNumber forkname_to_number(char *forkName);
extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
+extern bool IsSystemRelationId(Oid relid);
extern bool IsSystemRelation(Relation relation);
extern bool IsToastRelation(Relation relation);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f03dd0b..cf9c143 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2621,6 +2621,8 @@ DATA(insert OID = 2022 ( pg_stat_get_activity PGNSP PGUID 12 1 100 0 0 f f f
DESCR("statistics: information about currently active backends");
DATA(insert OID = 3099 ( pg_stat_get_wal_senders PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{23,25,25,25,25,25,23,25}" "{o,o,o,o,o,o,o,o}" "{pid,state,sent_location,write_location,flush_location,replay_location,sync_priority,sync_state}" _null_ pg_stat_get_wal_senders _null_ _null_ _null_ ));
DESCR("statistics: information about currently active replication");
+DATA(insert OID = 3457 ( pg_stat_get_logical_decoding_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{25,25,26,16,28,25}" "{o,o,o,o,o,o}" "{slot_name,plugin,database,active,xmin,restart_decoding_lsn}" _null_ pg_stat_get_logical_decoding_slots _null_ _null_ _null_ ));
+DESCR("statistics: information about logical replication slots currently in use");
DATA(insert OID = 2026 ( pg_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 23 "" _null_ _null_ _null_ _null_ pg_backend_pid _null_ _null_ _null_ ));
DESCR("statistics: current backend PID");
DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 23 "23" _null_ _null_ _null_ _null_ pg_stat_get_backend_pid _null_ _null_ _null_ ));
@@ -4725,6 +4727,10 @@ DESCR("SP-GiST support for quad tree over range");
DATA(insert OID = 3473 ( spg_range_quad_leaf_consistent PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "2281 2281" _null_ _null_ _null_ _null_ spg_range_quad_leaf_consistent _null_ _null_ _null_ ));
DESCR("SP-GiST support for quad tree over range");
+DATA(insert OID = 3779 ( init_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,25}" "{i,i,o,o}" "{slotname,plugin,slotname,xlog_position}" _null_ init_logical_replication _null_ _null_ _null_ ));
+DESCR("set up a logical replication slot");
+DATA(insert OID = 3780 ( stop_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 23 "19" _null_ _null_ _null_ _null_ stop_logical_replication _null_ _null_ _null_ ));
+DESCR("stop logical replication");
/* event triggers */
DATA(insert OID = 3566 ( pg_event_trigger_dropped_objects PGNSP PGUID 12 10 100 0 0 f f f f t t s 0 0 2249 "" "{26,26,23,25,25,25,25}" "{o,o,o,o,o,o,o}" "{classid, objid, objsubid, object_type, schema_name, object_name, object_identity}" _null_ pg_event_trigger_dropped_objects _null_ _null_ _null_ ));
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8dd8b0..2616ac1 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -156,7 +156,7 @@ extern void vac_update_relstats(Relation relation,
TransactionId frozenxid,
MultiXactId minmulti);
extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age,
- bool sharedRel,
+ bool sharedRel, bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 78368c6..360f98c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -409,6 +409,9 @@ typedef enum NodeTag
T_IdentifySystemCmd,
T_BaseBackupCmd,
T_StartReplicationCmd,
+ T_InitLogicalReplicationCmd,
+ T_StartLogicalReplicationCmd,
+ T_FreeLogicalReplicationCmd,
T_TimeLineHistoryCmd,
/*
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 85b4544..3da8d40 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -52,6 +52,41 @@ typedef struct StartReplicationCmd
/* ----------------------
+ * INIT_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct InitLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ char *plugin;
+} InitLogicalReplicationCmd;
+
+
+/* ----------------------
+ * START_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct StartLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ XLogRecPtr startpoint;
+ List *options;
+} StartLogicalReplicationCmd;
+
+/* ----------------------
+ * FREE_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct FreeLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+} FreeLogicalReplicationCmd;
+
+
+/* ----------------------
* TIMELINE_HISTORY command
* ----------------------
*/
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
new file mode 100644
index 0000000..dd3f2ca
--- /dev/null
+++ b/src/include/replication/decode.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ * decode.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DECODE_H
+#define DECODE_H
+
+#include "access/xlogreader.h"
+#include "replication/reorderbuffer.h"
+#include "replication/logical.h"
+
+void DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf);
+
+#endif
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
new file mode 100644
index 0000000..971180b
--- /dev/null
+++ b/src/include/replication/logical.h
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ * logical.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICAL_H
+#define LOGICAL_H
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "replication/output_plugin.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+
+/*
+ * Shared memory state of a single logical decoding slot
+ */
+typedef struct LogicalDecodingSlot
+{
+ /* lock, on same cacheline as effective_xmin */
+ slock_t mutex;
+
+ /* on-disk xmin, updated first */
+ TransactionId xmin;
+
+ /* in-memory xmin, updated after syncing to disk */
+ TransactionId effective_xmin;
+
+ /* is this slot defined */
+ bool in_use;
+
+ /* is somebody streaming out changes for this slot */
+ bool active;
+
+ /* have we been aborted while ->active */
+ bool aborted;
+
+ /* ----
+ * If we shut down, crash, or otherwise stop, where do we have to restart
+ * decoding from in order to
+ * a) find a valid & ready snapshot
+ * b) read the complete content of all in-progress xacts
+ * ----
+ */
+ XLogRecPtr restart_decoding;
+
+ /*
+ * Last location up to which the client has confirmed safely receiving
+ * data. No earlier data can be decoded after a restart/crash.
+ */
+ XLogRecPtr confirmed_flush;
+
+ /* ----
+ * When the client has confirmed flushes >= candidate_lsn we can
+ * a) advance the pegged xmin
+ * b) advance restart_decoding so we have to read/keep less WAL
+ * ----
+ */
+ XLogRecPtr candidate_lsn;
+ TransactionId candidate_xmin;
+ XLogRecPtr candidate_restart_decoding;
+
+ /* database the slot is active on */
+ Oid database;
+
+ /* slot identifier */
+ NameData name;
+
+ /* plugin name */
+ NameData plugin;
+} LogicalDecodingSlot;
+
+/*
+ * Shared memory control area for all of logical decoding
+ */
+typedef struct LogicalDecodingCtlData
+{
+ /*
+ * Xmin across all logical slots.
+ *
+ * Protected by ProcArrayLock.
+ */
+ TransactionId xmin;
+
+ LogicalDecodingSlot logical_slots[FLEXIBLE_ARRAY_MEMBER];
+} LogicalDecodingCtlData;
+
+/*
+ * Pointers to shared memory
+ */
+extern LogicalDecodingCtlData *LogicalDecodingCtl;
+extern LogicalDecodingSlot *MyLogicalDecodingSlot;
+
+struct LogicalDecodingContext;
+
+typedef void (*LogicalOutputPluginWriterWrite) (
+ struct LogicalDecodingContext *lr,
+ XLogRecPtr Ptr,
+ TransactionId xid
+);
+
+typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
+
+/*
+ * Output plugin callbacks
+ */
+typedef struct OutputPluginCallbacks
+{
+ LogicalDecodeInitCB init_cb;
+ LogicalDecodeBeginCB begin_cb;
+ LogicalDecodeChangeCB change_cb;
+ LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeCleanupCB cleanup_cb;
+} OutputPluginCallbacks;
+
+typedef struct LogicalDecodingContext
+{
+ struct XLogReaderState *reader;
+ struct LogicalDecodingSlot *slot;
+ struct ReorderBuffer *reorder;
+ struct SnapBuild *snapshot_builder;
+
+ struct OutputPluginCallbacks callbacks;
+
+ bool stop_after_consistent;
+
+ /*
+ * User specified options
+ */
+ List *output_plugin_options;
+
+ /*
+ * User-Provided callback for writing/streaming out data.
+ */
+ LogicalOutputPluginWriterPrepareWrite prepare_write;
+ LogicalOutputPluginWriterWrite write;
+
+ /*
+ * Output buffer.
+ */
+ StringInfo out;
+
+ /*
+ * Private data pointer for the creator of the logical decoding context.
+ */
+ void *owner_private;
+
+ /*
+ * Private data pointer of the output plugin.
+ */
+ void *output_plugin_private;
+
+ /*
+ * Private data pointer for the data writer.
+ */
+ void *output_writer_private;
+} LogicalDecodingContext;
+
+/* GUCs */
+extern PGDLLIMPORT int max_logical_slots;
+
+extern Size LogicalDecodingShmemSize(void);
+extern void LogicalDecodingShmemInit(void);
+
+extern void LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin);
+extern void LogicalDecodingReleaseSlot(void);
+extern void LogicalDecodingReAcquireSlot(const char *name);
+extern void LogicalDecodingFreeSlot(const char *name);
+
+extern void ComputeLogicalXmin(void);
+
+/* change logical xmin */
+extern void IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin);
+
+/* change recovery restart location */
+extern void IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn);
+
+extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
+
+extern void CheckLogicalReplicationRequirements(void);
+
+extern void StartupLogicalReplication(XLogRecPtr checkPointRedo);
+
+extern LogicalDecodingContext *CreateLogicalDecodingContext(
+ LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write);
+extern bool LogicalDecodingContextReady(LogicalDecodingContext *ctx);
+extern void FreeLogicalDecodingContext(LogicalDecodingContext *ctx);
+
+#endif
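A rough illustration of how the candidate_* fields of LogicalDecodingSlot are meant to interact with confirmed_flush, written as a standalone sketch and not taken from the patch. SlotSketch, SketchLsn, SketchXid and confirm_flush_sketch are hypothetical names; plain integers stand in for XLogRecPtr and TransactionId, xid wraparound is ignored, and locking and syncing to disk are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TransactionId. */
typedef uint64_t SketchLsn;
typedef uint32_t SketchXid;

/* Only the fields needed for the illustration; not the real slot layout. */
typedef struct SlotSketch
{
	SketchXid	xmin;
	SketchLsn	restart_decoding;
	SketchLsn	confirmed_flush;
	SketchLsn	candidate_lsn;	/* 0 means "no candidate pending" here */
	SketchXid	candidate_xmin;
	SketchLsn	candidate_restart_decoding;
} SlotSketch;

/*
 * Record that the client confirmed everything up to 'lsn'. Once the
 * confirmation reaches candidate_lsn, promote the candidate values.
 * Returns true if a promotion happened.
 */
bool
confirm_flush_sketch(SlotSketch *slot, SketchLsn lsn)
{
	assert(slot != NULL);

	/* confirmed_flush only ever moves forward */
	if (lsn > slot->confirmed_flush)
		slot->confirmed_flush = lsn;

	if (slot->candidate_lsn == 0 ||
		slot->confirmed_flush < slot->candidate_lsn)
		return false;

	/* a) advance the pegged xmin, b) advance the restart location */
	slot->xmin = slot->candidate_xmin;
	slot->restart_decoding = slot->candidate_restart_decoding;
	slot->candidate_lsn = 0;
	return true;
}
```

The point of the two-step scheme is that neither xmin nor the restart location moves until the client has acknowledged data past the point where the new values became safe.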
diff --git a/src/include/replication/logicalfuncs.h b/src/include/replication/logicalfuncs.h
new file mode 100644
index 0000000..37f36a5
--- /dev/null
+++ b/src/include/replication/logicalfuncs.h
@@ -0,0 +1,19 @@
+/*-------------------------------------------------------------------------
+ * logicalfuncs.h
+ * PostgreSQL WAL to logical transformation support functions
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICALFUNCS_H
+#define LOGICALFUNCS_H
+
+extern int logical_read_local_xlog_page(XLogReaderState *state,
+ XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr,
+ char *cur_page, TimeLineID *pageTLI);
+
+extern Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
new file mode 100644
index 0000000..a9fcc2d
--- /dev/null
+++ b/src/include/replication/output_plugin.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ * output_plugin.h
+ * PostgreSQL Logical Decode Plugin Interface
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OUTPUT_PLUGIN_H
+#define OUTPUT_PLUGIN_H
+
+#include "replication/reorderbuffer.h"
+
+struct LogicalDecodingContext;
+
+/*
+ * Callback that gets called in a user-defined plugin.
+ * ctx->output_plugin_private can be set to some private data.
+ *
+ * "is_init" will be set to "true" if the decoding slot just got defined. When
+ * the same slot is used from there on, it will be "false".
+ *
+ * Gets looked up via the library symbol pg_decode_init.
+ */
+typedef void (*LogicalDecodeInitCB) (
+ struct LogicalDecodingContext *ctx,
+ bool is_init
+);
+
+/*
+ * Callback called for every BEGIN of a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_begin_txn.
+ */
+typedef void (*LogicalDecodeBeginCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn);
+
+/*
+ * Callback for every individual change in a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_change.
+ */
+typedef void (*LogicalDecodeChangeCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change
+);
+
+/*
+ * Called for every COMMIT of a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_commit_txn.
+ */
+typedef void (*LogicalDecodeCommitCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to clean up the state of an output plugin.
+ *
+ * Gets looked up via the library symbol pg_decode_cleanup.
+ */
+typedef void (*LogicalDecodeCleanupCB) (
+ struct LogicalDecodingContext *
+);
+
+#endif /* OUTPUT_PLUGIN_H */
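To make the callback contract above more concrete, here is a minimal standalone sketch of the order in which begin, change and commit callbacks would fire for one committed transaction. It does not use the real headers: SketchCtx, SketchCounts, the sketch_* functions and the string-typed "change" are all hypothetical stand-ins, and only the private-pointer idea mirrors output_plugin_private.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchCtx SketchCtx;

/* Hypothetical, simplified callback signatures. */
typedef void (*SketchBeginCB) (SketchCtx *ctx, unsigned xid);
typedef void (*SketchChangeCB) (SketchCtx *ctx, unsigned xid, const char *change);
typedef void (*SketchCommitCB) (SketchCtx *ctx, unsigned xid);

struct SketchCtx
{
	SketchBeginCB begin_cb;
	SketchChangeCB change_cb;
	SketchCommitCB commit_cb;
	void	   *output_plugin_private;	/* owned by the plugin */
};

/* Private plugin state: just counts how often each callback ran. */
typedef struct SketchCounts
{
	int			begins;
	int			changes;
	int			commits;
} SketchCounts;

void
sketch_begin(SketchCtx *ctx, unsigned xid)
{
	(void) xid;
	((SketchCounts *) ctx->output_plugin_private)->begins++;
}

void
sketch_change(SketchCtx *ctx, unsigned xid, const char *change)
{
	(void) xid;
	(void) change;
	((SketchCounts *) ctx->output_plugin_private)->changes++;
}

void
sketch_commit(SketchCtx *ctx, unsigned xid)
{
	(void) xid;
	((SketchCounts *) ctx->output_plugin_private)->commits++;
}

/* Replay one committed transaction: begin, every change in order, commit. */
void
sketch_replay_txn(SketchCtx *ctx, unsigned xid,
				  const char *const *changes, size_t nchanges)
{
	size_t		i;

	assert(ctx != NULL);
	ctx->begin_cb(ctx, xid);
	for (i = 0; i < nchanges; i++)
		ctx->change_cb(ctx, xid, changes[i]);
	ctx->commit_cb(ctx, xid);
}
```

Since callbacks are only invoked for successful transactions, a plugin written against this contract never has to undo output for an aborted one.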
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
new file mode 100644
index 0000000..7a4e046
--- /dev/null
+++ b/src/include/replication/reorderbuffer.h
@@ -0,0 +1,342 @@
+/*
+ * reorderbuffer.h
+ *
+ * PostgreSQL logical replay buffer management
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/reorderbuffer.h
+ */
+#ifndef REORDERBUFFER_H
+#define REORDERBUFFER_H
+
+#include "access/htup_details.h"
+#include "utils/hsearch.h"
+#include "utils/rel.h"
+
+#include "lib/ilist.h"
+
+#include "storage/sinval.h"
+
+#include "utils/snapshot.h"
+
+/* an individual tuple, stored in one chunk of memory */
+typedef struct ReorderBufferTupleBuf
+{
+ /* position in preallocated list */
+ slist_node node;
+
+ /* tuple, stored sequentially */
+ HeapTupleData tuple;
+ HeapTupleHeaderData header;
+ char data[MaxHeapTupleSize];
+} ReorderBufferTupleBuf;
+
+/* types of the change passed to a 'change' callback */
+enum ReorderBufferChangeType
+{
+ REORDER_BUFFER_CHANGE_INSERT,
+ REORDER_BUFFER_CHANGE_UPDATE,
+ REORDER_BUFFER_CHANGE_DELETE
+};
+
+/*
+ * a single 'change', can be an insert (with one tuple), an update (old, new),
+ * or a delete (old).
+ *
+ * The same struct is also used internally for other purposes but that should
+ * never be visible outside reorderbuffer.c.
+ */
+typedef struct ReorderBufferChange
+{
+ XLogRecPtr lsn;
+
+ /* type of change */
+ union
+ {
+ enum ReorderBufferChangeType action;
+ /* do not leak internal enum values to the outside */
+ int action_internal;
+ };
+
+ /*
+ * Context data for the change, which part of the union is valid depends
+ * on action/action_internal.
+ */
+ union
+ {
+ /* old, new tuples when action == *_INSERT|UPDATE|DELETE */
+ struct
+ {
+ /* relation that has been changed */
+ RelFileNode relnode;
+ /* valid for DELETE || UPDATE */
+ ReorderBufferTupleBuf *oldtuple;
+ /* valid for INSERT || UPDATE */
+ ReorderBufferTupleBuf *newtuple;
+ };
+
+ /* new snapshot */
+ Snapshot snapshot;
+
+ /* new command id for existing snapshot in a catalog changing tx */
+ CommandId command_id;
+
+ /* new cid mapping for catalog changing transaction */
+ struct
+ {
+ RelFileNode node;
+ ItemPointerData tid;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid;
+ } tuplecid;
+ };
+
+ /*
+ * While in use this is how a change is linked into a transaction,
+ * otherwise it's the preallocated list.
+ */
+ dlist_node node;
+} ReorderBufferChange;
+
+typedef struct ReorderBufferTXN
+{
+ /*
+ * The transaction's transaction id, can be a toplevel or sub xid.
+ */
+ TransactionId xid;
+
+ /*
+ * LSN of the first data-carrying WAL record with knowledge about this
+ * xid. This is allowed to *not* be the first record adorned with this xid,
+ * if the previous records aren't relevant for logical decoding.
+ */
+ XLogRecPtr first_lsn;
+
+ /* ----
+ * LSN of the record that led to this xact being committed or
+ * aborted. This can be a
+ * * plain commit record
+ * * plain commit record, of a parent transaction
+ * * prepared transaction commit
+ * * plain abort record
+ * * prepared transaction abort
+ * * error during decoding
+ * ----
+ */
+ XLogRecPtr final_lsn;
+
+ /*
+ * LSN pointing to the end of the commit record + 1.
+ */
+ XLogRecPtr end_lsn;
+
+ /*
+ * Last LSN at which snapshot information resides, so we can
+ * restart decoding from there and fully recover this transaction from
+ * WAL.
+ */
+ XLogRecPtr restart_decoding_lsn;
+
+ /*
+ * Base snapshot or NULL.
+ */
+ Snapshot base_snapshot;
+
+ /* did the TX have catalog changes */
+ bool does_timetravel;
+
+ /*
+ * Do we know this is a subxact?
+ */
+ bool is_known_as_subxact;
+
+ /*
+ * How many ReorderBufferChange's do we have in this txn.
+ *
+ * Changes in subtransactions are *not* included but tracked separately.
+ */
+ Size nentries;
+
+ /*
+ * How many of the above entries are stored in memory in contrast to being
+ * spilled to disk.
+ */
+ Size nentries_mem;
+
+ /*
+ * List of ReorderBufferChange structs, including new Snapshots and new
+ * CommandIds
+ */
+ dlist_head changes;
+
+ /*
+ * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
+ * Those are always assigned to the toplevel transaction. (Keep track of
+ * #entries to create a hash of the right size)
+ */
+ dlist_head tuplecids;
+ size_t ntuplecids;
+
+ /*
+ * On-demand built hash for looking up the above values.
+ */
+ HTAB *tuplecid_hash;
+
+ /*
+ * Hash containing (potentially partial) toast entries. NULL if no toast
+ * tuples have been found for the current change.
+ */
+ HTAB *toast_hash;
+
+ /*
+ * non-hierarchical list of subtransactions that are *not* aborted. Only
+ * used in toplevel transactions.
+ */
+ dlist_head subtxns;
+ size_t nsubtxns;
+
+ /* ---
+ * Position in one of three lists:
+ * * list of subtransactions if we are *known* to be subxact
+ * * list of toplevel xacts (can be an as-yet unknown subxact)
+ * * list of preallocated ReorderBufferTXNs
+ * ---
+ */
+ dlist_node node;
+
+ /*
+ * Stored cache invalidations. This is not a linked list because we get
+ * all the invalidations at once.
+ */
+ SharedInvalidationMessage *invalidations;
+ size_t ninvalidations;
+
+} ReorderBufferTXN;
+
+/* so we can define the callbacks used inside struct ReorderBuffer itself */
+typedef struct ReorderBuffer ReorderBuffer;
+
+/* change callback signature */
+typedef void (*ReorderBufferApplyChangeCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change);
+
+/* begin callback signature */
+typedef void (*ReorderBufferBeginCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn);
+
+/* commit callback signature */
+typedef void (*ReorderBufferCommitCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+struct ReorderBuffer
+{
+ /*
+ * xid => ReorderBufferTXN lookup table
+ */
+ HTAB *by_txn;
+
+ /*
+ * Transactions that could be a toplevel xact, ordered by LSN of the first
+ * record bearing that xid.
+ */
+ dlist_head toplevel_by_lsn;
+
+ /*
+ * one-entry sized cache for by_txn. Very frequently the same txn gets
+ * looked up over and over again.
+ */
+ TransactionId by_txn_last_xid;
+ ReorderBufferTXN *by_txn_last_txn;
+
+ /*
+ * Callbacks to be called when a transaction commits.
+ */
+ ReorderBufferBeginCB begin;
+ ReorderBufferApplyChangeCB apply_change;
+ ReorderBufferCommitCB commit;
+
+ /*
+ * Pointer that will be passed untouched to the callbacks.
+ */
+ void *private_data;
+
+ /*
+ * Private memory context.
+ */
+ MemoryContext context;
+
+ /*
+ * Data structure slab cache.
+ *
+ * We allocate/deallocate some structures very frequently, to avoid bigger
+ * overhead we cache some unused ones here.
+ *
+ * The maximum number of cached entries is controlled by const variables
+ * at the top of reorderbuffer.c
+ */
+
+ /* cached ReorderBufferTXNs */
+ dlist_head cached_transactions;
+ Size nr_cached_transactions;
+
+ /* cached ReorderBufferChanges */
+ dlist_head cached_changes;
+ Size nr_cached_changes;
+
+ /* cached ReorderBufferTupleBufs */
+ slist_head cached_tuplebufs;
+ Size nr_cached_tuplebufs;
+
+ XLogRecPtr current_restart_decoding_lsn;
+
+ /* buffer for disk<->memory conversions */
+ char *outbuf;
+ Size outbufsize;
+};
+
+
+ReorderBuffer *ReorderBufferAllocate(void);
+void ReorderBufferFree(ReorderBuffer *);
+
+ReorderBufferTupleBuf *ReorderBufferGetTupleBuf(ReorderBuffer *);
+void ReorderBufferReturnTupleBuf(ReorderBuffer *, ReorderBufferTupleBuf *tuple);
+ReorderBufferChange *ReorderBufferGetChange(ReorderBuffer *);
+void ReorderBufferReturnChange(ReorderBuffer *, ReorderBufferChange *);
+
+void ReorderBufferQueueChange(ReorderBuffer *, TransactionId, XLogRecPtr lsn, ReorderBufferChange *);
+void ReorderBufferCommit(ReorderBuffer *, TransactionId,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
+void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
+void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
+void ReorderBufferAbort(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+
+void ReorderBufferSetBaseSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddNewCommandId(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ CommandId cid);
+void ReorderBufferAddNewTupleCids(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ RelFileNode node, ItemPointerData pt,
+ CommandId cmin, CommandId cmax, CommandId combocid);
+void ReorderBufferAddInvalidations(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ Size nmsgs, SharedInvalidationMessage *msgs);
+bool ReorderBufferIsXidKnown(ReorderBuffer *, TransactionId xid);
+void ReorderBufferXidSetTimetravel(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
+bool ReorderBufferXidDoesTimetravel(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+
+ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
+
+void ReorderBufferSetRestartPoint(ReorderBuffer *, XLogRecPtr ptr);
+
+void ReorderBufferStartup(void);
+
+#endif
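The one-entry cache in front of by_txn (by_txn_last_xid / by_txn_last_txn) can be illustrated with a small standalone sketch. This is not the code in reorderbuffer.c: BufferSketch, TxnSketch and txn_by_xid_sketch are hypothetical names, a linear scan over a fixed array stands in for the hash table, and slow_lookups exists only to make the effect of the cache observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TxnSketch
{
	uint32_t	xid;
	int			nentries;
} TxnSketch;

typedef struct BufferSketch
{
	TxnSketch	txns[8];		/* stand-in for the by_txn hash table */
	size_t		ntxns;

	/* one-entry cache: the most recently looked-up transaction */
	uint32_t	last_xid;
	TxnSketch  *last_txn;

	unsigned	slow_lookups;	/* how often the cache did not help */
} BufferSketch;

/* Look up a transaction by xid, consulting the one-entry cache first. */
TxnSketch *
txn_by_xid_sketch(BufferSketch *rb, uint32_t xid)
{
	size_t		i;

	assert(rb != NULL);

	/* fast path: same xid as last time */
	if (rb->last_txn != NULL && rb->last_xid == xid)
		return rb->last_txn;

	/* slow path: full lookup, then remember the result */
	rb->slow_lookups++;
	for (i = 0; i < rb->ntxns; i++)
	{
		if (rb->txns[i].xid == xid)
		{
			rb->last_xid = xid;
			rb->last_txn = &rb->txns[i];
			return rb->last_txn;
		}
	}
	return NULL;
}
```

Consecutive WAL records very often belong to the same transaction, which is why even a single cached entry avoids most of the full lookups.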
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
new file mode 100644
index 0000000..ff369c5
--- /dev/null
+++ b/src/include/replication/snapbuild.h
@@ -0,0 +1,79 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.h
+ * Exports from replication/logical/snapbuild.c.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/snapbuild.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPBUILD_H
+#define SNAPBUILD_H
+
+#include "access/xlogdefs.h"
+
+typedef enum
+{
+ /*
+ * Initial state, we can't do much yet.
+ */
+ SNAPBUILD_START,
+
+ /*
+ * We have collected enough information to decode tuples in transactions
+ * that started after this.
+ *
+ * Once we reached this we start to collect changes. We cannot apply them
+ * yet because they might be based on transactions that were still running
+ * when we reached this state.
+ */
+ SNAPBUILD_FULL_SNAPSHOT,
+
+ /*
+ * Found a point after hitting SNAPBUILD_FULL_SNAPSHOT where all transactions
+ * that were running at that point finished. Till we reach that we hold
+ * off calling any commit callbacks.
+ */
+ SNAPBUILD_CONSISTENT
+} SnapBuildState;
+
+/* forward declare so we don't have to expose the struct to the public */
+struct SnapBuild;
+typedef struct SnapBuild SnapBuild;
+
+/* forward declare so we don't have to include xlogreader.h */
+struct XLogRecordBuffer;
+
+extern SnapBuild *AllocateSnapshotBuilder(ReorderBuffer *cache,
+ TransactionId xmin_horizon, XLogRecPtr start_lsn);
+extern void FreeSnapshotBuilder(SnapBuild *cache);
+
+extern void SnapBuildSnapDecRefcount(Snapshot snap);
+
+extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
+extern void SnapBuildClearExportedSnapshot(void);
+
+extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
+
+extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+
+/* don't want to include heapam_xlog.h */
+struct xl_heap_new_cid;
+struct xl_running_xacts;
+
+extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
+ TransactionId xid, int nsubxacts,
+ TransactionId *subxacts);
+extern void SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts);
+extern bool SnapBuildProcessChange(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn);
+extern void SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn, struct xl_heap_new_cid *cid);
+extern void SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn,
+ struct xl_running_xacts *running);
+extern void SnapBuildSerializationPoint(SnapBuild *builder, XLogRecPtr lsn);
+
+#endif /* SNAPBUILD_H */
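The three SnapBuildState values describe a one-way progression. The following standalone sketch models only the last step (FULL_SNAPSHOT to CONSISTENT) and the gating of commit callbacks, under heavy simplification: BuilderSketch and the sketch_* functions are hypothetical, transactions are merely counted, not tracked by xid, and how the builder decides that it has reached the full-snapshot point is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	SKETCH_START,
	SKETCH_FULL_SNAPSHOT,
	SKETCH_CONSISTENT
} SketchState;

typedef struct BuilderSketch
{
	SketchState state;
	/* xacts that were still running when FULL_SNAPSHOT was reached */
	unsigned	running_at_full;
} BuilderSketch;

/* Enter FULL_SNAPSHOT, or CONSISTENT right away if nothing is running. */
void
sketch_reach_full_snapshot(BuilderSketch *b, unsigned still_running)
{
	assert(b != NULL && b->state == SKETCH_START);
	b->running_at_full = still_running;
	b->state = (still_running == 0) ? SKETCH_CONSISTENT : SKETCH_FULL_SNAPSHOT;
}

/* One of the xacts that was running at the full-snapshot point finished. */
void
sketch_old_xact_finished(BuilderSketch *b)
{
	assert(b != NULL);
	if (b->state != SKETCH_FULL_SNAPSHOT)
		return;
	if (--b->running_at_full == 0)
		b->state = SKETCH_CONSISTENT;
}

/* Commit callbacks are held off until the builder is consistent. */
bool
sketch_may_call_commit_cb(const BuilderSketch *b)
{
	assert(b != NULL);
	return b->state == SKETCH_CONSISTENT;
}
```

Changes collected while in FULL_SNAPSHOT are kept but not handed to the output plugin, because they may depend on catalog state written by transactions the builder has not fully seen.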
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7eaa21b..daae320 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -66,6 +66,7 @@ typedef struct WalSnd
extern WalSnd *MyWalSnd;
+
/* There is one WalSndCtl struct for the whole database cluster */
typedef struct
{
@@ -93,7 +94,6 @@ typedef struct
extern WalSndCtlData *WalSndCtl;
-
extern void WalSndSetState(WalSndState state);
/*
@@ -108,4 +108,8 @@ extern void replication_scanner_finish(void);
extern Node *replication_parse_result;
+/* logical wal sender data gathering functions */
+extern XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
+
+
#endif /* _WALSENDER_PRIVATE_H */
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index e0eb184..75c56a9 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -116,6 +116,9 @@ typedef ItemPointerData *ItemPointer;
/*
* ItemPointerCopy
* Copies the contents of one disk item pointer to another.
+ *
+ * Should there ever be padding in an ItemPointer this would need to be handled
+ * differently as it's used as a hash key.
*/
#define ItemPointerCopy(fromPointer, toPointer) \
( \
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 39415a3..a33d6cf 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -80,6 +80,7 @@ typedef enum LWLockId
OldSerXidLock,
SyncRepLock,
BackgroundWorkerLock,
+ LogicalReplicationCtlLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..744317e 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -50,7 +50,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked);
extern TransactionId GetOldestActiveTransactionId(void);
extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 7e70e57..5448818 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -147,4 +147,6 @@ extern void ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs
int nmsgs, bool RelcacheInitFileInval,
Oid dbid, Oid tsid);
+extern void LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg);
+
#endif /* SINVAL_H */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 6fd6e1e..5424912 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -64,4 +64,5 @@ extern void CacheRegisterRelcacheCallback(RelcacheCallbackFunction func,
extern void CallSyscacheCallbacks(int cacheid, uint32 hashvalue);
+extern void InvalidateSystemCaches(void);
#endif /* INVAL_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0281b4b..6a4d2d5 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -104,6 +104,7 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
+ Bitmapset *rd_ckeyattr; /* cols that are included in the primary key */
Oid rd_oidindex; /* OID of unique index on OID, if any */
LockInfoData rd_lockInfo; /* lock mgr's info for locking relation */
RuleLock *rd_rules; /* rewrite rules */
@@ -221,6 +222,7 @@ typedef struct StdRdOptions
AutoVacOpts autovacuum; /* autovacuum-related options */
bool security_barrier; /* for views */
int check_option_offset; /* for views */
+ bool treat_as_catalog_table; /* treat as timetravelable table */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
@@ -290,6 +292,15 @@ typedef struct StdRdOptions
"cascaded") == 0 : false)
/*
+ * RelationIsTreatedAsCatalogTable
+ * Returns whether the relation should be treated as a catalog table
+ * from the pov of logical decoding.
+ */
+#define RelationIsTreatedAsCatalogTable(relation) \
+ ((relation)->rd_options ? \
+ ((StdRdOptions *) (relation)->rd_options)->treat_as_catalog_table : false)
+
+/*
* RelationIsValid
* True iff relation descriptor is valid.
*/
@@ -441,7 +452,6 @@ typedef struct StdRdOptions
((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \
!(relation)->rd_islocaltemp)
-
/*
* RelationIsScannable
* Currently can only be false for a materialized view which has not been
@@ -458,6 +468,24 @@ typedef struct StdRdOptions
*/
#define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
+/*
+ * RelationIsDoingTimetravel
+ * True if we need to log enough information to provide timetravel access
+ */
+#define RelationIsDoingTimetravel(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsDoingTimetravelInternal(relation))
+
+/*
+ * RelationIsLogicallyLogged
+ * True if we need to log enough information to extract the data from the WAL stream
+ */
+#define RelationIsLogicallyLogged(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsLogicallyLoggedInternal(relation))
+
+extern bool RelationIsDoingTimetravelInternal(Relation relation);
+extern bool RelationIsLogicallyLoggedInternal(Relation relation);
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..cfeded8 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -41,7 +41,16 @@ extern List *RelationGetIndexList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
extern List *RelationGetIndexPredicate(Relation relation);
-extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs);
+
+typedef enum IndexAttrBitmapKind {
+ INDEX_ATTR_BITMAP_ALL,
+ INDEX_ATTR_BITMAP_KEY,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY
+} IndexAttrBitmapKind;
+
+extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
+ IndexAttrBitmapKind keyAttrs);
+
extern void RelationGetExclusionInfo(Relation indexRelation,
Oid **operators,
Oid **procs,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 81a286c..2187f58 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -23,6 +23,7 @@ extern bool FirstSnapshotSet;
extern TransactionId TransactionXmin;
extern TransactionId RecentXmin;
extern TransactionId RecentGlobalXmin;
+extern TransactionId RecentGlobalDataXmin;
extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
@@ -53,4 +54,6 @@ extern bool XactHasExportedSnapshots(void);
extern void DeleteAllExportedSnapshotFiles(void);
extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern char *ExportSnapshot(Snapshot snapshot);
+
#endif /* SNAPMGR_H */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 19f56e4..873f170 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -22,6 +22,7 @@
extern PGDLLIMPORT SnapshotData SnapshotSelfData;
extern PGDLLIMPORT SnapshotData SnapshotAnyData;
extern PGDLLIMPORT SnapshotData SnapshotToastData;
+extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
#define SnapshotSelf (&SnapshotSelfData)
#define SnapshotAny (&SnapshotAnyData)
@@ -37,7 +38,8 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
/* This macro encodes the knowledge of which snapshots are MVCC-safe */
#define IsMVCCSnapshot(snapshot) \
- ((snapshot)->satisfies == HeapTupleSatisfiesMVCC)
+ ((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
+ (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecoding)
/*
* HeapTupleSatisfiesVisibility
@@ -86,4 +88,20 @@ extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid);
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
+/* Support for catalog timetravel */
+extern bool HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup,
+ Snapshot snapshot, Buffer buffer);
+extern void SetupDecodingSnapshots(Snapshot snapshot_now, HTAB *tuplecids);
+extern void RevertFromDecodingSnapshots(void);
+extern void SuspendDecodingSnapshots(void);
+extern void UnSuspendDecodingSnapshots(void);
+
+/*
+ * To avoid leaking too much knowledge about reorderbuffer implementation
+ * details this is implemented in reorderbuffer.c not tqual.c.
+ */
+extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, HeapTuple htup,
+ Buffer buffer,
+ CommandId *cmin, CommandId *cmax);
+
#endif /* TQUAL_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 8f24c51..d49e499 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1679,6 +1679,13 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, +
| pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock +
| FROM pg_database d;
+ pg_stat_logical_decoding | SELECT l.slot_name, +
+ | l.plugin, +
+ | l.database, +
+ | l.active, +
+ | l.xmin, +
+ | l.restart_decoding_lsn +
+ | FROM pg_stat_get_logical_decoding_slots() l(slot_name, plugin, database, active, xmin, restart_decoding_lsn);
pg_stat_replication | SELECT s.pid, +
| s.usesysid, +
| u.rolname AS usename, +
@@ -2142,7 +2149,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| FROM tv;
tvvmv | SELECT tvvm.grandtot +
| FROM tvvm;
-(64 rows)
+(65 rows)
SELECT tablename, rulename, definition FROM pg_rules
ORDER BY tablename, rulename;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b20eb0d..648caa0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -621,6 +621,7 @@ Form_pg_ts_template
Form_pg_type
Form_pg_user_mapping
FormatNode
+FreeLogicalReplicationCmd
FromCharDateMode
FromExpr
FuncCall
@@ -791,6 +792,7 @@ IdentifySystemCmd
IncrementVarSublevelsUp_context
Index
IndexArrayKeyInfo
+IndexAttrBitmapKind
IndexBuildCallback
IndexBuildResult
IndexBulkDeleteCallback
@@ -818,6 +820,7 @@ IndxInfo
InfoItem
InhInfo
InhOption
+InitLogicalReplicationCmd
InheritableSocket
InlineCodeBlock
InsertStmt
@@ -937,6 +940,17 @@ LockTupleMode
LockingClause
LogOpts
LogStmtLevel
+LogicalDecodeBeginCB
+LogicalDecodeChangeCB
+LogicalDecodeCleanupCB
+LogicalDecodeCommitCB
+LogicalDecodeInitCB
+LogicalDecodingCheckpointData
+LogicalDecodingContext
+LogicalDecodingCtlData
+LogicalDecodingSlot
+LogicalOutputPluginWriterPrepareWrite
+LogicalOutputPluginWriterWrite
LogicalTape
LogicalTapeSet
MAGIC
@@ -1050,6 +1064,7 @@ OprInfo
OprProofCacheEntry
OprProofCacheKey
OutputContext
+OutputPluginCallbacks
OverrideSearchPath
OverrideStackEntry
PACE_HEADER
@@ -1464,6 +1479,21 @@ Relids
RelocationBufferInfo
RenameStmt
ReopenPtr
+ReorderBuffer
+ReorderBufferApplyChangeCB
+ReorderBufferBeginCB
+ReorderBufferChange
+ReorderBufferChangeTypeInternal
+ReorderBufferCommitCB
+ReorderBufferDiskChange
+ReorderBufferIterTXNEntry
+ReorderBufferIterTXNState
+ReorderBufferToastEnt
+ReorderBufferTupleBuf
+ReorderBufferTupleCidEnt
+ReorderBufferTupleCidKey
+ReorderBufferTXN
+ReorderBufferTXNByIdEnt
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
ResTarget
@@ -1518,6 +1548,8 @@ SID_NAME_USE
SISeg
SMgrRelation
SMgrRelationData
+SnapBuildAction
+SnapBuildState
SOCKADDR
SOCKET
SPELL
@@ -1609,6 +1641,8 @@ SlruSharedData
Snapshot
SnapshotData
SnapshotSatisfiesFunc
+Snapstate
+SnapstateOnDisk
SockAddr
Sort
SortBy
@@ -1651,6 +1685,7 @@ StandardChunkHeader
StartBlobPtr
StartBlobsPtr
StartDataPtr
+StartLogicalReplicationCmd
StartReplicationCmd
StartupPacket
StatEntry
@@ -1874,6 +1909,7 @@ WalRcvData
WalRcvState
WalSnd
WalSndCtlData
+WalSndSendData
WalSndState
WholeRowVarExprState
WindowAgg
@@ -1925,6 +1961,7 @@ XLogReaderState
XLogRecData
XLogRecPtr
XLogRecord
+XLogRecordBuffer
XLogSegNo
XLogSource
XLogwrtResult
@@ -2347,6 +2384,7 @@ symbol
tablespaceinfo
teReqs
teSection
+TestDecodingData
temp_tablespaces_extra
text
timeKEY
@@ -2419,11 +2457,13 @@ xl_heap_cleanup_info
xl_heap_delete
xl_heap_freeze
xl_heap_header
+xl_heap_header_len
xl_heap_inplace
xl_heap_insert
xl_heap_lock
xl_heap_lock_updated
xl_heap_multi_insert
+xl_heap_new_cid
xl_heap_newpage
xl_heap_update
xl_heap_visible
--
1.8.4.21.g992c386.dirty
0005-wal_decoding-test_decoding-Add-a-simple-decoding-mod.patch (text/x-patch; charset=us-ascii)
From 09e8610644e76c7b6391df3981fa2431064c9744 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 5/8] wal_decoding: test_decoding: Add a simple decoding module
in contrib
This is mostly useful for testing, demonstration and documentation purposes.
---
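Reader's note (commentary only, not part of the commit): the line format that pg_decode_change produces for an INSERT can be sketched in plain C without any backend headers. This is a minimal illustration under stated assumptions: column values are already strings, and SketchColumn and sketch_format_insert are hypothetical names that do not exist in the patch. The real plugin obtains each value through the type's output function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one attribute of a decoded tuple. */
typedef struct
{
	const char *name;	/* attribute name */
	const char *type;	/* type name, e.g. "int4" */
	const char *value;	/* textual value, NULL for SQL NULL */
} SketchColumn;

/*
 * Format one INSERT the way the plugin does:
 *   table "<relname>": INSERT: <name>[<type>]:<value> ...
 * Returns the number of bytes written, or -1 if the buffer is too small.
 */
static int
sketch_format_insert(char *out, size_t outlen, const char *relname,
					 const SketchColumn *cols, int ncols)
{
	int			written;
	int			i;

	assert(out != NULL && outlen > 0);

	written = snprintf(out, outlen, "table \"%s\": INSERT:", relname);

	for (i = 0; i < ncols; i++)
	{
		/* stop before handing snprintf a zero or negative size */
		if (written < 0 || (size_t) written >= outlen)
			return -1;

		written += snprintf(out + written, outlen - written, " %s[%s]:%s",
							cols[i].name, cols[i].type,
							cols[i].value ? cols[i].value : "(null)");
	}

	if (written < 0 || (size_t) written >= outlen)
		return -1;

	return written;
}
```

With columns id[int4] = "1" and data[text] = NULL on a table named t, the sketch yields: table "t": INSERT: id[int4]:1 data[text]:(null)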
contrib/Makefile | 1 +
contrib/test_decoding/Makefile | 16 ++
contrib/test_decoding/test_decoding.c | 322 ++++++++++++++++++++++++++++++++++
3 files changed, 339 insertions(+)
create mode 100644 contrib/test_decoding/Makefile
create mode 100644 contrib/test_decoding/test_decoding.c
diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..6d2fe32 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,6 +50,7 @@ SUBDIRS = \
tablefunc \
tcn \
test_parser \
+ test_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
new file mode 100644
index 0000000..2ac9653
--- /dev/null
+++ b/contrib/test_decoding/Makefile
@@ -0,0 +1,16 @@
+# contrib/test_decoding/Makefile
+
+MODULE_big = test_decoding
+OBJS = test_decoding.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
new file mode 100644
index 0000000..fb9a240
--- /dev/null
+++ b/contrib/test_decoding/test_decoding.c
@@ -0,0 +1,322 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_decoding.c
+ * example output plugin for the logical replication functionality
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/test_decoding/test_decoding.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/index.h"
+
+#include "nodes/parsenodes.h"
+
+#include "replication/output_plugin.h"
+#include "replication/logical.h"
+
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+
+typedef struct
+{
+ MemoryContext context;
+ bool include_xids;
+} TestDecodingData;
+
+/* These must be available to pg_dlsym() */
+extern void pg_decode_init(LogicalDecodingContext *ctx, bool is_init);
+extern void pg_decode_begin_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn);
+extern void pg_decode_commit_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+extern void pg_decode_change(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, Relation rel,
+ ReorderBufferChange *change);
+
+void
+_PG_init(void)
+{
+}
+
+/* initialize this plugin */
+void
+pg_decode_init(LogicalDecodingContext *ctx, bool is_init)
+{
+ ListCell *option;
+ TestDecodingData *data;
+
+ AssertVariableIsOfType(&pg_decode_init, LogicalDecodeInitCB);
+
+ data = palloc(sizeof(TestDecodingData));
+ data->context = AllocSetContextCreate(TopMemoryContext,
+ "text conversion context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ data->include_xids = true;
+
+ ctx->output_plugin_private = data;
+
+ foreach(option, ctx->output_plugin_options)
+ {
+ DefElem *elem = lfirst(option);
+
+ Assert(elem->arg == NULL || IsA(elem->arg, String));
+
+ if (strcmp(elem->defname, "hide-xids") == 0)
+ {
+ /* FIXME: parse argument */
+ data->include_xids = false;
+ }
+ else
+ {
+ elog(WARNING, "option %s = %s is unknown",
+ elem->defname, elem->arg ? strVal(elem->arg) : "(null)");
+ }
+ }
+}
+
+/* BEGIN callback */
+void
+pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_begin_txn, LogicalDecodeBeginCB);
+
+ ctx->prepare_write(ctx, txn->end_lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "BEGIN");
+ ctx->write(ctx, txn->end_lsn, txn->xid);
+}
+
+/* COMMIT callback */
+void
+pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_commit_txn, LogicalDecodeCommitCB);
+
+ ctx->prepare_write(ctx, txn->end_lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "COMMIT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "COMMIT");
+ ctx->write(ctx, txn->end_lsn, txn->xid);
+}
+
+/* print the tuple 'tuple' into the StringInfo s */
+static void
+tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple)
+{
+ int natt;
+ Oid oid;
+
+ /* print oid of tuple, it's not included in the TupleDesc */
+ if ((oid = HeapTupleHeaderGetOid(tuple->t_data)) != InvalidOid)
+ {
+ appendStringInfo(s, " oid[oid]:%u", oid);
+ }
+
+ /* print all columns individually */
+ for (natt = 0; natt < tupdesc->natts; natt++)
+ {
+ Form_pg_attribute attr; /* the attribute itself */
+ Oid typid; /* type of current attribute */
+ HeapTuple type_tuple; /* information about a type */
+ Form_pg_type type_form;
+ Oid typoutput; /* output function */
+ bool typisvarlena;
+ Datum origval; /* possibly toasted Datum */
+ Datum val; /* definitely detoasted Datum */
+ char *outputstr = NULL;
+ bool isnull; /* column is null? */
+
+ attr = tupdesc->attrs[natt];
+
+ /*
+ * don't print dropped columns, we can't be sure everything is
+ * available for them
+ */
+ if (attr->attisdropped)
+ continue;
+
+ /*
+ * Don't print system columns, oid will already have been printed if
+ * present.
+ */
+ if (attr->attnum < 0)
+ continue;
+
+ typid = attr->atttypid;
+
+ /* gather type name */
+ type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+ if (!HeapTupleIsValid(type_tuple))
+ elog(ERROR, "cache lookup failed for type %u", typid);
+ type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+ /* print attribute name */
+ appendStringInfoChar(s, ' ');
+ appendStringInfoString(s, NameStr(attr->attname));
+
+ /* print attribute type */
+ appendStringInfoChar(s, '[');
+ appendStringInfoString(s, NameStr(type_form->typname));
+ appendStringInfoChar(s, ']');
+
+ /* query output function */
+ getTypeOutputInfo(typid,
+ &typoutput, &typisvarlena);
+
+ ReleaseSysCache(type_tuple);
+
+ /* get Datum from tuple */
+ origval = fastgetattr(tuple, natt + 1, tupdesc, &isnull);
+
+ if (isnull)
+ outputstr = "(null)";
+ else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
+ outputstr = "(unchanged-toast-datum)";
+ else if (typisvarlena)
+ val = PointerGetDatum(PG_DETOAST_DATUM(origval));
+ else
+ val = origval;
+
+ /* call output function if necessary */
+ if (outputstr == NULL)
+ outputstr = OidOutputFunctionCall(typoutput, val);
+
+ /* print data */
+ appendStringInfoChar(s, ':');
+ appendStringInfoString(s, outputstr);
+ }
+}
+
+/*
+ * callback for individual changed tuples
+ */
+void
+pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TestDecodingData *data;
+ Form_pg_class class_form;
+ TupleDesc tupdesc;
+ MemoryContext old;
+
+ AssertVariableIsOfType(&pg_decode_change, LogicalDecodeChangeCB);
+
+ data = ctx->output_plugin_private;
+ class_form = RelationGetForm(relation);
+ tupdesc = RelationGetDescr(relation);
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ ctx->prepare_write(ctx, change->lsn, txn->xid);
+
+ appendStringInfoString(ctx->out, "table \"");
+ appendStringInfoString(ctx->out, NameStr(class_form->relname));
+ appendStringInfoString(ctx->out, "\":");
+
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ appendStringInfoString(ctx->out, " INSERT:");
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+ break;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ appendStringInfoString(ctx->out, " UPDATE:");
+ if (change->oldtuple != NULL)
+ {
+ Relation indexrel;
+ TupleDesc indexdesc;
+
+ appendStringInfoString(ctx->out, " old-pkey:");
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ indexdesc = RelationGetDescr(indexrel);
+
+ tuple_to_stringinfo(ctx->out, indexdesc, &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ appendStringInfoString(ctx->out, " new-tuple:");
+ }
+
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+
+ break;
+ case REORDER_BUFFER_CHANGE_DELETE:
+ appendStringInfoString(ctx->out, " DELETE:");
+
+ /* if there was no PK, we only know that a delete happened */
+ if (change->oldtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ /* In DELETE, only the PK is present; display that */
+ else
+ {
+ Relation indexrel;
+
+ /* make sure rd_primary is set */
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ tuple_to_stringinfo(ctx->out, RelationGetDescr(indexrel),
+ &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ }
+ break;
+ }
+
+ MemoryContextSwitchTo(old);
+ MemoryContextReset(data->context);
+
+ ctx->write(ctx, change->lsn, txn->xid);
+}
--
1.8.4.21.g992c386.dirty
0006-wal_decoding-pg_receivellog-Introduce-pg_receivexlog.patch (text/x-patch; charset=us-ascii)
From d4d15827ea16281927ed9727fc4a5f472569995b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 6/8] wal_decoding: pg_receivellog: Introduce pg_receivexlog
equivalent for logical changes
---
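Reader's note (commentary only, not part of the commit): sendFeedback() in this patch fills a 34 byte reply of the form 'r', write, flush, apply, sendTime, replyRequested. The layout can be sketched standalone as below. It assumes fe_sendint64 stores values in network (big-endian) byte order, as the existing helper in receivelog.c does; sketch_put_int64 and sketch_build_feedback are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'r' + write + flush + apply + sendTime + replyRequested */
#define FEEDBACK_MSG_LEN (1 + 8 + 8 + 8 + 8 + 1)

/* Hypothetical helper: store a 64-bit value in big-endian order. */
static void
sketch_put_int64(uint64_t value, unsigned char *buf)
{
	int			i;

	for (i = 7; i >= 0; i--)
	{
		buf[i] = (unsigned char) (value & 0xFF);
		value >>= 8;
	}
}

/*
 * Build a standby status update. As in the patch, the same position is
 * reported for write and flush, and apply is left invalid (0).
 * buf must have room for FEEDBACK_MSG_LEN bytes; returns the length used.
 */
static size_t
sketch_build_feedback(unsigned char *buf, uint64_t flushed_lsn,
					  int64_t now, int reply_requested)
{
	size_t		len = 0;

	assert(buf != NULL);

	buf[len] = 'r';
	len += 1;
	sketch_put_int64(flushed_lsn, &buf[len]);	/* write */
	len += 8;
	sketch_put_int64(flushed_lsn, &buf[len]);	/* flush */
	len += 8;
	sketch_put_int64(0, &buf[len]);				/* apply */
	len += 8;
	sketch_put_int64((uint64_t) now, &buf[len]);	/* sendTime */
	len += 8;
	buf[len] = reply_requested ? 1 : 0;
	len += 1;

	return len;
}
```

The resulting buffer is what the patch hands to PQputCopyData(); the sketch deliberately stops short of any libpq call.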
src/backend/utils/cache/relcache.c | 3 +
src/bin/pg_basebackup/.gitignore | 1 +
src/bin/pg_basebackup/Makefile | 11 +-
src/bin/pg_basebackup/pg_receivellog.c | 860 +++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/receivelog.c | 137 +-----
src/bin/pg_basebackup/receivelog.h | 2 +
src/bin/pg_basebackup/streamutil.c | 123 ++++-
src/bin/pg_basebackup/streamutil.h | 10 +
8 files changed, 1023 insertions(+), 124 deletions(-)
create mode 100644 src/bin/pg_basebackup/pg_receivellog.c
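Reader's note (commentary only, not part of the commit): the --startpos handling and the START_LOGICAL_REPLICATION command both use the textual hi/lo form of a WAL position, "%X/%X", where hi is the upper 32 bits and lo the lower 32 bits. A minimal round trip, with the hypothetical helper names sketch_parse_lsn and sketch_format_lsn and plain uint64_t standing in for XLogRecPtr, might look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "hi/lo" (both hexadecimal) into a 64-bit position.
 * Returns 1 on success, 0 if the string does not match.
 */
static int
sketch_parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;

	assert(str != NULL && lsn != NULL);

	if (sscanf(str, "%X/%X", &hi, &lo) != 2)
		return 0;

	*lsn = ((uint64_t) hi) << 32 | lo;
	return 1;
}

/* Format a 64-bit position back into the "hi/lo" form. */
static void
sketch_format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%X/%X",
			 (unsigned int) (lsn >> 32),
			 (unsigned int) (lsn & 0xFFFFFFFF));
}
```

Note that sscanf is lenient about trailing characters, so a stricter parser would be needed wherever malformed input must be rejected outright.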
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 5d304ce..1b66e64 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1577,6 +1577,9 @@ RelationIdGetRelation(Oid relationId)
{
Relation rd;
+ /* Make sure we're in a xact, even if this ends up being a cache hit */
+ Assert(IsTransactionState());
+
/*
* first try to find reldesc in the cache
*/
diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 1334a1f..eb2978c 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,2 +1,3 @@
/pg_basebackup
/pg_receivexlog
+/pg_receivellog
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a707c93..c251249 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -20,7 +20,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
OBJS=receivelog.o streamutil.o $(WIN32RES)
-all: pg_basebackup pg_receivexlog
+all: pg_basebackup pg_receivexlog pg_receivellog
pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -28,9 +28,13 @@ pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
pg_receivexlog: pg_receivexlog.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_receivexlog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_receivellog: pg_receivellog.o $(OBJS) | submake-libpq submake-libpgport
+ $(CC) $(CFLAGS) pg_receivellog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
install: all installdirs
$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+ $(INSTALL_PROGRAM) pg_receivellog$(X) '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
installdirs:
$(MKDIR_P) '$(DESTDIR)$(bindir)'
@@ -38,6 +42,9 @@ installdirs:
uninstall:
rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+ rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
clean distclean maintainer-clean:
- rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o
+ rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) \
+ pg_basebackup.o pg_receivexlog.o pg_receivellog.o \
+ $(OBJS)
diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
new file mode 100644
index 0000000..fc81608
--- /dev/null
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -0,0 +1,860 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_receivellog.c - receive streaming logical log data and write it
+ * to a local file.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/pg_receivellog.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "streamutil.h"
+
+#include "getopt_long.h"
+
+#include "libpq-fe.h"
+#include "libpq/pqsignal.h"
+
+#include "access/xlog_internal.h"
+#include "common/fe_memutils.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
+/* Global Options */
+static char *outfile = NULL;
+static int verbose = 0;
+static int noloop = 0;
+static int standby_message_timeout = 10 * 1000; /* 10 sec = default */
+static const char *slot = NULL;
+static XLogRecPtr startpos = InvalidXLogRecPtr;
+static bool do_init_slot = false;
+static bool do_start_slot = false;
+static bool do_stop_slot = false;
+
+/* filled pairwise with option, value. value may be NULL */
+static char **options;
+static size_t noptions = 0;
+static const char *plugin = "test_decoding";
+
+/* Global State */
+static int outfd = -1;
+static volatile bool time_to_abort = false;
+
+static void usage(void);
+static void StreamLog(void);
+
+static void
+usage(void)
+{
+ printf(_("%s receives PostgreSQL logical change stream.\n\n"),
+ progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]...\n"), progname);
+ printf(_("\nOptions:\n"));
+ printf(_(" -f, --file=FILE receive log into this file. - for stdout\n"));
+ printf(_(" -n, --no-loop do not loop on connection lost\n"));
+ printf(_(" -v, --verbose output verbose messages\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -d, --database=DBNAME database to connect to\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port number\n"));
+ printf(_(" -U, --username=NAME connect as specified database user\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt (should happen automatically)\n"));
+ printf(_("\nReplication options:\n"));
+ printf(_(" -o, --option=NAME[=VALUE]\n"
+ " Specify option NAME with optional value VAL, to be passed\n"
+ " to the output plugin\n"));
+ printf(_(" -P, --plugin=PLUGIN use output plugin PLUGIN (defaults to test_decoding)\n"));
+ printf(_(" -s, --status-interval=INTERVAL\n"
+ " time between status packets sent to server (in seconds)\n"));
+ printf(_(" -S, --slot=SLOT use existing replication slot SLOT instead of starting a new one\n"));
+ printf(_(" -I, --startpos=PTR where in an existing slot streaming should start\n"));
+ printf(_("\nAction to be performed:\n"));
+ printf(_(" --init initiate a new replication slot (for the slotname see --slot)\n"));
+ printf(_(" --start start streaming in a replication slot (for the slotname see --slot)\n"));
+ printf(_(" --stop stop the replication slot (for the slotname see --slot)\n"));
+ printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ */
+static bool
+sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool force, bool replyRequested)
+{
+ char replybuf[1 + 8 + 8 + 8 + 8 + 1];
+ int len = 0;
+
+ /*
+ * we normally don't want to send superfluous feedback, but if
+ * it's because of a timeout we need to, otherwise
+ * replication_timeout will kill us.
+ */
+ if (blockpos == startpos && !force)
+ return true;
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: confirming flush up to %X/%X (slot %s)\n"),
+ progname, (uint32) (blockpos >> 32), (uint32) blockpos,
+ slot);
+
+ replybuf[len] = 'r';
+ len += 1;
+ fe_sendint64(blockpos, &replybuf[len]); /* write */
+ len += 8;
+ fe_sendint64(blockpos, &replybuf[len]); /* flush */
+ len += 8;
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
+ len += 8;
+ fe_sendint64(now, &replybuf[len]); /* sendTime */
+ len += 8;
+ replybuf[len] = replyRequested ? 1 : 0; /* replyRequested */
+ len += 1;
+
+ startpos = blockpos;
+
+ if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
+ {
+ fprintf(stderr, _("%s: could not send feedback packet: %s"),
+ progname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Start the log streaming
+ */
+static void
+StreamLog(void)
+{
+ PGresult *res;
+ char query[512];
+ char *copybuf = NULL;
+ int64 last_status = -1;
+ XLogRecPtr logoff = InvalidXLogRecPtr;
+ int written;
+ int i;
+
+ /*
+ * Connect in replication mode to the server
+ */
+ if (!conn)
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ return;
+
+ /*
+ * Start the replication
+ */
+ if (verbose)
+ fprintf(stderr,
+ _("%s: starting log streaming at %X/%X (slot %s)\n"),
+ progname, (uint32) (startpos >> 32), (uint32) startpos,
+ slot);
+
+ /* Initiate the replication stream at specified location */
+ written = snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION \"%s\" %X/%X",
+ slot, (uint32) (startpos >> 32), (uint32) startpos);
+
+ /*
+ * add options to string, if present
+ * Oh, if we just had stringinfo in src/common...
+ */
+ if (noptions)
+ written += snprintf(query + written, sizeof(query) - written, " (");
+
+ for (i = 0; i < noptions; i++)
+ {
+ /* separator */
+ if (i > 0)
+ written += snprintf(query + written, sizeof(query) - written, ", ");
+
+ /* write option name */
+ written += snprintf(query + written, sizeof(query) - written, "\"%s\"",
+ options[(i * 2)]);
+
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+
+ /* write option value if specified */
+ if (options[(i * 2) + 1] != NULL)
+ {
+ written += snprintf(query + written, sizeof(query) - written, " '%s'",
+ options[(i * 2) + 1]);
+
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+ }
+ }
+
+ if (noptions)
+ {
+ written += snprintf(query + written, sizeof(query) - written, ")");
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+ }
+
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COPY_BOTH)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
+ progname, query, PQresultErrorMessage(res));
+ PQclear(res);
+ goto error;
+ }
+ PQclear(res);
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: initiated streaming\n"),
+ progname);
+
+ while (!time_to_abort)
+ {
+ int r;
+ int bytes_left;
+ int bytes_written;
+ int64 now;
+ int hdr_len;
+
+ if (copybuf != NULL)
+ {
+ PQfreemem(copybuf);
+ copybuf = NULL;
+ }
+
+ /*
+ * Potentially send a status message to the master
+ */
+ now = feGetCurrentTimestamp();
+ if (standby_message_timeout > 0 &&
+ feTimestampDifferenceExceeds(last_status, now,
+ standby_message_timeout))
+ {
+ /* Time to send feedback! */
+ if (!sendFeedback(conn, logoff, now, true, false))
+ goto error;
+
+ last_status = now;
+ }
+
+ r = PQgetCopyData(conn, &copybuf, 1);
+ if (r == 0)
+ {
+ /*
+ * In async mode, and no data available. We block on reading but
+ * not more than the specified timeout, so that we can send a
+ * response back to the client.
+ */
+ fd_set input_mask;
+ struct timeval timeout;
+ struct timeval *timeoutptr;
+
+ FD_ZERO(&input_mask);
+ FD_SET(PQsocket(conn), &input_mask);
+ if (standby_message_timeout)
+ {
+ int64 targettime;
+ long secs;
+ int usecs;
+
+ targettime = last_status + (standby_message_timeout - 1) *
+ ((int64) 1000);
+ feTimestampDifference(now,
+ targettime,
+ &secs,
+ &usecs);
+ if (secs <= 0)
+ timeout.tv_sec = 1; /* Always sleep at least 1 sec */
+ else
+ timeout.tv_sec = secs;
+ timeout.tv_usec = usecs;
+ timeoutptr = &timeout;
+ }
+ else
+ timeoutptr = NULL;
+
+ r = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
+ if (r == 0 || (r < 0 && errno == EINTR))
+ {
+ /*
+ * Got a timeout or signal. Continue the loop and either
+ * deliver a status packet to the server or just go back into
+ * blocking.
+ */
+ continue;
+ }
+ else if (r < 0)
+ {
+ fprintf(stderr, _("%s: select() failed: %s\n"),
+ progname, strerror(errno));
+ goto error;
+ }
+ /* Else there is actually data on the socket */
+ if (PQconsumeInput(conn) == 0)
+ {
+ fprintf(stderr,
+ _("%s: could not receive data from WAL stream: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+ continue;
+ }
+ if (r == -1)
+ /* End of copy stream */
+ break;
+ if (r == -2)
+ {
+ fprintf(stderr, _("%s: could not read COPY data: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+
+ /* Check the message type. */
+ if (copybuf[0] == 'k')
+ {
+ int pos;
+ bool replyRequested;
+
+ /*
+ * Parse the keepalive message, enclosed in the CopyData message.
+ * We just check if the server requested a reply, and ignore the
+ * rest.
+ */
+ pos = 1; /* skip msgtype 'k' */
+ pos += 8; /* skip walEnd */
+ pos += 8; /* skip sendTime */
+
+ if (r < pos + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+ replyRequested = copybuf[pos];
+
+ /* If the server requested an immediate reply, send one. */
+ if (replyRequested)
+ {
+ now = feGetCurrentTimestamp();
+ if (!sendFeedback(conn, logoff, now, false, false))
+ goto error;
+ last_status = now;
+ }
+ continue;
+ }
+ else if (copybuf[0] != 'w')
+ {
+ fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
+ progname, copybuf[0]);
+ goto error;
+ }
+
+
+ /*
+ * Read the header of the XLogData message, enclosed in the CopyData
+ * message. We only need the WAL location field (dataStart), the rest
+ * of the header is ignored.
+ */
+ hdr_len = 1; /* msgtype 'w' */
+ hdr_len += 8; /* dataStart */
+ hdr_len += 8; /* walEnd */
+ hdr_len += 8; /* sendTime */
+ if (r < hdr_len + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+
+ /* Extract WAL location for this block */
+ {
+ XLogRecPtr temp = fe_recvint64(&copybuf[1]);
+
+ logoff = Max(temp, logoff);
+ }
+
+ if (outfd == -1 && strcmp(outfile, "-") == 0)
+ {
+ outfd = fileno(stdout);
+ }
+ else if (outfd == -1)
+ {
+ outfd = open(outfile, O_CREAT | O_APPEND | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (outfd == -1)
+ {
+ fprintf(stderr,
+ _("%s: could not open log file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ goto error;
+ }
+ }
+
+ bytes_left = r - hdr_len;
+ bytes_written = 0;
+
+
+ while (bytes_left)
+ {
+ int ret;
+
+ ret = write(outfd,
+ copybuf + hdr_len + bytes_written,
+ bytes_left);
+
+ if (ret < 0)
+ {
+ fprintf(stderr,
+ _("%s: could not write %d bytes to log file \"%s\": %s\n"),
+ progname, bytes_left, outfile,
+ strerror(errno));
+ goto error;
+ }
+
+ /* Write was successful, advance our position */
+ bytes_written += ret;
+ bytes_left -= ret;
+ }
+
+ if (write(outfd, "\n", 1) != 1)
+ {
+ fprintf(stderr,
+ _("%s: could not write %d bytes to log file \"%s\": %s\n"),
+ progname, 1, outfile,
+ strerror(errno));
+ goto error;
+ }
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr,
+ _("%s: unexpected termination of replication stream: %s"),
+ progname, PQresultErrorMessage(res));
+ goto error;
+ }
+ PQclear(res);
+
+ if (copybuf != NULL)
+ PQfreemem(copybuf);
+
+ if (outfd != -1 && close(outfd) != 0)
+ fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ outfd = -1;
+error:
+ PQfinish(conn);
+ conn = NULL;
+}
+
+/*
+ * When SIGINT is received, just tell the system to exit at the next possible
+ * moment.
+ */
+#ifndef WIN32
+
+static void
+sigint_handler(int signum)
+{
+ time_to_abort = true;
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+ PGresult *res;
+ static struct option long_options[] = {
+/* general options */
+ {"file", required_argument, NULL, 'f'},
+ {"no-loop", no_argument, NULL, 'n'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {"help", no_argument, NULL, '?'},
+/* connection options */
+ {"database", required_argument, NULL, 'd'},
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+/* replication options */
+ {"option", required_argument, NULL, 'o'},
+ {"plugin", required_argument, NULL, 'P'},
+ {"status-interval", required_argument, NULL, 's'},
+ {"slot", required_argument, NULL, 'S'},
+ {"startpos", required_argument, NULL, 'I'},
+/* action */
+ {"init", no_argument, NULL, 1},
+ {"start", no_argument, NULL, 2},
+ {"stop", no_argument, NULL, 3},
+ {NULL, 0, NULL, 0}
+ };
+ int c;
+ int option_index;
+ uint32 hi,
+ lo;
+
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivellog"));
+
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ usage();
+ exit(0);
+ }
+ else if (strcmp(argv[1], "-V") == 0 ||
+ strcmp(argv[1], "--version") == 0)
+ {
+ puts("pg_receivellog (PostgreSQL) " PG_VERSION);
+ exit(0);
+ }
+ }
+
+ while ((c = getopt_long(argc, argv, "f:nvd:h:o:p:U:wWP:s:S:",
+ long_options, &option_index)) != -1)
+ {
+ switch (c)
+ {
+/* general options */
+ case 'f':
+ outfile = pg_strdup(optarg);
+ break;
+ case 'n':
+ noloop = 1;
+ break;
+ case 'v':
+ verbose++;
+ break;
+/* connection options */
+ case 'd':
+ dbname = pg_strdup(optarg);
+ break;
+ case 'h':
+ dbhost = pg_strdup(optarg);
+ break;
+ case 'p':
+ if (atoi(optarg) <= 0)
+ {
+ fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ dbport = pg_strdup(optarg);
+ break;
+ case 'U':
+ dbuser = pg_strdup(optarg);
+ break;
+ case 'w':
+ dbgetpassword = -1;
+ break;
+ case 'W':
+ dbgetpassword = 1;
+ break;
+/* replication options */
+ case 'o':
+ {
+ char *data = pg_strdup(optarg);
+ char *val = strchr(data, '=');
+
+ if (val != NULL)
+ {
+ /* remove =; separate data from val */
+ *val = '\0';
+ val++;
+ }
+
+ noptions += 1;
+ options = pg_realloc(options, sizeof(char*) * noptions * 2);
+
+ options[(noptions - 1) * 2] = data;
+ options[(noptions - 1) * 2 + 1] = val;
+ }
+
+ break;
+ case 'P':
+ plugin = pg_strdup(optarg);
+ break;
+ case 's':
+ standby_message_timeout = atoi(optarg) * 1000;
+ if (standby_message_timeout < 0)
+ {
+ fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ break;
+ case 'S':
+ slot = pg_strdup(optarg);
+ break;
+ case 'I':
+ if (sscanf(optarg, "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse start position \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+ break;
+/* action */
+ case 1:
+ do_init_slot = true;
+ break;
+ case 2:
+ do_start_slot = true;
+ break;
+ case 3:
+ do_stop_slot = true;
+ break;
+
+ default:
+
+ /*
+ * getopt_long already emitted a complaint
+ */
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ /*
+ * Any non-option arguments?
+ */
+ if (optind < argc)
+ {
+ fprintf(stderr,
+ _("%s: too many command-line arguments (first is \"%s\")\n"),
+ progname, argv[optind]);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Required arguments
+ */
+ if (slot == NULL)
+ {
+ fprintf(stderr, _("%s: no slot specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && outfile == NULL)
+ {
+ fprintf(stderr, _("%s: no target file specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && dbname == NULL)
+ {
+ fprintf(stderr, _("%s: no database specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && !do_init_slot && !do_start_slot)
+ {
+ fprintf(stderr, _("%s: at least one action needs to be specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (do_stop_slot && (do_init_slot || do_start_slot))
+ {
+ fprintf(stderr, _("%s: --stop cannot be combined with --init or --start\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (startpos && (do_init_slot || do_stop_slot))
+ {
+ fprintf(stderr, _("%s: --startpos cannot be combined with --init or --stop\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+#ifndef WIN32
+ pqsignal(SIGINT, sigint_handler);
+#endif
+
+ /*
+ * don't really need this but it actually helps to get more precise error
+ * messages about authentication, required GUCs and such without starting
+ * to loop around connection attempts later on.
+ */
+ {
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ exit(1);
+
+ /*
+ * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
+ * position.
+ */
+ res = PQexec(conn, "IDENTIFY_SYSTEM");
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+ PQclear(res);
+ }
+
+
+ /*
+ * stop a replication slot
+ */
+ if (do_stop_slot)
+ {
+ char query[256];
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: stop replication slot \"%s\"\n"),
+ progname, slot);
+
+ snprintf(query, sizeof(query), "FREE_LOGICAL_REPLICATION \"%s\"",
+ slot);
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 0 || PQnfields(res) != 0)
+ {
+ fprintf(stderr,
+ _("%s: could not stop logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 0, 0);
+ disconnect_and_exit(1);
+ }
+
+ PQclear(res);
+ disconnect_and_exit(0);
+ }
+
+ /*
+ * init a replication slot
+ */
+ if (do_init_slot)
+ {
+ char query[256];
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: init replication slot \"%s\"\n"),
+ progname, slot);
+
+ snprintf(query, sizeof(query), "INIT_LOGICAL_REPLICATION \"%s\" \"%s\"",
+ slot, plugin);
+
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not init logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+
+ if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse log location \"%s\"\n"),
+ progname, PQgetvalue(res, 0, 1));
+ disconnect_and_exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+
+ slot = pg_strdup(PQgetvalue(res, 0, 0));
+ PQclear(res);
+ }
+
+
+ if (!do_start_slot)
+ disconnect_and_exit(0);
+
+ while (true)
+ {
+ StreamLog();
+ if (time_to_abort)
+ {
+ /*
+ * We've been Ctrl-C'ed. That's not an error, so exit without an
+ * errorcode.
+ */
+ disconnect_and_exit(0);
+ }
+ else if (noloop)
+ {
+ fprintf(stderr, _("%s: disconnected.\n"), progname);
+ exit(1);
+ }
+ else
+ {
+ fprintf(stderr,
+ /* translator: check source for value for %d */
+ _("%s: disconnected. Waiting %d seconds to try again.\n"),
+ progname, RECONNECT_SLEEP_TIME);
+ pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+ }
+ }
+}
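
Reviewer aside on the --startpos handling in pg_receivellog.c above: the option is read as two hexadecimal halves separated by a slash and combined into one 64-bit WAL position. The standalone sketch below shows only that parsing step. The helper name parse_lsn and the use of plain <stdint.h> types are assumptions made for illustration; the patch itself does this inline with uint32/uint64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: parse a WAL position written
 * as "hi/lo" (both hexadecimal) into a single 64-bit value. Returns false
 * if the string does not contain both halves.
 */
static bool
parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;

	assert(str != NULL && lsn != NULL);

	if (sscanf(str, "%X/%X", &hi, &lo) != 2)
		return false;

	/* the high half occupies the upper 32 bits, as in the patch */
	*lsn = ((uint64_t) hi) << 32 | lo;
	return true;
}
```

Note that main() in the patch tests "if (startpos && ...)", so a parsed position of 0/0 is indistinguishable from "no --startpos given".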
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 22a5340..f027e1e 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -11,21 +11,18 @@
* src/bin/pg_basebackup/receivelog.c
*-------------------------------------------------------------------------
*/
+
#include "postgres_fe.h"
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/types.h>
-#include <unistd.h>
-/* for ntohl/htonl */
-#include <netinet/in.h>
-#include <arpa/inet.h>
+/* local includes */
+#include "receivelog.h"
+#include "streamutil.h"
#include "libpq-fe.h"
#include "access/xlog_internal.h"
-#include "receivelog.h"
-#include "streamutil.h"
+#include <sys/stat.h>
+#include <unistd.h>
/* fd and filename for currently open WAL file */
@@ -193,63 +190,6 @@ close_walfile(char *basedir, char *partial_suffix)
/*
- * Local version of GetCurrentTimestamp(), since we are not linked with
- * backend code. The protocol always uses integer timestamps, regardless of
- * server setting.
- */
-static int64
-localGetCurrentTimestamp(void)
-{
- int64 result;
- struct timeval tp;
-
- gettimeofday(&tp, NULL);
-
- result = (int64) tp.tv_sec -
- ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
-
- result = (result * USECS_PER_SEC) + tp.tv_usec;
-
- return result;
-}
-
-/*
- * Local version of TimestampDifference(), since we are not linked with
- * backend code.
- */
-static void
-localTimestampDifference(int64 start_time, int64 stop_time,
- long *secs, int *microsecs)
-{
- int64 diff = stop_time - start_time;
-
- if (diff <= 0)
- {
- *secs = 0;
- *microsecs = 0;
- }
- else
- {
- *secs = (long) (diff / USECS_PER_SEC);
- *microsecs = (int) (diff % USECS_PER_SEC);
- }
-}
-
-/*
- * Local version of TimestampDifferenceExceeds(), since we are not
- * linked with backend code.
- */
-static bool
-localTimestampDifferenceExceeds(int64 start_time,
- int64 stop_time,
- int msec)
-{
- int64 diff = stop_time - start_time;
-
- return (diff >= msec * INT64CONST(1000));
-}
-
-/*
* Check if a timeline history file exists.
*/
static bool
@@ -369,47 +309,6 @@ writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *co
}
/*
- * Converts an int64 to network byte order.
- */
-static void
-sendint64(int64 i, char *buf)
-{
- uint32 n32;
-
- /* High order half first, since we're doing MSB-first */
- n32 = (uint32) (i >> 32);
- n32 = htonl(n32);
- memcpy(&buf[0], &n32, 4);
-
- /* Now the low order half */
- n32 = (uint32) i;
- n32 = htonl(n32);
- memcpy(&buf[4], &n32, 4);
-}
-
-/*
- * Converts an int64 from network byte order to native format.
- */
-static int64
-recvint64(char *buf)
-{
- int64 result;
- uint32 h32;
- uint32 l32;
-
- memcpy(&h32, buf, 4);
- memcpy(&l32, buf + 4, 4);
- h32 = ntohl(h32);
- l32 = ntohl(l32);
-
- result = h32;
- result <<= 32;
- result |= l32;
-
- return result;
-}
-
-/*
* Send a Standby Status Update message to server.
*/
static bool
@@ -420,13 +319,13 @@ sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool replyRequested)
replybuf[len] = 'r';
len += 1;
- sendint64(blockpos, &replybuf[len]); /* write */
+ fe_sendint64(blockpos, &replybuf[len]); /* write */
len += 8;
- sendint64(InvalidXLogRecPtr, &replybuf[len]); /* flush */
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* flush */
len += 8;
- sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
len += 8;
- sendint64(now, &replybuf[len]); /* sendTime */
+ fe_sendint64(now, &replybuf[len]); /* sendTime */
len += 8;
replybuf[len] = replyRequested ? 1 : 0; /* replyRequested */
len += 1;
@@ -828,9 +727,9 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/*
* Potentially send a status message to the master
*/
- now = localGetCurrentTimestamp();
+ now = feGetCurrentTimestamp();
if (still_sending && standby_message_timeout > 0 &&
- localTimestampDifferenceExceeds(last_status, now,
+ feTimestampDifferenceExceeds(last_status, now,
standby_message_timeout))
{
/* Time to send feedback! */
@@ -859,10 +758,10 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
int usecs;
targettime = last_status + (standby_message_timeout - 1) * ((int64) 1000);
- localTimestampDifference(now,
- targettime,
- &secs,
- &usecs);
+ feTimestampDifference(now,
+ targettime,
+ &secs,
+ &usecs);
if (secs <= 0)
timeout.tv_sec = 1; /* Always sleep at least 1 sec */
else
@@ -966,7 +865,7 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* If the server requested an immediate reply, send one. */
if (replyRequested && still_sending)
{
- now = localGetCurrentTimestamp();
+ now = feGetCurrentTimestamp();
if (!sendFeedback(conn, blockpos, now, false))
goto error;
last_status = now;
@@ -996,7 +895,7 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
progname, r);
goto error;
}
- blockpos = recvint64(&copybuf[1]);
+ blockpos = fe_recvint64(&copybuf[1]);
/* Extract WAL location for this block */
xlogoff = blockpos % XLOG_SEG_SIZE;
diff --git a/src/bin/pg_basebackup/receivelog.h b/src/bin/pg_basebackup/receivelog.h
index 7c983cd..f4789a5 100644
--- a/src/bin/pg_basebackup/receivelog.h
+++ b/src/bin/pg_basebackup/receivelog.h
@@ -1,3 +1,5 @@
+#include "libpq-fe.h"
+
#include "access/xlogdefs.h"
/*
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 1dfb80f..c8d436d 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -11,17 +11,35 @@
*-------------------------------------------------------------------------
*/
-#include "postgres_fe.h"
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's
+ * backend-only stuff in the datetime include files we need. But we need a
+ * frontend-ish environment otherwise. Hence this ugly hack.
+ */
+#define FRONTEND 1
+#include "postgres.h"
+
#include "streamutil.h"
+#include "common/fe_memutils.h"
+#include "utils/datetime.h"
+
#include <stdio.h>
#include <string.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+/* for ntohl/htonl */
+#include <netinet/in.h>
+#include <arpa/inet.h>
const char *progname;
char *connection_string = NULL;
char *dbhost = NULL;
char *dbuser = NULL;
char *dbport = NULL;
+char *dbname = NULL;
int dbgetpassword = 0; /* 0=auto, -1=never, 1=always */
static char *dbpassword = NULL;
PGconn *conn = NULL;
@@ -86,10 +104,10 @@ GetConnection(void)
}
keywords[i] = "dbname";
- values[i] = "replication";
+ values[i] = dbname == NULL ? "replication" : dbname;
i++;
keywords[i] = "replication";
- values[i] = "true";
+ values[i] = dbname == NULL ? "true" : "database";
i++;
keywords[i] = "fallback_application_name";
values[i] = progname;
@@ -210,3 +228,102 @@ GetConnection(void)
return tmpconn;
}
}
+
+
+/*
+ * Frontend version of GetCurrentTimestamp(), since we are not linked with
+ * backend code. The protocol always uses integer timestamps, regardless of
+ * server setting.
+ */
+int64
+feGetCurrentTimestamp(void)
+{
+ int64 result;
+ struct timeval tp;
+
+ gettimeofday(&tp, NULL);
+
+ result = (int64) tp.tv_sec -
+ ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
+
+ result = (result * USECS_PER_SEC) + tp.tv_usec;
+
+ return result;
+}
+
+/*
+ * Frontend version of TimestampDifference(), since we are not linked with
+ * backend code.
+ */
+void
+feTimestampDifference(int64 start_time, int64 stop_time,
+ long *secs, int *microsecs)
+{
+ int64 diff = stop_time - start_time;
+
+ if (diff <= 0)
+ {
+ *secs = 0;
+ *microsecs = 0;
+ }
+ else
+ {
+ *secs = (long) (diff / USECS_PER_SEC);
+ *microsecs = (int) (diff % USECS_PER_SEC);
+ }
+}
+
+/*
+ * Frontend version of TimestampDifferenceExceeds(), since we are not
+ * linked with backend code.
+ */
+bool
+feTimestampDifferenceExceeds(int64 start_time,
+ int64 stop_time,
+ int msec)
+{
+ int64 diff = stop_time - start_time;
+
+ return (diff >= msec * INT64CONST(1000));
+}
+
+/*
+ * Converts an int64 to network byte order.
+ */
+void
+fe_sendint64(int64 i, char *buf)
+{
+ uint32 n32;
+
+ /* High order half first, since we're doing MSB-first */
+ n32 = (uint32) (i >> 32);
+ n32 = htonl(n32);
+ memcpy(&buf[0], &n32, 4);
+
+ /* Now the low order half */
+ n32 = (uint32) i;
+ n32 = htonl(n32);
+ memcpy(&buf[4], &n32, 4);
+}
+
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+int64
+fe_recvint64(char *buf)
+{
+ int64 result;
+ uint32 h32;
+ uint32 l32;
+
+ memcpy(&h32, buf, 4);
+ memcpy(&l32, buf + 4, 4);
+ h32 = ntohl(h32);
+ l32 = ntohl(l32);
+
+ result = h32;
+ result <<= 32;
+ result |= l32;
+
+ return result;
+}
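
Reviewer aside on fe_sendint64()/fe_recvint64() above: both split the 64-bit value into two 32-bit halves and run each through htonl()/ntohl(), so the wire format is most-significant byte first. A minimal standalone round trip, assuming a POSIX system with <arpa/inet.h>; the names put_int64_be and get_int64_be are made up for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>			/* htonl/ntohl; POSIX only */

/* Hypothetical name: write value into buf[0..7], most significant byte first. */
static void
put_int64_be(int64_t value, char *buf)
{
	uint32_t	n32;

	assert(buf != NULL);

	/* high-order half first, since the protocol is MSB-first */
	n32 = htonl((uint32_t) ((uint64_t) value >> 32));
	memcpy(&buf[0], &n32, 4);

	/* now the low-order half */
	n32 = htonl((uint32_t) value);
	memcpy(&buf[4], &n32, 4);
}

/* Hypothetical name: read back a value written by put_int64_be(). */
static int64_t
get_int64_be(const char *buf)
{
	uint32_t	h32;
	uint32_t	l32;

	assert(buf != NULL);

	memcpy(&h32, buf, 4);
	memcpy(&l32, buf + 4, 4);

	/* assemble as unsigned to avoid shifting into the sign bit */
	return (int64_t) (((uint64_t) ntohl(h32) << 32) | ntohl(l32));
}
```

The sketch assembles the result in an unsigned type before casting, which differs slightly from the patch, where the shift is done on the int64 result directly.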
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 77d6b86..4286df8 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -5,6 +5,7 @@ extern char *connection_string;
extern char *dbhost;
extern char *dbuser;
extern char *dbport;
+extern char *dbname;
extern int dbgetpassword;
/* Connection kept global so we can disconnect easily */
@@ -17,3 +18,12 @@ extern PGconn *conn;
}
extern PGconn *GetConnection(void);
+
+extern int64 feGetCurrentTimestamp(void);
+extern void feTimestampDifference(int64 start_time, int64 stop_time,
+ long *secs, int *microsecs);
+
+extern bool feTimestampDifferenceExceeds(int64 start_time, int64 stop_time,
+ int msec);
+extern void fe_sendint64(int64 i, char *buf);
+extern int64 fe_recvint64(char *buf);
--
1.8.4.21.g992c386.dirty
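
Reviewer aside on the timestamp helpers that patch 0006 moves into streamutil.c: feGetCurrentTimestamp() shifts Unix time to the PostgreSQL epoch (2000-01-01) and scales to microseconds, and feTimestampDifferenceExceeds() compares a microsecond difference against a millisecond limit. The sketch below repeats that arithmetic without any PostgreSQL headers. The function and macro names are assumptions for illustration, and the epoch offset is written out as (2451545 - 2440588) * 86400 seconds, i.e. the difference of the two Julian dates the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* seconds between 1970-01-01 and 2000-01-01: (2451545 - 2440588) * 86400 */
#define SKETCH_EPOCH_SHIFT_SECS INT64_C(946684800)
#define SKETCH_USECS_PER_SEC	INT64_C(1000000)

/* Hypothetical name: Unix seconds + microseconds -> PostgreSQL-epoch microseconds. */
static int64_t
unix_to_pg_timestamp(int64_t unix_secs, int64_t usecs)
{
	assert(usecs >= 0 && usecs < SKETCH_USECS_PER_SEC);

	return (unix_secs - SKETCH_EPOCH_SHIFT_SECS) * SKETCH_USECS_PER_SEC + usecs;
}

/* Hypothetical name: true once at least msec milliseconds lie between start and stop. */
static bool
interval_elapsed(int64_t start_time, int64_t stop_time, int msec)
{
	int64_t		diff = stop_time - start_time;

	return diff >= msec * INT64_C(1000);
}
```

With a status interval of 10 seconds (standby_message_timeout = 10000), feedback becomes due once the two timestamps are 10,000,000 microseconds apart.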
Attachment: 0007-wal_decoding-test_logical_decoding-Add-extension-for.patch (text/x-patch; charset=us-ascii)
From 94bf1588665cd89dde0135b877614ca22a2104dd Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Mon, 19 Aug 2013 13:24:31 +0200
Subject: [PATCH 7/8] wal_decoding: test_logical_decoding: Add extension for
easier testing of logical decoding
This extension provides three functions for manipulating replication slots:
* init_logical_replication - initiate a replication slot and wait for consistent state
* start_logical_replication - return all changes since the last call up to now, without blocking
* free_logical_replication - free the logical slot again
These map fairly directly onto the replication connection commands.
Because it is still an open question how logical replication tests should be
integrated, this module also contains the current tests for logical replication itself.
Author: Abhijit Menon-Sen
---
contrib/Makefile | 1 +
contrib/test_logical_decoding/Makefile | 33 ++
contrib/test_logical_decoding/expected/ddl.out | 625 +++++++++++++++++++++
contrib/test_logical_decoding/expected/rewrite.out | 70 +++
contrib/test_logical_decoding/logical.conf | 2 +
contrib/test_logical_decoding/sql/ddl.sql | 316 +++++++++++
contrib/test_logical_decoding/sql/rewrite.sql | 29 +
.../test_logical_decoding--1.0.sql | 6 +
.../test_logical_decoding/test_logical_decoding.c | 238 ++++++++
.../test_logical_decoding.control | 5 +
10 files changed, 1325 insertions(+)
create mode 100644 contrib/test_logical_decoding/Makefile
create mode 100644 contrib/test_logical_decoding/expected/ddl.out
create mode 100644 contrib/test_logical_decoding/expected/rewrite.out
create mode 100644 contrib/test_logical_decoding/logical.conf
create mode 100644 contrib/test_logical_decoding/sql/ddl.sql
create mode 100644 contrib/test_logical_decoding/sql/rewrite.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding--1.0.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.c
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.control
diff --git a/contrib/Makefile b/contrib/Makefile
index 6d2fe32..41cb892 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -51,6 +51,7 @@ SUBDIRS = \
tcn \
test_parser \
test_decoding \
+ test_logical_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_logical_decoding/Makefile b/contrib/test_logical_decoding/Makefile
new file mode 100644
index 0000000..f1990d3
--- /dev/null
+++ b/contrib/test_logical_decoding/Makefile
@@ -0,0 +1,33 @@
+MODULE_big = test_logical_decoding
+OBJS = test_logical_decoding.o
+
+EXTENSION = test_logical_decoding
+DATA = test_logical_decoding--1.0.sql
+
+# Note: because we don't tell the Makefile there are any regression tests,
+# we have to clean those result files explicitly
+EXTRA_CLEAN = -r $(pg_regress_clean_files)
+
+subdir = contrib/test_logical_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+
+# Disabled because these tests require "wal_level=logical", which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+submake-regress:
+ $(MAKE) -C $(top_builddir)/src/test/regress
+
+submake-test_decoding:
+ $(MAKE) -C $(top_builddir)/contrib/test_decoding
+
+check: all | submake-regress submake-test_decoding
+ $(pg_regress_check) --temp-config $(top_srcdir)/contrib/test_logical_decoding/logical.conf \
+ --temp-install=./tmp_check \
+ --extra-install=contrib/test_decoding \
+ --extra-install=contrib/test_logical_decoding \
+ ddl rewrite
+
+.PHONY: submake-test_decoding submake-regress
diff --git a/contrib/test_logical_decoding/expected/ddl.out b/contrib/test_logical_decoding/expected/ddl.out
new file mode 100644
index 0000000..c161a43
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/ddl.out
@@ -0,0 +1,625 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ERROR: There already is a logical slot named "regression_slot"
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+-- fail
+SELECT stop_logical_replication('regression_slot');
+ERROR: couldn't find logical slot "regression_slot"
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+/* check whether status function reports us, only reproduceable columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+ slot_name | plugin | active | ?column? | ?column?
+-----------------+---------------+--------+----------+----------
+ regression_slot | test_decoding | f | t | t
+(1 row)
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+ALTER TABLE replication_example ADD COLUMN bar int;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:1 text[varchar]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:7 somedata[int4]:3 text[varchar]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:8 somedata[int4]:3 text[varchar]:2
+ table "replication_example": INSERT: id[int4]:9 somedata[int4]:3 text[varchar]:3
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+ COMMIT
+(30 rows)
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 12
+(1 row)
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:12 somedata[int4]:6 somenum[int4]:1
+ table "replication_example": INSERT: id[int4]:13 somedata[int4]:6 somenum[int4]:2 zaphod1[int4]:1
+ table "replication_example": INSERT: id[int4]:14 somedata[int4]:6 somenum[int4]:3 zaphod1[int4]:(null) zaphod2[int4]:1
+ table "replication_example": INSERT: id[int4]:15 somedata[int4]:6 somenum[int4]:4 zaphod1[int4]:2 zaphod2[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_unique": INSERT: id2[int4]:1 data[int4]:10
+ COMMIT
+ BEGIN
+ table "tr_unique": DELETE: id2[int4]:1
+ COMMIT
+ BEGIN
+ COMMIT
+(19 rows)
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 2
+(1 row)
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_pkey": INSERT: id2[int4]:2 data[int4]:1 id[int4]:1
+ COMMIT
+ BEGIN
+ table "tr_pkey": DELETE: id[int4]:1
+ COMMIT
+(6 rows)
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+ count | min | max
+-------+---------------------------------------------------------------+-------------------------------------------------------------
+ 1 | COMMIT | COMMIT
+ 1 | BEGIN | BEGIN
+ 4999 | table "tr_etoomuch": DELETE: id[int4]:1 | table "tr_etoomuch": DELETE: id[int4]:999
+ 5234 | table "tr_etoomuch": UPDATE: id[int4]:10000 data[int4]:-10000 | table "tr_etoomuch": UPDATE: id[int4]:9999 data[int4]:-9999
+ 10234 | table "tr_etoomuch": INSERT: id[int4]:10000 data[int4]:10000 | table "tr_etoomuch": INSERT: id[int4]:9 data[int4]:9
+(5 rows)
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:1 path[text]:1-top-#1
+ table "tr_sub": INSERT: id[int4]:2 path[text]:1-top-1-#1
+ table "tr_sub": INSERT: id[int4]:3 path[text]:1-top-1-#2
+ table "tr_sub": INSERT: id[int4]:4 path[text]:1-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:5 path[text]:1-top-2-1-#2
+ table "tr_sub": INSERT: id[int4]:6 path[text]:1-top-2-#1
+ COMMIT
+(10 rows)
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:7 path[text]:2-top-1...--#1
+ table "tr_sub": INSERT: id[int4]:8 path[text]:2-top-1...--#2
+ table "tr_sub": INSERT: id[int4]:9 path[text]:2-top-1...--#3
+ table "tr_sub": INSERT: id[int4]:10 path[text]:2-top-#1
+ COMMIT
+(6 rows)
+
+-- make sure rollbacked subtransactions aren't decoded
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-1-#1');
+SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-2-#1');
+ROLLBACK TO SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#2');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:11 path[text]:3-top-2-#1
+ table "tr_sub": INSERT: id[int4]:12 path[text]:3-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:14 path[text]:3-top-2-#2
+ COMMIT
+(5 rows)
+
+-- test whether a known, but not yet logged toplevel xact, followed by a
+-- subxact commit is handled correctly
+BEGIN;
+SELECT txid_current() != 0; -- so no fixed xid apears in the outfile
+ ?column?
+----------
+ t
+(1 row)
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
+RELEASE SAVEPOINT a;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------
+(0 rows)
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=false
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+----------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:1 relation[name]:foo options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:2 relation[name]:bar options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:3 relation[name]:blub options[_text]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:4 relation[name]:zaphod options[_text]:(null)
+ COMMIT
+(20 rows)
+
+/*
+ * check whether we handle updates/deletes correctly with & without a pkey
+ */
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique_not_null ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_without_key": INSERT: id[int4]:1 data[int4]:1
+ table "table_without_key": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_pkey": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_pkey": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique_not_null": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_oid": INSERT: oid[oid]:16484 id[int4]:1 data[int4]:1
+ table "table_with_oid": INSERT: oid[oid]:16485 id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16484
+ COMMIT
+ BEGIN
+ table "table_with_oid": UPDATE: oid[oid]:16485 id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16485
+ COMMIT
+(105 rows)
+
+-- check toast support
+SELECT setseed(0);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:2 toasted_col1[text]:(null) rand1[float8]:0.783099223393947 toasted_col2[text]:0001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
7104720473047404750476047704780479048004810482048304840485048604870488048904900491049204930494049504960497049804990500 rand2[float8]:0.798440033104271
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+(11 rows)
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "toasttable": INSERT: id[int4]:3 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.911647357512265 toasted_col2[text]:(null) rand2[float8]:0.197551369201392
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:(unchanged-toast-datum) rand1[float8]:0.840187716763467 toasted_col2[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765
86596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243
12441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743
17441745174617471748174917501751175217531754175517561757175817591760176117621763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ COMMIT
+(8 rows)
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------
+(0 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
+ slot_name | plugin | database | active | xmin | restart_decoding_lsn
+-----------+--------+----------+--------+------+----------------------
+(0 rows)
+
diff --git a/contrib/test_logical_decoding/expected/rewrite.out b/contrib/test_logical_decoding/expected/rewrite.out
new file mode 100644
index 0000000..392e465
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/rewrite.out
@@ -0,0 +1,70 @@
+CREATE EXTENSION test_logical_decoding;
+ERROR: extension "test_logical_decoding" already exists
+-- predictability
+SET synchronous_commit = on;
+DROP TABLE IF EXISTS replication_example;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+INSERT INTO replication_example(somedata) VALUES (1);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:(null)
+ COMMIT
+(5 rows)
+
+INSERT INTO replication_example(somedata) VALUES (2);
+VACUUM FULL pg_am;
+VACUUM FULL pg_amop;
+VACUUM FULL pg_proc;
+VACUUM FULL pg_opclass;
+VACUUM FULL pg_class;
+VACUUM FULL pg_type;
+VACUUM FULL pg_index;
+VACUUM FULL pg_database;
+INSERT INTO replication_example(somedata) VALUES (3);
+-- make old files go away
+CHECKPOINT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:2 text[varchar]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:3 text[varchar]:(null)
+ COMMIT
+(22 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
diff --git a/contrib/test_logical_decoding/logical.conf b/contrib/test_logical_decoding/logical.conf
new file mode 100644
index 0000000..a7c6c86
--- /dev/null
+++ b/contrib/test_logical_decoding/logical.conf
@@ -0,0 +1,2 @@
+wal_level = logical
+max_logical_slots = 4
diff --git a/contrib/test_logical_decoding/sql/ddl.sql b/contrib/test_logical_decoding/sql/ddl.sql
new file mode 100644
index 0000000..b1eee39
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/ddl.sql
@@ -0,0 +1,316 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+-- faster startup
+CHECKPOINT;
+
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+-- fail
+SELECT stop_logical_replication('regression_slot');
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+
+/* check whether status function reports us, only reproducible columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+
+/*
+ * check whether we handle subtransactions correctly in relation to each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- make sure rolled-back subtransactions aren't decoded
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-1-#1');
+SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-2-#1');
+ROLLBACK TO SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#2');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- test whether a known, but not yet logged toplevel xact, followed by a
+-- subxact commit is handled correctly
+BEGIN;
+SELECT txid_current() != 0; -- so no fixed xid appears in the outfile
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
+RELEASE SAVEPOINT a;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check whether we handle updates/deletes correctly with & without a pkey
+ */
+
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique_not_null ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check toast support
+SELECT setseed(0);
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
diff --git a/contrib/test_logical_decoding/sql/rewrite.sql b/contrib/test_logical_decoding/sql/rewrite.sql
new file mode 100644
index 0000000..2400fe3
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/rewrite.sql
@@ -0,0 +1,29 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+DROP TABLE IF EXISTS replication_example;
+
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+INSERT INTO replication_example(somedata) VALUES (1);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata) VALUES (2);
+VACUUM FULL pg_am;
+VACUUM FULL pg_amop;
+VACUUM FULL pg_proc;
+VACUUM FULL pg_opclass;
+VACUUM FULL pg_class;
+VACUUM FULL pg_type;
+VACUUM FULL pg_index;
+VACUUM FULL pg_database;
+INSERT INTO replication_example(somedata) VALUES (3);
+
+-- make old files go away
+CHECKPOINT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
diff --git a/contrib/test_logical_decoding/test_logical_decoding--1.0.sql b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
new file mode 100644
index 0000000..b6e048c
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
@@ -0,0 +1,6 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_logical_decoding" to load this file. \quit
+
+CREATE FUNCTION start_logical_replication (slotname name, pos text, VARIADIC options text[] DEFAULT '{}', OUT location text, OUT xid bigint, OUT data text) RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'start_logical_replication'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/test_logical_decoding/test_logical_decoding.c b/contrib/test_logical_decoding/test_logical_decoding.c
new file mode 100644
index 0000000..26ecdfa
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.c
@@ -0,0 +1,238 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "storage/fd.h"
+#include "miscadmin.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+Datum start_logical_replication(PG_FUNCTION_ARGS);
+
+static Tuplestorestate *tupstore = NULL;
+static TupleDesc tupdesc;
+
+static void
+LogicalOutputPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ resetStringInfo(ctx->out);
+}
+
+static void
+LogicalOutputWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ Datum values[3];
+ bool nulls[3];
+ char buf[60];
+
+ sprintf(buf, "%X/%X", (uint32) (lsn >> 32), (uint32) lsn);
+
+ memset(nulls, 0, sizeof(nulls));
+ values[0] = CStringGetTextDatum(buf);
+ values[1] = Int64GetDatum(xid);
+ values[2] = CStringGetTextDatum(ctx->out->data);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+}
+
+PG_FUNCTION_INFO_V1(start_logical_replication);
+
+Datum
+start_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+
+ XLogRecPtr now;
+ XLogRecPtr startptr;
+ XLogRecPtr rp;
+
+ LogicalDecodingContext *ctx;
+
+ ResourceOwner old_resowner = CurrentResourceOwner;
+ ArrayType *arr;
+ Size ndim;
+ List *options = NIL;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ arr = PG_GETARG_ARRAYTYPE_P(2);
+ ndim = ARR_NDIM(arr);
+
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ if (ndim > 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication only accepts one dimension of arguments")));
+ }
+ else if (array_contains_nulls(arr))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication expects NOT NULL options")));
+ }
+ else if (ndim == 1)
+ {
+ int nelems;
+ Datum *datum_opts;
+ int i;
+
+ Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+ deconstruct_array(arr, TEXTOID, -1, false, 'i',
+ &datum_opts, NULL, &nelems);
+
+ if (nelems % 2 != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("options need to be specified pairwise")));
+ }
+
+ for (i = 0; i < nelems; i += 2)
+ {
+ char *name = TextDatumGetCString(datum_opts[i]);
+ char *opt = TextDatumGetCString(datum_opts[i + 1]);
+
+ options = lappend(options, makeDefElem(name, (Node *) makeString(opt)));
+ }
+ }
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * XXX: It's impolite to ignore our argument and keep decoding until the
+ * current position.
+ */
+ now = GetFlushRecPtr();
+
+ /*
+ * We need to create a normal_snapshot_reader, but adjust it to use our
+ * page_read callback, and also make its reorder buffer use our callback
+ * wrappers that don't depend on walsender.
+ */
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingReAcquireSlot(NameStr(*name));
+
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false,
+ MyLogicalDecodingSlot->confirmed_flush,
+ options,
+ logical_read_local_xlog_page,
+ LogicalOutputPrepareWrite,
+ LogicalOutputWrite);
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(DEBUG1, "Starting logical replication from %X/%X to %X/%X",
+ (uint32) (MyLogicalDecodingSlot->restart_decoding >> 32),
+ (uint32) MyLogicalDecodingSlot->restart_decoding,
+ (uint32) (now >> 32), (uint32) now);
+
+ CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
+
+ /* invalidate non-timetravel entries */
+ InvalidateSystemCaches();
+
+ PG_TRY();
+ {
+
+ while ((startptr != InvalidXLogRecPtr && startptr < now) ||
+ (ctx->reader->EndRecPtr && ctx->reader->EndRecPtr < now))
+ {
+ XLogRecord *record;
+ char *errm = NULL;
+
+ record = XLogReadRecord(ctx->reader, startptr, &errm);
+ if (errm)
+ elog(ERROR, "%s", errm);
+
+ startptr = InvalidXLogRecPtr;
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.endptr = ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ /*
+ * The LogicalOutputPrepareWrite/LogicalOutputWrite callbacks above
+ * will store the decoded output into our tuplestore.
+ */
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+
+ /*
+ * clear timetravel entries: XXX allowed in aborted TXN?
+ */
+ InvalidateSystemCaches();
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ rp = ctx->reader->EndRecPtr;
+ if (rp >= now)
+ {
+ elog(DEBUG1, "Reached endpoint (wanted: %X/%X, got: %X/%X)",
+ (uint32) (now >> 32), (uint32) now,
+ (uint32) (rp >> 32), (uint32) rp);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ CurrentResourceOwner = old_resowner;
+
+ /*
+ * Next time, start where we left off. (Hunting things, the family
+ * business..)
+ */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+
+ LogicalDecodingReleaseSlot();
+
+ return (Datum) 0;
+}
diff --git a/contrib/test_logical_decoding/test_logical_decoding.control b/contrib/test_logical_decoding/test_logical_decoding.control
new file mode 100644
index 0000000..0dce19f
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.control
@@ -0,0 +1,5 @@
+# test_logical_decoding extension
+comment = 'test logical decoding'
+default_version = '1.0'
+module_pathname = '$libdir/test_logical_decoding'
+relocatable = true
--
1.8.4.21.g992c386.dirty
0008-wal_decoding-design-document-v2.4-and-snapshot-build.patch
From 667d52abc1599416a7190e00599eba536d890500 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:31 +0200
Subject: [PATCH 8/8] wal_decoding: design document v2.4 and snapshot building
design doc v0.5
---
src/backend/replication/logical/DESIGN.txt | 593 +++++++++++++++++++++
src/backend/replication/logical/Makefile | 6 +
.../replication/logical/README.SNAPBUILD.txt | 241 +++++++++
3 files changed, 840 insertions(+)
create mode 100644 src/backend/replication/logical/DESIGN.txt
create mode 100644 src/backend/replication/logical/README.SNAPBUILD.txt
diff --git a/src/backend/replication/logical/DESIGN.txt b/src/backend/replication/logical/DESIGN.txt
new file mode 100644
index 0000000..d76fdb4
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN.txt
@@ -0,0 +1,593 @@
+//-*- mode: adoc -*-
+= High Level Design for Logical Replication in Postgres =
+:copyright: PostgreSQL Global Development Group 2012
+:author: Andres Freund, 2ndQuadrant Ltd.
+:email: andres@2ndQuadrant.com
+
+== Introduction ==
+
+This document aims to first explain why we think postgres needs another
+replication solution and what that solution needs to offer in our opinion. Then
+it sketches out our proposed implementation.
+
+In contrast to an earlier version of the design document which talked about the
+implementation of four parts of replication solutions:
+
+1. Source data generation
+1. Transportation of that data
+1. Applying the changes
+1. Conflict resolution
+
+this version only plans to talk about the first part in detail as it is an
+independent and complex part usable for a wide range of use cases which we want
+to get included into postgres in a first step.
+
+=== Previous discussions ===
+
+There are two rather large threads discussing several parts of the initial
+prototype and proposed architecture:
+
+- http://archives.postgresql.org/message-id/201206131327.24092.andres@2ndquadrant.com[Logical Replication/BDR prototype and architecture]
+- http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata consistency during changeset extraction from WAL]
+
+Those discussions led to some fundamental design changes which are presented in this document.
+
+=== Changes from v1 ===
+* At least a partial decoding step required/possible on the source system
+* No intermediate ("schema only") instances required
+* DDL handling, without event triggers
+* A very simple text conversion is provided for debugging/demo purposes
+* Smaller scope
+
+== Existing approaches to replication in Postgres ==
+
+If any currently used approach to replication can be made to support every
+use-case/feature we need, it likely is not a good idea to implement something
+different. Currently three basic approaches are in use in/around postgres
+today:
+
+. Trigger based
+. Recovery based/Physical footnote:[Often referred to by terms like Hot Standby, Streaming Replication, Point In Time Recovery]
+. Statement based
+
+Statement based replication has obvious and known problems with consistency and
+correctness making it hard to use in the general case so we will not further
+discuss it here.
+
+Let's have a look at the advantages/disadvantages of the other approaches:
+
+=== Trigger based Replication ===
+
+This variant has a multitude of significant advantages:
+
+* implementable in userspace
+* easy to customize
+* just about everything can be made configurable
+* cross version support
+* cross architecture support
+* can feed into systems other than postgres
+* no overhead from writes to non-replicated tables
+* writable standbys
+* mature solutions
+* multimaster implementations possible & existing
+
+But also a number of disadvantages, some of them very hard to solve:
+
+* essentially duplicates the amount of writes (or even more!)
+* synchronous replication hard or impossible to implement
+* noticeable CPU overhead
+** trigger functions
+** text conversion of data
+* complex parts implemented in several solutions
+* not in core
+
+Especially the higher amount of writes might seem easy to solve at a first
+glance but a solution not using a normal transactional table for its log/queue
+has to solve a lot of problems. The major ones are:
+
+* crash safety, restartability & spilling to disk
+* consistency with the commit status of transactions
+* only a minimal amount of synchronous work should be done inside individual
+transactions
+
+In our opinion those problems are restricting progress/wider distribution of
+this class of solutions. It is our aim though that existing solutions in this
+space - most prominently slony and londiste - can benefit from the work we are
+doing & planning to do by incorporating at least parts of the changeset
+generation infrastructure.
+
+=== Recovery based Replication ===
+
+This type of solution, being built into postgres and of increasing popularity,
+has and will have its use cases and we do not aim to replace but to complement
+it. We plan to reuse some of the infrastructure and to make it possible to mix
+both modes of replication.
+
+Advantages:
+
+* builtin
+* built on existing infrastructure from crash recovery
+* efficient
+** minimal CPU, memory overhead on primary
+** low amount of additional writes
+* synchronous operation mode
+* low maintenance once set up
+* handles DDL
+
+Disadvantages:
+
+* standbys are read only
+* no cross version support
+* no cross architecture support
+* no replication into foreign systems
+* hard to customize
+* not configurable on the level of database, tables, ...
+
+== Goals ==
+
+As seen in the previous short survey of the two major interesting classes of
+replication solution there is a significant gap between those. Our aim is to
+make it smaller.
+
+We aim for:
+
+* in core
+* low CPU overhead
+* low storage overhead
+* asynchronous, optionally synchronous operation modes
+* robust
+* modular
+* basis for other technologies (sharding, replication into other DBMS's, ...)
+* basis for at least one multi-master solution
+* make the implementation as unintrusive as possible, but not more
+
+== New Architecture ==
+
+=== Overview ===
+
+Our proposal is to reuse the basic principle of WAL based replication, namely
+reusing data that already needs to be written for another purpose, and extend
+it to allow most, but not all, the flexibility of trigger based solutions.
+We want to do that by decoding the WAL back into a non-physical form.
+
+To get the flexibility we and others want we propose that the last step of
+changeset generation, transforming it into a format that can be used by the
+replication consumer, is done in an extensible manner. In the schema the part
+that does that is described as 'Output Plugin'. To keep the amount of
+duplication between different plugins as low as possible the plugin should only
+do a very limited amount of work.
+
+The following paragraphs contain reasoning for the individual design decisions
+made and their high-level design.
+
+=== Schematics ===
+
+The basic proposed architecture for changeset extraction is presented in the
+following diagram. The first part should look familiar to anyone knowing
+postgres' architecture. The second is where most of the new magic happens.
+
+[[basic-schema]]
+.Architecture Schema
+["ditaa"]
+------------------------------------------------------------------------------
+ Traditional Stuff
+
+ +---------+---------+---------+---------+----+
+ | Backend | Backend | Backend | Autovac | ...|
+ +----+----+---+-----+----+----+----+----+-+--+
+ | | | | |
+ +------+ | +--------+ | |
+ +-+ | | | +----------------+ |
+ | | | | | |
+ | v v v v |
+ | +------------+ |
+ | | WAL writer |<------------------+
+ | +------------+
+ | | | | | |
+ v v v v v v +-------------------+
++--------+ +---------+ +->| Startup/Recovery |
+|{s} | |{s} | | +-------------------+
+|Catalog | | WAL |---+->| SR/Hot Standby |
+| | | | | +-------------------+
++--------+ +---------+ +->| Point in Time |
+ ^ | +-------------------+
+ ---|----------|--------------------------------
+ | New Stuff
++---+ |
+| v Running separately
+| +----------------+ +=-------------------------+
+| | Walsender | | | |
+| | v | | +-------------------+ |
+| +-------------+ | | +->| Logical Rep. | |
+| | WAL | | | | +-------------------+ |
++-| decoding | | | +->| Multimaster | |
+| +------+------/ | | | +-------------------+ |
+| | | | | +->| Slony | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Auditing | |
+| | TX | | | | +-------------------+ |
++-| reassembly | | | +->| Mysql/... | |
+| +-------------/ | | | +-------------------+ |
+| | | | | +->| Custom Solutions | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Debugging | |
+| | Output | | | | +-------------------+ |
++-| Plugin |--|--|-+->| Data Recovery | |
+ +-------------/ | | +-------------------+ |
+ | | | |
+ +----------------+ +--------------------------|
+------------------------------------------------------------------------------
+
+=== WAL enrichment ===
+
+To be able to decode individual WAL records, at the very minimum they need to
+contain enough information to reconstruct what has happened to which row. The
+action is already encoded in the WAL record's header in most cases.
+
+As an example of missing data, the WAL record emitted when a row gets deleted
+only contains its physical location. At the very least we need a way to
+identify the deleted row: in a relational database the minimal amount of data
+that does that should be the primary key footnote:[Yes, there are use cases
+where the whole row is needed, or where no primary key can be found].
+
+We propose that for now it is enough to extend the relevant WAL record with
+additional data when the newly introduced 'wal_level = logical' is set.
+
+Previously it has been argued on the hackers mailing list that a generic 'WAL
+record annotation' mechanism might be a good thing. That mechanism would allow
+attaching arbitrary data to individual WAL records, making it easier to extend
+postgres to support something like what we propose. While we don't oppose that
+idea, we think it is a largely orthogonal issue to this proposal as a whole
+because the format of WAL records is version dependent by nature and the
+necessary changes for our easy way are small, so not much effort is lost.
+
+A full annotation capability is a complex endeavour on its own as the parts of
+the code generating the relevant WAL records have somewhat complex requirements
+and cannot easily be configured from the outside.
+
+Currently this is contained in the http://archives.postgresql.org/message-id/1347669575-14371-6-git-send-email-andres@2ndquadrant.com[Log enough data into the wal to reconstruct logical changes from it] patch.
+
+=== WAL parsing & decoding ===
+
+The main complexity when reading the WAL as stored on disk is that the format
+is somewhat complex and the existing parser is too deeply integrated in the
+recovery system to be directly reusable. Once a reusable parser exists decoding
+the binary data into individual WAL records is a small problem.
+
+Currently two competing proposals for this module exist, each having its own
+merits. In the grand scheme of this proposal it is irrelevant which one gets
+picked as long as the functionality gets integrated.
+
+The mailing list post
+http://archives.postgresql.org/message-id/1347669575-14371-3-git-send-email-andres@2ndquadrant.com[Add
+support for a generic wal reading facility dubbed XLogReader] contains both
+competing patches and discussion around which one is preferable.
+
+Once the WAL has been decoded into individual records two major issues exist:
+
+1. records from different transactions and even individual user level actions
+are intermingled
+1. the data attached to records cannot be interpreted on its own, it is only
+meaningful with a lot of required information (including table, columns, types
+and more)
+
+The solution to the first issue is described in the next section: <<tx-reassembly>>
+
+The second problem is probably the reason why no mature solution to reuse the
+WAL for logical changeset generation exists today. See the <<snapbuilder>>
+paragraph for some details.
+
+As decoding, Transaction reassembly and Snapshot building are interdependent
+they currently are implemented in the same patch:
+http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-andres@2ndquadrant.com[Introduce
+wal decoding via catalog timetravel]
+
+That patch also includes a small demonstration that the approach works in the
+presence of DDL:
+
+[[example-of-decoding]]
+.Decoding example
+[NOTE]
+---------------------------
+/* just so we keep a sensible xmin horizon */
+ROLLBACK PREPARED 'f';
+BEGIN;
+CREATE TABLE keepalive();
+PREPARE TRANSACTION 'f';
+
+DROP TABLE IF EXISTS replication_example;
+
+SELECT pg_current_xlog_insert_location();
+CHECKPOINT;
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text
+varchar(120));
+begin;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+commit;
+
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+/* slightly more complex schema change, still no table rewrite */
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+commit;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+/* complex schema change, changing types of existing column, rewriting the table */
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING
+(somenum::int4);
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+SELECT pg_current_xlog_insert_location();
+
+/* now decode what has been written to the WAL during that time */
+
+SELECT decode_xlog('0/1893D78', '0/18BE398');
+
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:1 somedata[int4]:1 text[varchar]:1
+WARNING: tuple is: id[int4]:2 somedata[int4]:1 text[varchar]:2
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+WARNING: tuple is: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+WARNING: tuple is: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:7 somedata[int4]:3 text[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:8 somedata[int4]:3 text[varchar]:2
+WARNING: tuple is: id[int4]:9 somedata[int4]:3 text[varchar]:3
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+WARNING: COMMIT
+
+---------------------------
+
+[[tx-reassembly]]
+=== TX reassembly ===
+
+In order to make usage of the decoded stream easy we want to present the user
+level code with a correctly ordered image of individual transactions at once
+because otherwise every user will have to reassemble transactions themselves.
+
+Transaction reassembly needs to solve several problems:
+
+1. changes inside a transaction can be interspersed with other transactions
+1. a top level transaction only knows which subtransactions belong to it when
+it reads the commit record
+1. individual user level actions can be smeared over multiple records (TOAST)
+
+Our proposed module solves 1) and 2) by building individual streams of records
+split by xid. While not fully implemented yet we plan to spill those individual
+xid streams to disk after a certain amount of memory is used. This can be
+implemented without any change in the external interface.
+
+As all the individual streams are already sorted by LSN by definition - we
+build them from the wal in a FIFO manner, and the position in the WAL is the
+definition of the LSN footnote:[the LSN is just the byte position in the WAL
+stream] - the individual changes can be merged efficiently by a k-way merge
+(without sorting!) by keeping the individual streams in a binary heap.
+
+To manipulate the binary heap a generic implementation is proposed. Several
+independent implementations of binary heaps already exist in the postgres code,
+but none of them is generic. The patch is available at
+http://archives.postgresql.org/message-id/1347669575-14371-2-git-send-email-andres@2ndquadrant.com[Add
+minimal binary heap implementation].
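+
+The merge step described above can be sketched in isolation. The following is
+a minimal, self-contained illustration and not the code from the patch: the
+'Lsn' and 'Stream' types and the 'kway_merge()' function are hypothetical
+names, each stream stands for the already LSN-sorted changes of one xid, and
+every stream is assumed to be non-empty.
+
+[source,c]
+----
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical types for illustration; not the patch's actual structures. */
+typedef uint64_t Lsn;
+
+typedef struct Stream
+{
+    const Lsn *lsns;   /* changes of one xid, already sorted by LSN */
+    size_t     len;    /* number of changes in the stream */
+    size_t     pos;    /* index of the next unconsumed change */
+} Stream;
+
+/* LSN at the head of a stream; only valid while pos < len. */
+static Lsn
+stream_head(const Stream *s)
+{
+    return s->lsns[s->pos];
+}
+
+/* Restore the min-heap property for the subtree rooted at index i. */
+static void
+sift_down(Stream **heap, size_t n, size_t i)
+{
+    for (;;)
+    {
+        size_t  l = 2 * i + 1;
+        size_t  r = l + 1;
+        size_t  m = i;
+        Stream *tmp;
+
+        if (l < n && stream_head(heap[l]) < stream_head(heap[m]))
+            m = l;
+        if (r < n && stream_head(heap[r]) < stream_head(heap[m]))
+            m = r;
+        if (m == i)
+            return;
+        tmp = heap[i];
+        heap[i] = heap[m];
+        heap[m] = tmp;
+        i = m;
+    }
+}
+
+/*
+ * Merge k sorted, non-empty streams into out[] in LSN order and return the
+ * number of entries written. heap[] is reordered in place.
+ */
+static size_t
+kway_merge(Stream **heap, size_t k, Lsn *out)
+{
+    size_t n = k;
+    size_t written = 0;
+    size_t i;
+
+    for (i = 0; i < k; i++)
+        assert(heap[i]->pos < heap[i]->len);
+
+    /* build the heap bottom-up, keyed on each stream's head LSN */
+    for (i = n / 2; i-- > 0;)
+        sift_down(heap, n, i);
+
+    while (n > 0)
+    {
+        Stream *top = heap[0];
+
+        out[written++] = stream_head(top);
+        if (++top->pos == top->len)
+            heap[0] = heap[--n];    /* stream exhausted, drop it */
+        sift_down(heap, n, 0);
+    }
+    return written;
+}
+----
+
+Each emitted change costs one sift through a heap of at most k streams, so no
+global sort over all changes is needed. Only the head of each stream has to be
+inspected at any time, which fits the plan to spill streams to disk.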
+
+[NOTE]
+============
+The reassembly component was previously coined ApplyCache because it was
+proposed to run on replication consumers just before applying changes. This is
+not the case anymore.
+
+It is still called that way in the source of the patch recently submitted.
+============
+
+[[snapbuilder]]
+=== Snapshot building ===
+
+To decode the contents of wal records describing data changes we need to decode
+and transform their contents. A single tuple is stored in a data structure
+called HeapTuple. As stored on disk that structure doesn't contain any
+information about the format of its contents.
+
+The basic problem is twofold:
+
+1. The wal records only contain the relfilenode not the relation oid of a table
+11. The relfilenode changes when an action performing a full table rewrite is performed
+1. To interpret a HeapTuple correctly the exact schema definition from back
+when the wal record was inserted into the wal stream needs to be available
+
+We chose to implement timetraveling access to the system catalog using
+postgres' MVCC nature & implementation because of the following advantages:
+
+* low amount of additional data in wal
+* genericity
+* similarity of implementation to Hot Standby, quite a bit of the infrastructure is reusable
+* all kinds of DDL can be handled in a reliable manner
+* extensibility to user defined catalog like tables
+
+Timetravel access to the catalog means that we are able to look at the catalog
+just as it looked when changes were generated. That allows us to get the
+correct information about the contents of the aforementioned HeapTuple's so we
+can decode them reliably.
+
+Other solutions we thought about that fell through:
+* catalog only proxy instances that apply schema changes exactly to the point
+ we're decoding using ``old fashioned'' wal replay
+* do the decoding on a 2nd machine, replicating all DDL exactly, rely on the catalog there
+* do not allow DDL at all
+* always add enough data into the WAL to allow decoding
+* build a fully versioned catalog
+
+The email thread available under
+http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata
+consistency during changeset extraction from WAL] contains some details,
+advantages and disadvantages about the different possible implementations.
+
+How we build snapshots is somewhat intricate and complicated and seems to be
+out of scope for this document. We will provide a second document discussing
+the implementation in detail. Let's just assume it is possible from here on.
+
+[NOTE]
+Some details are already available in comments inside 'src/backend/replication/logical/snapbuild.{c,h}'.
+
+=== Output Plugin ===
+
+As already mentioned previously our aim is to make the implementation of output
+plugins as simple and non-redundant as possible as we expect several different
+ones with different use cases to emerge quickly. See <<basic-schema>> for a
+list of possible output plugins that we think might emerge.
+
+Although we for now only plan to tackle logical replication and based on that a
+multi-master implementation in the near future we definitely aim to provide all
+use-cases with something easily useable!
+
+To decode and translate local transactions an output plugin needs to be able to
+transform transactions as a whole so it can apply them as a meaningful
+transaction at the other side.
+
+To provide that, every time we find a transaction commit and thus have
+completed reassembling the transaction, we start to provide the individual
+changes to the output plugin. It currently only has to fill out 3
+callbacks:
+[options="header"]
+|=====================================================================================================================================
+|Callback |Passed Parameters |Called per TX | Use
+|begin |xid |once |Begin of a reassembled transaction
+|change |xid, subxid, change, mvcc snapshot |every change |Gets passed every change so it can transform it to the target format
+|commit |xid |once |End of a reassembled transaction
+|=====================================================================================================================================
+
+During each of those callbacks an appropriate timetraveling SnapshotNow snapshot
+is set up so the callbacks can perform all read-only catalog accesses they need,
+including using the sys/rel/catcache. For obvious reasons only read access is
+allowed.
+
+The snapshot guarantees that the results of lookups are the same as they
+were/would have been when the change was originally created.
+
+Additionally they get passed a MVCC snapshot, to e.g. run sql queries on
+catalogs or similar.
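+
+The calling convention from the table above can be sketched as follows. This
+is only an illustration under stated assumptions: 'OutputPlugin', 'Change',
+'replay_transaction()' and the counting plugin are hypothetical names rather
+than the patch's API, and the snapshot parameters are left out for brevity.
+
+[source,c]
+----
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* All names below are hypothetical; the patch's real API differs. */
+typedef uint32_t Xid;
+
+typedef struct Change
+{
+    Xid         subxid;  /* (sub)transaction that made the change */
+    const char *desc;    /* stand-in for the decoded tuple data */
+} Change;
+
+typedef struct OutputPlugin
+{
+    void (*begin)(void *state, Xid xid);
+    void (*change)(void *state, Xid xid, const Change *change);
+    void (*commit)(void *state, Xid xid);
+} OutputPlugin;
+
+/*
+ * Hand one fully reassembled transaction to the plugin: begin once, change
+ * once per change in order, commit once.
+ */
+static void
+replay_transaction(const OutputPlugin *plugin, void *state, Xid xid,
+                   const Change *changes, size_t nchanges)
+{
+    size_t i;
+
+    assert(plugin != NULL);
+
+    plugin->begin(state, xid);
+    for (i = 0; i < nchanges; i++)
+        plugin->change(state, xid, &changes[i]);
+    plugin->commit(state, xid);
+}
+
+/* A trivial plugin that only counts how often each callback fires. */
+typedef struct Counts
+{
+    int begins;
+    int changes;
+    int commits;
+} Counts;
+
+static void
+count_begin(void *state, Xid xid)
+{
+    (void) xid;
+    ((Counts *) state)->begins++;
+}
+
+static void
+count_change(void *state, Xid xid, const Change *change)
+{
+    (void) xid;
+    (void) change;
+    ((Counts *) state)->changes++;
+}
+
+static void
+count_commit(void *state, Xid xid)
+{
+    (void) xid;
+    ((Counts *) state)->commits++;
+}
+
+static const OutputPlugin counting_plugin = {
+    count_begin, count_change, count_commit
+};
+----
+
+Because the plugin only ever sees complete transactions in commit order, it
+needs no reassembly logic of its own and can concentrate on the target format.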
+
+[IMPORTANT]
+============
+At the moment none of these snapshots can be used to access normal user
+tables. Adding additional tables to the allowed set is easy implementation
+wise, but every transaction changing such tables incurs a noticeably higher
+overhead.
+============
+
+For now transactions won't be decoded/output in parallel. There are ideas to
+improve on this, but we don't think the complexity is appropriate for the first
+release of this feature.
+
+This is an adoption barrier for databases where large amounts of data get
+loaded/written in one transaction.
+
+=== Setup of replication nodes ===
+
+When setting up a new standby/consumer of a primary some problems exist
+independent of the implementation of the consumer. The gist of the problem is
+that when making a base backup and starting to stream all changes since that
+point, transactions that were running during all this cannot be included:
+
+* Transactions that have not committed before starting to dump a database are
+ invisible to the dumping process
+
+* Transactions that began before the point from which on the WAL is being
+ decoded are incomplete and cannot be replayed
+
+Our proposal for a solution to this is to detect points in the WAL stream where we can provide:
+
+. A snapshot exported similarly to pg_export_snapshot() footnote:[http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION] that can be imported with +SET TRANSACTION SNAPSHOT+ footnote:[http://www.postgresql.org/docs/devel/static/sql-set-transaction.html]
+. A stream of changes that will include the complete data of all transactions seen as running by the snapshot generated in 1)
+
+See the diagram.
+
+[[setup-schema]]
+.Control flow during setup of a new node
+["ditaa",scaling="0.7"]
+------------------------------------------------------------------------------
++----------------+
+| Walsender | | +------------+
+| v | | Consumer |
++-------------+ |<--IDENTIFY_SYSTEM-------------| |
+| WAL | | | |
+| decoding | |----....---------------------->| |
++------+------/ | | |
+| | | | |
+| v | | |
++-------------+ |<--INIT_LOGICAL $PLUGIN--------| |
+| TX | | | |
+| reassembly | |---FOUND_STARTING %X/%X------->| |
++-------------/ | | |
+| | |---FOUND_CONSISTENT %X/%X----->| |
+| v |---pg_dump snapshot----------->| |
++-------------+ |---replication slot %P-------->| |
+| Output | | | |
+| Plugin | | ^ | |
++-------------/ | | | |
+| | +-run pg_dump separately --| |
+| | | |
+| |<--STREAM_DATA-----------------| |
+| | | |
+| |---data ---------------------->| |
+| | | |
+| | | |
+| | ---- SHUTDOWN ------------- | |
+| | | |
+| | | |
+| |<--RESTART_LOGICAL $PLUGIN %P--| |
+| | | |
+| |---data----------------------->| |
+| | | |
+| | | |
++----------------+ +------------+
+
+------------------------------------------------------------------------------
+
+=== Disadvantages of the approach ===
+
+* somewhat intricate code for snapshot timetravel
+* output plugins/walsenders need to work per database as they access the catalog
+* when sending to multiple standbys some work is done multiple times
+* decoding/applying multiple transactions in parallel is somewhat hard
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 310a45c..6fae278 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -17,3 +17,9 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
include $(top_srcdir)/src/backend/common.mk
+
+DESIGN.pdf: DESIGN.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
+
+README.SNAPBUILD.pdf: README.SNAPBUILD.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
diff --git a/src/backend/replication/logical/README.SNAPBUILD.txt b/src/backend/replication/logical/README.SNAPBUILD.txt
new file mode 100644
index 0000000..b6c7470
--- /dev/null
+++ b/src/backend/replication/logical/README.SNAPBUILD.txt
@@ -0,0 +1,241 @@
+= Snapshot Building =
+:author: Andres Freund, 2ndQuadrant Ltd
+
+== Why do we need timetravel catalog access ==
+
+When doing WAL decoding (see DESIGN.txt for reasons to do so), we need to know
+how the catalog looked at the point a record was inserted into the WAL, because
+without that information we don't know much more about the record other than
+its length. It's just an arbitrary bunch of bytes without further information.
+Unfortunately, due to the possibility that the table definition might change we
+cannot just access a newer version of the catalog and assume the table
+definition continues to be the same.
+
+If only the type information were required, it might be enough to annotate the
+wal records with a bit more information (table oid, table name, column name,
+column type) --- but as we want to be able to convert the output to more useful
+formats such as text, we additionally need to be able to call output functions.
+Those need a normal environment including the usual caches and normal catalog
+access to lookup operators, functions and other types.
+
+Our solution to this is to add the capability to access the catalog such as it
+was at the time the record was inserted into the WAL. The locking used during
+WAL generation guarantees the catalog is/was in a consistent state at that
+point. We call this 'time-travel catalog access'.
+
+Interesting cases include:
+
+- enums
+- composite types
+- extension types
+- non-C functions
+- relfilenode to table OID mapping
+
+Due to postgres' non-overwriting storage manager, regular modifications of a
+table's content are theoretically non-destructive. The problem is that there is
+no way to access an arbitrary point in time even if the data for it is there.
+
+This module adds the capability to do so in the very limited set of
+circumstances we need it in for WAL decoding. It does *not* provide a general
+time-travelling facility.
+
+A 'Snapshot' is the data structure used in postgres to describe which tuples
+are visible and which are not. We need to build a Snapshot which can be used to
+access the catalog the way it looked when the wal record was inserted.
+
+Restrictions:
+
+- Only works for catalog tables or tables explicitly marked as such.
+- Snapshot modifications are somewhat expensive
+- it cannot build initial visibility information for every point in time, it
+ needs specific circumstances to start.
+
+== How are time-travel snapshots built ==
+
+'Hot Standby' added infrastructure to build snapshots from WAL during recovery in
+the 9.0 release. Most of that can be reused for our purposes.
+
+We cannot reuse all of the hot standby infrastructure because:
+
+- we are not in recovery
+- we need to look at interim states *inside* a transaction
+- we need the capability to have multiple different snapshots around at the same time
+
+Normally the catalog is accessed using SnapshotNow which can legally be
+replaced by SnapshotMVCC that has been taken at the start of a scan. So catalog
+timetravel contains infrastructure to make SnapshotNow catalog access use
+appropriate MVCC snapshots. They aren't generated with GetSnapshotData()
+though, but reassembled from WAL contents.
+
+We collect our data in a normal struct SnapshotData, repurposing some fields
+creatively:
+
+- +Snapshot->xip+ contains all transactions we consider committed
+- +Snapshot->subxip+ contains all transactions belonging to our transaction,
+ including the toplevel one
+- +Snapshot->active_count+ is used as a refcount
+
+The meaning of +xip+ is inverted in comparison with non-timetravel snapshots in
+the sense that members of the array are the committed transactions, not the in
+progress ones. Because usually only a tiny percentage of committed transactions
+will have modified the catalog between xmin and xmax this allows us to keep the
+array small in the usual cases. It also makes subtransaction handling easier
+since we neither need to query pg_subtrans (which we couldn't anyway since it's
+truncated at restart) nor have problems with suboverflowed snapshots.
+
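To illustrate the inverted meaning of +xip+ described above, here is a minimal sketch in C. It is not the visibility routine from the patch: SketchSnapshot, xid_in_array and sketch_xid_visible are hypothetical names, and the sketch assumes that xids never wrap around, that every xid below xmin has committed, and that command ids can be ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, heavily simplified stand-in for struct SnapshotData. */
typedef struct SketchSnapshot
{
    TransactionId xmin;          /* assumption: every xid < xmin committed */
    TransactionId xmax;          /* every xid >= xmax is not visible yet */
    const TransactionId *xip;    /* inverted: *committed* catalog-modifying xids */
    size_t        xcnt;
    const TransactionId *subxip; /* xids of the transaction being decoded */
    size_t        subxcnt;
} SketchSnapshot;

/* Linear search; the arrays are expected to stay small. */
static bool
xid_in_array(TransactionId xid, const TransactionId *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (arr[i] == xid)
            return true;
    }
    return false;
}

/* Would a catalog row written by 'xid' be visible under this snapshot? */
static bool
sketch_xid_visible(const SketchSnapshot *snap, TransactionId xid)
{
    assert(snap != NULL);

    /* Rows written by the transaction being decoded are its own rows. */
    if (xid_in_array(xid, snap->subxip, snap->subxcnt))
        return true;
    if (xid < snap->xmin)
        return true;
    if (xid >= snap->xmax)
        return false;
    /* Between xmin and xmax only the listed (committed) xids are visible. */
    return xid_in_array(xid, snap->xip, snap->xcnt);
}
```

Because the array lists committed rather than in-progress transactions, a transaction between xmin and xmax that never touched the catalog simply does not appear in it, which is what keeps the array small.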
+== Building of initial snapshot ==
+
+We can start building an initial snapshot as soon as we find either an
++XLOG_RUNNING_XACTS+ or an +XLOG_CHECKPOINT_SHUTDOWN+ record because they allow us
+to know how many transactions are running.
+
+We need to know which transactions were running when we start to build a
+snapshot/start decoding as we don't have enough information about them (they
+could have done catalog modifications before we started watching). Also, we
+wouldn't have the complete contents of those transactions, because we started
+reading after they began. (The latter is also important when building
+snapshots that can be used to build a consistent initial clone.)
+
+There also is the problem that +XLOG_RUNNING_XACTS+ records can be
+'suboverflowed', which means there were more running subtransactions than
+fit into shared memory. In that case we use the same incremental building
+trick hot standby uses, which is either
+
+1. wait till further +XLOG_RUNNING_XACTS+ records have a running->oldestRunningXid
+after the initial xl_running_xacts->nextXid
+2. wait for a further +XLOG_RUNNING_XACTS+ that is not overflowed or
+a +XLOG_CHECKPOINT_SHUTDOWN+
+
+When we start building a snapshot we are in the +SNAPBUILD_START+ state. As
+soon as we find any visibility information, even if incomplete, we change to
++SNAPBUILD_INITIAL_POINT+.
+
+When we have collected enough information to decode any transaction starting
+after that point in time we fall over to +SNAPBUILD_FULL_SNAPSHOT+. If those
+transactions commit before the next state is reached, we throw their complete
+contents away.
+
+As soon as all transactions that were running when we switched over to
++SNAPBUILD_FULL_SNAPSHOT+ commit, we change state to +SNAPBUILD_CONSISTENT+.
+Every transaction that commits from now on gets handed to the output plugin.
+When doing the switch to +SNAPBUILD_CONSISTENT+ we optionally export a snapshot
+which makes all transactions that committed up to this point visible. This
+exported snapshot can be used to run pg_dump; replaying all changes emitted
+by the output plugin on a database restored from such a dump will result in
+a consistent clone.
+
+["ditaa",scaling="0.8"]
+---------------
+
+ +-------------------------+
+ +----|SNAPBUILD_START |-------------+
+ | +-------------------------+ |
+ | | |
+ | | |
+ | running_xacts with running xacts |
+ | | |
+ | | |
+ | v |
+ | +-------------------------+ v
+ | |SNAPBUILD_FULL_SNAPSHOT |------------>|
+ | +-------------------------+ |
+XLOG_RUNNING_XACTS | saved snapshot
+ with zero xacts | at running_xacts's lsn
+ | | |
+ | all running toplevel TXNs finished |
+ | | |
+ | v |
+ | +-------------------------+ |
+ +--->|SNAPBUILD_CONSISTENT |<------------+
+ +-------------------------+
+
+---------------
+
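The state transitions described in this section can be sketched as a small state machine in C. This is only an illustration under simplifying assumptions: the Sketch* names are hypothetical, the "saved snapshot" shortcut from the diagram is left out, and the wait on oldestRunningXid (option 1 above) is not modelled, only the wait for a record that is not suboverflowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum SketchSnapBuildState
{
    SKETCH_START,          /* nothing seen yet */
    SKETCH_INITIAL_POINT,  /* some, but incomplete, visibility information */
    SKETCH_FULL_SNAPSHOT,  /* can decode transactions starting from here on */
    SKETCH_CONSISTENT      /* every committing transaction goes to the plugin */
} SketchSnapBuildState;

typedef struct SketchBuilder
{
    SketchSnapBuildState state;
    unsigned    running;   /* xacts running at the FULL_SNAPSHOT switch that
                            * have not finished yet */
} SketchBuilder;

/* Feed one running-xacts record into the builder. */
static void
sketch_process_running_xacts(SketchBuilder *b, unsigned nrunning,
                             bool suboverflowed)
{
    assert(b != NULL);

    if (b->state != SKETCH_START && b->state != SKETCH_INITIAL_POINT)
        return;                 /* already past the bootstrap phase */

    if (nrunning == 0)
        b->state = SKETCH_CONSISTENT;    /* nothing to wait for */
    else if (suboverflowed)
        b->state = SKETCH_INITIAL_POINT; /* keep waiting for better data */
    else
    {
        b->state = SKETCH_FULL_SNAPSHOT;
        b->running = nrunning;
    }
}

/* One of the transactions that was running at the switch has finished. */
static void
sketch_running_xact_finished(SketchBuilder *b)
{
    assert(b != NULL);

    if (b->state != SKETCH_FULL_SNAPSHOT || b->running == 0)
        return;
    if (--b->running == 0)
        b->state = SKETCH_CONSISTENT;
}

/* Only in the consistent state are committed transactions handed out. */
static bool
sketch_should_output(const SketchBuilder *b)
{
    return b->state == SKETCH_CONSISTENT;
}
```

Transactions that commit while the builder is still in the full-snapshot state are decoded but thrown away, which is why sketch_should_output() only returns true once consistency has been reached.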
+== Snapshot Management ==
+
+Whenever a transaction is detected as having started during decoding in
++SNAPBUILD_FULL_SNAPSHOT+ state, we distribute the currently maintained
+snapshot to it (i.e. call ReorderBufferSetBaseSnapshot). This serves as its
+initial snapshot. Unless there are concurrent catalog changes that snapshot
+will be used for decoding the entire transaction's changes.
+
+Whenever a transaction-with-catalog-changes commits, we iterate over all
+concurrently active transactions and add a new SnapshotNow to each of them
+(ReorderBufferAddSnapshot(current_lsn)). This is required because any row
+written from that point on will have used the changed catalog contents.
+
+When decoding a transaction that made catalog changes itself we tell that
+transaction that (ReorderBufferAddNewCommandId(current_lsn)) which will cause
+the decoding to use the appropriate command id from that point on.
+
+SnapshotNow's need to be set up globally so the syscache and other pieces access
+them transparently. This is done using two new tqual.h functions:
+SetupDecodingSnapshots() and RevertFromDecodingSnapshots().
+
+== Catalog/User Table Detection ==
+
+Since we only want to store committed transactions that actually modified the
+catalog we need a way to detect that from WAL:
+
+Right now, we assume that every transaction that commits before we reach
++SNAPBUILD_CONSISTENT+ state has made catalog modifications since we can't rely
+on having seen the entire transaction before that. That's not harmful besides
+incurring some price in memory usage and runtime.
+
+After having reached consistency we recognize catalog modifying transactions
+via HEAP2_NEW_CID and HEAP_INPLACE that are logged by catalog modifying
+actions.
+
+== mixed DDL/DML transaction handling ==
+
+When a transaction uses DDL and DML in the same transaction things get a bit
+more complicated because we need to handle CommandIds and ComboCids as we need
+to use the correct version of the catalog when decoding the individual tuples.
+
+For that we emit the new HEAP2_NEW_CID records which contain the physical tuple
+location, cmin and cmax when the catalog is modified. If we need to detect
+visibility of a catalog tuple that has been modified in our own transaction -
+which we can detect via xmin/xmax - we look in a hash table using the location
+as key to get correct cmin/cmax values.
+From those values we can also extract the commandid that generated the record.
+
+All this only needs to happen in the transaction performing the DDL.
+
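The lookup described above, from a tuple's physical location to its cmin/cmax, can be sketched as follows. This is a hedged illustration only: a small fixed-size array with linear search stands in for the hash table the text mentions, and all Sketch* names and the capacity limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Physical tuple location, used as the lookup key (hypothetical layout). */
typedef struct SketchTupleLoc
{
    uint32_t    relfilenode;
    uint32_t    block;
    uint16_t    offset;
} SketchTupleLoc;

typedef struct SketchCidEntry
{
    SketchTupleLoc loc;
    CommandId   cmin;
    CommandId   cmax;
} SketchCidEntry;

#define SKETCH_MAX_ENTRIES 64

/* Linear array standing in for the per-transaction hash table. */
typedef struct SketchCidMap
{
    SketchCidEntry entries[SKETCH_MAX_ENTRIES];
    size_t      n;
} SketchCidMap;

static bool
sketch_loc_equal(SketchTupleLoc a, SketchTupleLoc b)
{
    return a.relfilenode == b.relfilenode &&
           a.block == b.block &&
           a.offset == b.offset;
}

/* Remember the contents of one NEW_CID-style record; a later record for the
 * same location overwrites the earlier one. Returns false if the map is full. */
static bool
sketch_record_new_cid(SketchCidMap *map, SketchTupleLoc loc,
                      CommandId cmin, CommandId cmax)
{
    assert(map != NULL);

    for (size_t i = 0; i < map->n; i++)
    {
        if (sketch_loc_equal(map->entries[i].loc, loc))
        {
            map->entries[i].cmin = cmin;
            map->entries[i].cmax = cmax;
            return true;
        }
    }
    if (map->n == SKETCH_MAX_ENTRIES)
        return false;
    map->entries[map->n].loc = loc;
    map->entries[map->n].cmin = cmin;
    map->entries[map->n].cmax = cmax;
    map->n++;
    return true;
}

/* Look up cmin/cmax for a catalog tuple modified by our own transaction. */
static bool
sketch_lookup_cid(const SketchCidMap *map, SketchTupleLoc loc,
                  CommandId *cmin, CommandId *cmax)
{
    assert(map != NULL && cmin != NULL && cmax != NULL);

    for (size_t i = 0; i < map->n; i++)
    {
        if (sketch_loc_equal(map->entries[i].loc, loc))
        {
            *cmin = map->entries[i].cmin;
            *cmax = map->entries[i].cmax;
            return true;
        }
    }
    return false;
}
```

Only the transaction that performed the DDL needs such a map; for all other transactions the regular snapshot checks are sufficient.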
+== Cache Handling ==
+
+As we allow usage of the normal {sys,cat,rel,..}cache we also need to integrate
+cache invalidation. For transactions that only do DDL that's easy as everything
+is already provided by HS. Every time we read a commit record we apply the
+sinval messages contained therein.
+
+For transactions that contain DDL and DML cache invalidation needs to happen
+more frequently because we need to tear down all caches that just got
+modified. To do that we simply take all invalidation messages that got
+collected at the end of the transaction and apply them every time we've decoded
+a single change. At some point this can get optimized by generating new local
+invalidation messages, but that seems too complicated for now.
+
+XXX: talk about syscache handling of relmapped relation.
+
+== xmin Horizon Handling ==
+
+Reusing MVCC for timetravel access has one obvious major problem: VACUUM. Rows
+we still need for decoding cannot be removed but at the same time we cannot
+keep data in the catalog indefinitely.
+
+For that we peg the xmin horizon that's used to decide which rows can be
+removed. We only need to prevent removal of those rows for catalog like
+relations, not for all user tables. For that reason a separate xmin horizon
+RecentGlobalDataXmin got introduced.
+
+Since we need to persist that knowledge across restarts we keep the xmin
+in the logical slots, which are saved in a crash-safe manner. They are restored
+from disk into memory at server startup.
+
+== Restartable Decoding ==
+
+As we want to generate a consistent stream of changes we need to have the
+ability to start from a previously decoded location without waiting possibly
+very long to reach consistency. For that reason we dump the current visibility
+information to disk every time we read an xl_running_xacts record.
+
--
1.8.4.21.g992c386.dirty
What's with 0001-Improve-regression-test-for-8410.patch? Did you mean
to include that?
On 2013-09-15 10:03:54 -0400, Peter Eisentraut wrote:
> What's with 0001-Improve-regression-test-for-8410.patch? Did you mean
> to include that?
Gah, no. That's already committed and unrelated. Stupid wildcard.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
> Attached you can find the newest version of the logical changeset
> generation patchset.
You probably have bigger things to worry about, but please check the
results of cpluspluscheck, because some of the header files don't
include header files they depend on.
(I guess that's really pgcompinclude's job to find out, but
cpluspluscheck seems to be easier to use.)
On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
> On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
> > Attached you can find the newest version of the logical changeset
> > generation patchset.
>
> You probably have bigger things to worry about, but please check the
> results of cpluspluscheck, because some of the header files don't
> include header files they depend on.
Hm. I tried to get that right, but it's been a while since I last
checked. I don't regularly use cpluspluscheck because it doesn't work in
VPATH builds... We really need to fix that.
I'll push a fix for that to the git tree, don't think that's worth a
resend in itself.
Thanks,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9/15/13 11:30 AM, Andres Freund wrote:
> On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
> > On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
> > > Attached you can find the newest version of the logical changeset
> > > generation patchset.
> >
> > You probably have bigger things to worry about, but please check the
> > results of cpluspluscheck, because some of the header files don't
> > include header files they depend on.
>
> Hm. I tried to get that right, but it's been a while since I last
> checked. I don't regularly use cpluspluscheck because it doesn't work in
> VPATH builds... We really need to fix that.
>
> I'll push a fix for that to the git tree, don't think that's worth a
> resend in itself.
This patch set now fails to apply because of the commit "Rename various
"freeze multixact" variables".
On 2013-09-17 09:45:28 -0400, Peter Eisentraut wrote:
> On 9/15/13 11:30 AM, Andres Freund wrote:
> > On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
> > > On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
> > > > Attached you can find the newest version of the logical changeset
> > > > generation patchset.
> > >
> > > You probably have bigger things to worry about, but please check the
> > > results of cpluspluscheck, because some of the header files don't
> > > include header files they depend on.
> >
> > Hm. I tried to get that right, but it's been a while since I last
> > checked. I don't regularly use cpluspluscheck because it doesn't work in
> > VPATH builds... We really need to fix that.
> >
> > I'll push a fix for that to the git tree, don't think that's worth a
> > resend in itself.
>
> This patch set now fails to apply because of the commit "Rename various
> "freeze multixact" variables".
And I am even partially guilty for that patch...
Rebased patches attached.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch (text/x-patch; charset=us-ascii)
From cdcac9ccbc3103285be4984f648ebe86551c0841 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 1/8] wal_decoding: Allow walsender's to connect to a specific
database
Extend the existing 'replication' parameter to not only allow a boolean value
but also "database". If the latter is specified we connect to the database
specified in 'dbname'.
This is useful for future walsender commands which need database interaction,
e.g. changeset extraction.
---
doc/src/sgml/protocol.sgml | 24 +++++++++---
src/backend/postmaster/postmaster.c | 23 ++++++++++--
.../libpqwalreceiver/libpqwalreceiver.c | 4 +-
src/backend/replication/walsender.c | 43 +++++++++++++++++++---
src/backend/utils/init/postinit.c | 5 +++
src/bin/pg_basebackup/pg_basebackup.c | 4 +-
src/bin/pg_basebackup/pg_receivexlog.c | 4 +-
src/bin/pg_basebackup/receivelog.c | 4 +-
src/include/replication/walsender.h | 1 +
9 files changed, 89 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2ea14e5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1301,10 +1301,13 @@
<para>
To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter, which will in future
+allow some additional commands to the ones specified below to be run.
The commands accepted in walsender mode are:
@@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are:
<listitem>
<para>
Requests the server to identify itself. Server replies with a result
- set of a single row, containing three fields:
+ set of a single row, containing four fields:
</para>
<para>
@@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ dbname
+ </term>
+ <listitem>
+ <para>
+ Database connected to or NULL.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
</listitem>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..a31b01d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1894,10 +1894,21 @@ retry1:
port->cmdline_options = pstrdup(valptr);
else if (strcmp(nameptr, "replication") == 0)
{
- if (!parse_bool(valptr, &am_walsender))
+ /*
+ * Due to backward compatibility concerns replication is a
+ * hybrid beast which allows the value to be either a boolean
+ * or the string 'database'. The latter connects to a specific
+ * database which is e.g. required for changeset extraction.
+ */
+ if (strcmp(valptr, "database") == 0)
+ {
+ am_walsender = true;
+ am_db_walsender = true;
+ }
+ else if (!parse_bool(valptr, &am_walsender))
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid value for boolean option \"replication\"")));
+ errmsg("invalid value for option \"replication\", legal values are false, 0, true, 1 or database")));
}
else
{
@@ -1983,8 +1994,12 @@ retry1:
if (strlen(port->user_name) >= NAMEDATALEN)
port->user_name[NAMEDATALEN - 1] = '\0';
- /* Walsender is not related to a particular database */
- if (am_walsender)
+ /*
+ * Generic walsender, e.g. for streaming replication, is not connected to a
+ * particular database. But walsenders used for logical replication need to
+ * connect to a specific database.
+ */
+ if (am_walsender && !am_db_walsender)
port->database_name[0] = '\0';
/*
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
"the primary server: %s",
PQerrorMessage(streamConn))));
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
int ntuples = PQntuples(res);
int nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
PQclear(res);
ereport(ERROR,
(errmsg("invalid response from primary server"),
- errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+ errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
ntuples, nfields)));
}
primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index afd559d..b00a91a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,7 +46,10 @@
#include "access/timeline.h"
#include "access/transam.h"
#include "access/xlog_internal.h"
+#include "access/xact.h"
+
#include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -89,9 +92,10 @@ WalSndCtlData *WalSndCtl = NULL;
WalSnd *MyWalSnd = NULL;
/* Global state */
-bool am_walsender = false; /* Am I a walsender process ? */
+bool am_walsender = false; /* Am I a walsender process? */
bool am_cascading_walsender = false; /* Am I cascading WAL to
- * another standby ? */
+ * another standby? */
+bool am_db_walsender = false; /* connect to database? */
/* User-settable parameters for walsender */
int max_wal_senders = 0; /* the maximum number of concurrent walsenders */
@@ -243,10 +247,12 @@ IdentifySystem(void)
char tli[11];
char xpos[MAXFNAMELEN];
XLogRecPtr logptr;
+ char* dbname = NULL;
/*
- * Reply with a result set with one row, three columns. First col is
- * system ID, second is timeline ID, and third is current xlog location.
+ * Reply with a result set with one row, four columns. First col is system
+ * ID, second is timeline ID, third is current xlog location and the fourth
+ * contains the database name if we are connected to one.
*/
snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -265,9 +271,23 @@ IdentifySystem(void)
snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
+ if (MyDatabaseId != InvalidOid)
+ {
+ MemoryContext cur = CurrentMemoryContext;
+
+ /* syscache access needs a transaction env. */
+ StartTransactionCommand();
+ /* make dbname live outside TX context */
+ MemoryContextSwitchTo(cur);
+ dbname = get_database_name(MyDatabaseId);
+ CommitTransactionCommand();
+ /* CommitTransactionCommand switches to TopMemoryContext */
+ MemoryContextSwitchTo(cur);
+ }
+
/* Send a RowDescription message */
pq_beginmessage(&buf, 'T');
- pq_sendint(&buf, 3, 2); /* 3 fields */
+ pq_sendint(&buf, 4, 2); /* 4 fields */
/* first field */
pq_sendstring(&buf, "systemid"); /* col name */
@@ -295,17 +315,28 @@ IdentifySystem(void)
pq_sendint(&buf, -1, 2);
pq_sendint(&buf, 0, 4);
pq_sendint(&buf, 0, 2);
+
+ /* fourth field */
+ pq_sendstring(&buf, "dbname");
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
+ pq_sendint(&buf, TEXTOID, 4);
+ pq_sendint(&buf, -1, 2);
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
pq_endmessage(&buf);
/* Send a DataRow message */
pq_beginmessage(&buf, 'D');
- pq_sendint(&buf, 3, 2); /* # of columns */
+ pq_sendint(&buf, 4, 2); /* # of columns */
pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
pq_sendint(&buf, strlen(tli), 4); /* col2 len */
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+ pq_sendint(&buf, dbname ? (int) strlen(dbname) : -1, 4); /* col4 len, -1 if NULL */
+ pq_sendbytes(&buf, dbname ? dbname : "", dbname ? (int) strlen(dbname) : 0);
pq_endmessage(&buf);
}
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2c7f0f1..56c352c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -725,7 +725,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
ereport(FATAL,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser or replication role to start walsender")));
+ }
+ if (am_walsender &&
+ (in_dbname == NULL || in_dbname[0] == '\0') &&
+ dboid == InvalidOid)
+ {
/* process any options passed in the startup packet */
if (MyProcPort != NULL)
process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..89e2376 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 787a395..fe8aef6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index d56a4d7..22a5340 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -534,11 +534,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
PQclear(res);
return false;
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
PQclear(res);
return false;
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddf..5097235 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -19,6 +19,7 @@
/* global state */
extern bool am_walsender;
extern bool am_cascading_walsender;
+extern bool am_db_walsender;
extern bool wake_wal_senders;
/* user-settable parameters */
--
1.8.4.21.g992c386.dirty
0002-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch (text/x-patch; charset=us-ascii)
From 463cdb627c47b2e3945ae87fb6f594252be3c570 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 2/8] wal_decoding: Log xl_running_xact's at a higher frequency
than checkpoints are done
Logging information about running xacts more frequently is beneficial for both
hot standby, which can reach consistency faster and release some resources
earlier using this information, and future logical replication, which can
initialize quicker using this.
Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.
Also mark xl_running_xact records as being relevant for async commit so the wal
writer writes them out soonish instead of possibly waiting a long time.
---
src/backend/postmaster/bgwriter.c | 62 +++++++++++++++++++++++++++++++++++++++
src/backend/storage/ipc/standby.c | 27 ++++++++++++++---
src/include/storage/standby.h | 2 +-
3 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..13d57c5 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
#include "storage/shmem.h"
#include "storage/smgr.h"
#include "storage/spin.h"
+#include "storage/standby.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
+#include "utils/timestamp.h"
/*
@@ -71,6 +73,20 @@ int BgWriterDelay = 200;
#define HIBERNATE_FACTOR 50
/*
+ * Interval in which standby snapshots are logged into the WAL stream, in
+ * milliseconds.
+ */
+#define LOG_SNAPSHOT_INTERVAL_MS 15000
+
+/*
+ * LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
+ * doing so too often or repeatedly if there has been no other write activity
+ * in the system.
+ */
+static TimestampTz last_snapshot_ts;
+static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
+
+/*
* Flags set by interrupt handlers for later service in the main loop.
*/
static volatile sig_atomic_t got_SIGHUP = false;
@@ -142,6 +158,12 @@ BackgroundWriterMain(void)
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
/*
+ * We just started, assume there has been either a shutdown or
+ * end-of-recovery snapshot.
+ */
+ last_snapshot_ts = GetCurrentTimestamp();
+
+ /*
* Create a memory context that we will do all our work in. We do this so
* that we can reset the context during error recovery and thereby avoid
* possible memory leaks. Formerly this code just ran in
@@ -276,6 +298,46 @@ BackgroundWriterMain(void)
}
/*
+ * Log a new xl_running_xacts every now and then so replication can get
+ * into a consistent state faster (think of suboverflowed snapshots)
+ * and clean up resources (locks, KnownXids*) more frequently. The
+ * costs of this are relatively low, so doing it 4 times
+ * (LOG_SNAPSHOT_INTERVAL_MS) a minute seems fine.
+ *
+ * We assume the interval for writing xl_running_xacts is
+ * significantly bigger than BgWriterDelay, so we don't complicate the
+ * overall timeout handling but just assume we're going to get called
+ * often enough even if hibernation mode is active. It's not that
+ * important that LOG_SNAPSHOT_INTERVAL_MS is met strictly. To make sure
+ * we're not waking the disk up unnecessarily on an idle system we
+ * check whether there has been any WAL inserted since the last time
+ * we logged a running xacts record.
+ *
+ * We do this logging in the bgwriter as it's the only process that
+ * runs regularly and returns to its main loop all the
+ * time. E.g. the checkpointer, when active, is barely ever in its
+ * main loop, which makes it hard to log regularly.
+ */
+ if (XLogStandbyInfoActive() && !RecoveryInProgress())
+ {
+ TimestampTz timeout = 0;
+ TimestampTz now = GetCurrentTimestamp();
+ timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
+ LOG_SNAPSHOT_INTERVAL_MS);
+
+ /*
+ * only log if enough time has passed and some xlog record has been
+ * inserted.
+ */
+ if (now >= timeout &&
+ last_snapshot_lsn != GetXLogInsertRecPtr())
+ {
+ last_snapshot_lsn = LogStandbySnapshot();
+ last_snapshot_ts = now;
+ }
+ }
+
+ /*
* Sleep until we are signaled or BgWriterDelay has elapsed.
*
* Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..97da1a0 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
ProcSignalReason reason);
static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
* currently running xids, performed by StandbyReleaseOldLocks().
* Zero xids should no longer be possible, but we may be replaying WAL
* from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
*/
-void
+XLogRecPtr
LogStandbySnapshot(void)
{
+ XLogRecPtr recptr;
RunningTransactions running;
xl_standby_lock *locks;
int nlocks;
@@ -876,9 +879,12 @@ LogStandbySnapshot(void)
* record we write, because standby will open up when it sees this.
*/
running = GetRunningTransactionData();
- LogCurrentRunningXacts(running);
+ recptr = LogCurrentRunningXacts(running);
+
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
+
+ return recptr;
}
/*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
* is a contiguous chunk of memory and never exists fully until it is
* assembled in WAL.
*/
-static void
+static XLogRecPtr
LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
{
xl_running_xacts xlrec;
@@ -939,6 +945,19 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
CurrRunningXacts->oldestRunningXid,
CurrRunningXacts->latestCompletedXid,
CurrRunningXacts->nextXid);
+
+ /*
+ * Ensure running_xacts information is synced to disk not too far in the
+ * future. We don't want to stall anything though (i.e. use XLogFlush()),
+ * so we let the wal writer do it during normal
+ * operation. XLogSetAsyncXactLSN() conveniently will mark the LSN as
+ * to-be-synced and nudge the WALWriter into action if sleeping. Check
+ * XLogBackgroundFlush() for details why a record might not be flushed
+ * without it.
+ */
+ XLogSetAsyncXactLSN(recptr);
+
+ return recptr;
}
/*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
extern void LogAccessExclusiveLockPrepare(void);
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
#endif /* STANDBY_H */
--
1.8.4.21.g992c386.dirty
0003-wal_decoding-Add-information-about-a-tables-primary-.patch (text/x-patch; charset=us-ascii)
From be59001586a9baa731876744fa84cf7987b59fe3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 3/8] wal_decoding: Add information about a tables primary key
to struct RelationData
'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are considered in this
order of preference:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
non-expression index over one or more NOT NULL'ed columns
rd_primary is only valid once RelationGetIndexList() has been called.
This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index, and RelationGetIndexList already
gathers all the necessary information.
This could be used to replace tablecmds.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
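The preference order above can be illustrated with a small standalone sketch. It is not the relcache code: SketchIndexInfo and sketch_choose_primary are hypothetical names, the boolean fields stand in for the pg_index and pg_attribute checks, and the input array is assumed to be in OID order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_oid;
#define SKETCH_INVALID_OID ((sketch_oid) 0)

/* Hypothetical summary of one index. */
typedef struct SketchIndexInfo
{
	sketch_oid	oid;
	bool		usable;			/* valid, unique, immediate, non-partial */
	bool		is_primary;		/* primary key index */
	bool		is_oid_index;	/* single-column index on oid */
	bool		all_not_null;	/* every indexed column is NOT NULL */
} SketchIndexInfo;

/* Returns the preferred index, or SKETCH_INVALID_OID if none qualifies. */
sketch_oid
sketch_choose_primary(const SketchIndexInfo *indexes, size_t n)
{
	sketch_oid	pkey = SKETCH_INVALID_OID;
	sketch_oid	oididx = SKETCH_INVALID_OID;
	sketch_oid	candidate = SKETCH_INVALID_OID;

	assert(indexes != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		const SketchIndexInfo *idx = &indexes[i];

		if (!idx->usable)
			continue;
		if (idx->is_primary)
			pkey = idx->oid;
		else if (idx->is_oid_index)
			oididx = idx->oid;
		else if (idx->all_not_null && candidate == SKETCH_INVALID_OID)
			candidate = idx->oid;	/* keep only the first candidate */
	}

	/* primary key first, then oid index, then the first candidate */
	if (pkey != SKETCH_INVALID_OID)
		return pkey;
	if (oididx != SKETCH_INVALID_OID)
		return oididx;
	return candidate;
}
```

Note that the sketch returns an invalid Oid when nothing qualifies, whereas the patch simply leaves rd_primary untouched in that case.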
---
src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
src/include/utils/rel.h | 12 +++++++++
2 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index b4cc6ad..44dd0d2 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3462,7 +3462,9 @@ RelationGetIndexList(Relation relation)
ScanKeyData skey;
HeapTuple htup;
List *result;
- Oid oidIndex;
+ Oid oidIndex = InvalidOid;
+ Oid pkeyIndex = InvalidOid;
+ Oid candidateIndex = InvalidOid;
MemoryContext oldcxt;
/* Quick exit if we already computed the list. */
@@ -3519,17 +3521,61 @@ RelationGetIndexList(Relation relation)
Assert(!isnull);
indclass = (oidvector *) DatumGetPointer(indclassDatum);
+ if (!IndexIsValid(index))
+ continue;
+
/* Check to see if it is a unique, non-partial btree index on OID */
- if (IndexIsValid(index) &&
- index->indnatts == 1 &&
+ if (index->indnatts == 1 &&
index->indisunique && index->indimmediate &&
index->indkey.values[0] == ObjectIdAttributeNumber &&
indclass->values[0] == OID_BTREE_OPS_OID &&
heap_attisnull(htup, Anum_pg_index_indpred))
oidIndex = index->indexrelid;
+
+ if (index->indisunique &&
+ index->indimmediate &&
+ heap_attisnull(htup, Anum_pg_index_indpred))
+ {
+ /* always prefer primary keys */
+ if (index->indisprimary)
+ pkeyIndex = index->indexrelid;
+ else if (!OidIsValid(pkeyIndex)
+ && !OidIsValid(oidIndex)
+ && !OidIsValid(candidateIndex))
+ {
+ int key;
+ bool found = true;
+ for (key = 0; key < index->indnatts; key++)
+ {
+ int16 attno = index->indkey.values[key];
+ Form_pg_attribute attr;
+ /* internal column, like oid */
+ if (attno <= 0)
+ continue;
+
+ attr = relation->rd_att->attrs[attno - 1];
+ if (!attr->attnotnull)
+ {
+ found = false;
+ break;
+ }
+ }
+ if (found)
+ candidateIndex = index->indexrelid;
+ }
+ }
}
systable_endscan(indscan);
+
+ if (OidIsValid(pkeyIndex))
+ relation->rd_primary = pkeyIndex;
+ /* prefer oid indexes over normal candidate ones */
+ else if (OidIsValid(oidIndex))
+ relation->rd_primary = oidIndex;
+ else if (OidIsValid(candidateIndex))
+ relation->rd_primary = candidateIndex;
+
heap_close(indrel, AccessShareLock);
/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 589c9a8..0281b4b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
TriggerDesc *trigdesc; /* Trigger info, or NULL if rel has none */
/*
+ * The 'best' primary or candidate key that has been found, only set
+ * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+ *
+ * Indexes are chosen in the following order:
+ * * Primary Key
+ * * oid index
+ * * the first (OID order) unique, immediate, non-partial and
+ * non-expression index over one or more NOT NULL'ed columns
+ */
+ Oid rd_primary;
+
+ /*
* rd_options is set whenever rd_rel is loaded into the relcache entry.
* Note that you can NOT look into rd_rel for this data. NULL means "use
* defaults".
--
1.8.4.21.g992c386.dirty
0004-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch (text/x-patch; charset=us-ascii)
From 0a15a118d9b88a3e327cf76dfe297c17bf17fb01 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/8] wal_decoding: Introduce wal decoding via catalog
timetravel
This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
* init
* begin
* change
* commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
* INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
* initial LSN
* replication slot identifier
* id of a pg_export_snapshot() style snapshot
* START_LOGICAL_REPLICATION $id $lsn; streams out changes
* uses named output plugins for output specification
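To make the 'reorderbuffer' idea from the list above concrete, here is a toy sketch of reassembling transactions from interleaved changes: changes are queued per xid as they are seen and handed out together once that transaction commits. All names, the fixed-size arrays and the string "changes" are hypothetical simplifications and do not reflect the module's actual data structures or API.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_XACTS 8
#define SKETCH_MAX_CHANGES 16

/* Hypothetical per-transaction queue; xid 0 marks an unused slot. */
typedef struct SketchTxn
{
	unsigned	xid;
	int			nchanges;
	const char *changes[SKETCH_MAX_CHANGES];
} SketchTxn;

typedef struct SketchReorderBuffer
{
	SketchTxn	txns[SKETCH_MAX_XACTS];
} SketchReorderBuffer;

/* Queue one change for xid; returns 0 on success, -1 if out of space. */
int
sketch_queue_change(SketchReorderBuffer *rb, unsigned xid, const char *change)
{
	SketchTxn  *txn = NULL;

	assert(xid != 0);
	for (int i = 0; i < SKETCH_MAX_XACTS; i++)
	{
		if (rb->txns[i].xid == xid)
		{
			txn = &rb->txns[i];
			break;
		}
		if (rb->txns[i].xid == 0 && txn == NULL)
			txn = &rb->txns[i];	/* remember the first free slot */
	}
	if (txn == NULL || (txn->xid == xid && txn->nchanges == SKETCH_MAX_CHANGES))
		return -1;
	if (txn->xid != xid)
	{
		/* first change of this transaction: claim the free slot */
		txn->xid = xid;
		txn->nchanges = 0;
	}
	txn->changes[txn->nchanges++] = change;
	return 0;
}

/*
 * On commit, copy the transaction's changes to out[] in the order they were
 * queued and release the slot. Returns the number of changes copied.
 */
int
sketch_commit(SketchReorderBuffer *rb, unsigned xid,
			  const char **out, int out_cap)
{
	assert(xid != 0);
	for (int i = 0; i < SKETCH_MAX_XACTS; i++)
	{
		SketchTxn  *txn = &rb->txns[i];
		int			n;

		if (txn->xid != xid)
			continue;
		n = txn->nchanges < out_cap ? txn->nchanges : out_cap;
		for (int j = 0; j < n; j++)
			out[j] = txn->changes[j];
		txn->xid = 0;
		txn->nchanges = 0;
		return n;
	}
	return 0;					/* unknown xid: nothing was queued */
}
```

The point of the sketch is only the ordering property: a consumer sees each transaction's changes contiguously and in commit order, even though they were interleaved with other transactions in the WAL stream.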
Todo:
* better integrated testing infrastructure
* more docs about the internals
Lowlevel:
* resource owner handling is suboptimal
* invalidations from uninteresting transactions (e.g. from other databases, old ones)
need to be processed anyway
* error handling in walsender is suboptimal
* pg_receivellog needs to send a reply immediately when postgres is shutting down
Input, Testing and Review by:
Heikki Linnakangas
Kevin Grittner
Michael Paquier
Abhijit Menon-Sen
Peter Geoghegan
Robert Haas
Simon Riggs
Steve Singer
Code By:
Andres Freund
With code contributions by:
Abhijit Menon-Sen
Craig Ringer
Alvaro Herrera
Conflicts:
src/backend/replication/repl_gram.y
---
src/backend/access/common/reloptions.c | 10 +
src/backend/access/heap/heapam.c | 465 ++++-
src/backend/access/heap/pruneheap.c | 2 +
src/backend/access/index/indexam.c | 14 +-
src/backend/access/rmgrdesc/heapdesc.c | 9 +
src/backend/access/rmgrdesc/xlogdesc.c | 1 +
src/backend/access/transam/twophase.c | 4 +-
src/backend/access/transam/xact.c | 48 +-
src/backend/access/transam/xlog.c | 14 +-
src/backend/catalog/catalog.c | 14 +-
src/backend/catalog/index.c | 15 +-
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 2 +-
src/backend/commands/cluster.c | 2 +
src/backend/commands/trigger.c | 3 +-
src/backend/commands/vacuum.c | 5 +-
src/backend/commands/vacuumlazy.c | 3 +
src/backend/postmaster/postmaster.c | 2 +-
src/backend/replication/Makefile | 2 +
src/backend/replication/logical/Makefile | 19 +
src/backend/replication/logical/decode.c | 687 ++++++
src/backend/replication/logical/logical.c | 1046 ++++++++++
src/backend/replication/logical/logicalfuncs.c | 361 ++++
src/backend/replication/logical/reorderbuffer.c | 2548 +++++++++++++++++++++++
src/backend/replication/logical/snapbuild.c | 1581 ++++++++++++++
src/backend/replication/repl_gram.y | 75 +-
src/backend/replication/repl_scanner.l | 55 +-
src/backend/replication/walreceiver.c | 2 +-
src/backend/replication/walsender.c | 733 ++++++-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 72 +-
src/backend/storage/ipc/standby.c | 15 +
src/backend/utils/cache/inval.c | 4 +-
src/backend/utils/cache/relcache.c | 113 +-
src/backend/utils/misc/guc.c | 12 +
src/backend/utils/misc/postgresql.conf.sample | 11 +-
src/backend/utils/time/snapmgr.c | 7 +-
src/backend/utils/time/tqual.c | 270 ++-
src/bin/initdb/initdb.c | 4 +-
src/bin/pg_controldata/pg_controldata.c | 2 +
src/include/access/heapam_xlog.h | 59 +-
src/include/access/transam.h | 5 +
src/include/access/xact.h | 1 +
src/include/access/xlog.h | 8 +-
src/include/access/xlogreader.h | 13 +-
src/include/catalog/catalog.h | 1 +
src/include/catalog/pg_proc.h | 6 +
src/include/commands/vacuum.h | 2 +-
src/include/nodes/nodes.h | 3 +
src/include/nodes/replnodes.h | 35 +
src/include/replication/decode.h | 20 +
src/include/replication/logical.h | 198 ++
src/include/replication/logicalfuncs.h | 21 +
src/include/replication/output_plugin.h | 70 +
src/include/replication/reorderbuffer.h | 342 +++
src/include/replication/snapbuild.h | 81 +
src/include/replication/walsender_private.h | 6 +-
src/include/storage/itemptr.h | 3 +
src/include/storage/lwlock.h | 1 +
src/include/storage/procarray.h | 2 +-
src/include/storage/sinval.h | 2 +
src/include/utils/inval.h | 1 +
src/include/utils/rel.h | 30 +-
src/include/utils/relcache.h | 11 +-
src/include/utils/snapmgr.h | 3 +
src/include/utils/tqual.h | 21 +-
src/test/regress/expected/rules.out | 9 +-
src/tools/pgindent/typedefs.list | 40 +
68 files changed, 9033 insertions(+), 206 deletions(-)
create mode 100644 src/backend/replication/logical/Makefile
create mode 100644 src/backend/replication/logical/decode.c
create mode 100644 src/backend/replication/logical/logical.c
create mode 100644 src/backend/replication/logical/logicalfuncs.c
create mode 100644 src/backend/replication/logical/reorderbuffer.c
create mode 100644 src/backend/replication/logical/snapbuild.c
create mode 100644 src/include/replication/decode.h
create mode 100644 src/include/replication/logical.h
create mode 100644 src/include/replication/logicalfuncs.h
create mode 100644 src/include/replication/output_plugin.h
create mode 100644 src/include/replication/reorderbuffer.h
create mode 100644 src/include/replication/snapbuild.h
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index b5fd30a..e1e5040 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -63,6 +63,14 @@ static relopt_bool boolRelOpts[] =
},
{
{
+ "treat_as_catalog_table",
+ "Treat table as a catalog table for the purpose of logical replication",
+ RELOPT_KIND_HEAP
+ },
+ false
+ },
+ {
+ {
"fastupdate",
"Enables \"fast update\" feature for this GIN index",
RELOPT_KIND_GIN
@@ -1166,6 +1174,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, security_barrier)},
{"check_option", RELOPT_TYPE_STRING,
offsetof(StdRdOptions, check_option_offset)},
+ {"treat_as_catalog_table", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, treat_as_catalog_table)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ead3d69..1a7281f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -85,12 +85,14 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
- HeapTuple newtup, bool all_visible_cleared,
- bool new_all_visible_cleared);
+ HeapTuple newtup, HeapTuple old_idx_tup,
+ bool all_visible_cleared, bool new_all_visible_cleared);
static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
- bool *satisfies_hot, bool *satisfies_key,
- HeapTuple oldtup, HeapTuple newtup);
+ Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
+ bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
+ HeapTuple oldtup, HeapTuple newtup);
static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
uint16 old_infomask2, TransactionId add_to_xmax,
LockTupleMode mode, bool is_update,
@@ -108,6 +110,8 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
static bool ConditionalMultiXactIdWait(MultiXactId multi,
MultiXactStatus status, int *remaining,
uint16 infomask);
+static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
+static HeapTuple ExtractKeyTuple(Relation rel, HeapTuple tup);
/*
@@ -342,8 +346,10 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- Assert(TransactionIdIsValid(RecentGlobalXmin));
- heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ if (IsSystemRelation(scan->rs_rd) || RelationIsDoingTimetravel(scan->rs_rd))
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1743,10 +1749,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
*/
if (!skip)
{
+ /* set up the redirected t_self for the benefit of timetravel access */
+ ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
+
/* If it's visible per the snapshot, we must return it */
valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, heapTuple,
buffer, snapshot);
+ /* reset original, non-redirected, tid */
+ heapTuple->t_self = *tid;
+
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
@@ -2101,11 +2113,24 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- XLogRecData rdata[3];
+ XLogRecData rdata[4];
Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
+ bool need_tuple_data;
+
+ /*
+ * For logical replication, we need the tuple even if we're doing a
+ * full page write, so make sure to log it separately. (XXX We could
+ * alternatively store a pointer into the FPW).
+ *
+ * Also, if this is a catalog, we need to transmit combocids to
+ * properly decode, so log that as well.
+ */
+ need_tuple_data = RelationIsLogicallyLogged(relation);
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, heaptup);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.target.node = relation->rd_node;
xlrec.target.tid = heaptup->t_self;
rdata[0].data = (char *) &xlrec;
@@ -2124,18 +2149,35 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*/
rdata[1].data = (char *) &xlhdr;
rdata[1].len = SizeOfHeapHeader;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[2].buffer = buffer;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[2].buffer_std = true;
rdata[2].next = NULL;
/*
+ * add a record for the buffer without actual content; it is removed if
+ * a full page write is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[2].next = &(rdata[3]);
+
+ rdata[3].data = NULL;
+ rdata[3].len = 0;
+ rdata[3].buffer = buffer;
+ rdata[3].buffer_std = true;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If this is the single and first tuple on page, we can reinit the
* page instead of restoring the whole thing. Set flag, and hide
* buffer references from XLogInsert.
@@ -2144,7 +2186,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
info |= XLOG_HEAP_INIT_PAGE;
- rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2270,6 +2312,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
Page page;
bool needwal;
Size saveFreeSpace;
+ bool need_tuple_data = RelationIsLogicallyLogged(relation);
+ bool need_cids = RelationIsDoingTimetravel(relation);
needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
@@ -2356,7 +2400,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
{
XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
- XLogRecData rdata[2];
+ XLogRecData rdata[3];
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
int totaldatalen;
@@ -2386,7 +2430,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->all_visible_cleared = all_visible_cleared;
+ xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec->node = relation->rd_node;
xlrec->blkno = BufferGetBlockNumber(buffer);
xlrec->ntuples = nthispage;
@@ -2418,6 +2462,13 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
datalen);
tuphdr->datalen = datalen;
scratchptr += datalen;
+
+ /*
+ * We don't use heap_multi_insert for catalog tuples yet, but
+ * better be prepared...
+ */
+ if (need_cids)
+ log_heap_new_cid(relation, heaptup);
}
totaldatalen = scratchptr - tupledata;
Assert((scratchptr - scratch) < BLCKSZ);
@@ -2429,17 +2480,33 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
rdata[1].data = tupledata;
rdata[1].len = totaldatalen;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = NULL;
/*
+ * add a record for the buffer without actual content; it is removed if
+ * a full page write is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[1].next = &(rdata[2]);
+
+ rdata[2].data = NULL;
+ rdata[2].len = 0;
+ rdata[2].buffer = buffer;
+ rdata[2].buffer_std = true;
+ rdata[2].next = NULL;
+ xlrec->flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If we're going to reinitialize the whole page using the WAL
* record, hide buffer reference from XLogInsert.
*/
if (init)
{
- rdata[1].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
info |= XLOG_HEAP_INIT_PAGE;
}
@@ -2559,6 +2626,9 @@ heap_delete(Relation relation, ItemPointer tid,
bool have_tuple_lock = false;
bool iscombo;
bool all_visible_cleared = false;
+ bool need_tuple_data = RelationNeedsWAL(relation) &&
+ RelationIsLogicallyLogged(relation);
+ HeapTuple idx_tuple = NULL; /* primary key of the tuple */
Assert(ItemPointerIsValid(tid));
@@ -2732,6 +2802,15 @@ l1:
/* replace cid with a combo cid if necessary */
HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
+ /*
+ * Compute primary key tuple before entering the critical section so we
+ * don't PANIC upon a memory allocation failure.
+ */
+ if (need_tuple_data)
+ {
+ idx_tuple = ExtractKeyTuple(relation, &tp);
+ }
+
START_CRIT_SECTION();
/*
@@ -2784,9 +2863,13 @@ l1:
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
- XLogRecData rdata[2];
+ XLogRecData rdata[4];
+
+ /* For logical decode we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, &tp);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
tp.t_data->t_infomask2);
xlrec.target.node = relation->rd_node;
@@ -2803,6 +2886,34 @@ l1:
rdata[1].buffer_std = true;
rdata[1].next = NULL;
+ /*
+ * Log primary key of the deleted tuple
+ */
+ if (need_tuple_data && idx_tuple != NULL)
+ {
+ xl_heap_header xlhdr;
+
+ xlhdr.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr.t_hoff = idx_tuple->t_data->t_hoff;
+
+ rdata[1].next = &(rdata[2]);
+ rdata[2].data = (char*)&xlhdr;
+ rdata[2].len = SizeOfHeapHeader;
+ rdata[2].buffer = InvalidBuffer;
+ rdata[2].next = NULL;
+
+ rdata[2].next = &(rdata[3]);
+ rdata[3].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].buffer = InvalidBuffer;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);
PageSetLSN(page, recptr);
@@ -2932,9 +3043,11 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
TransactionId xid = GetCurrentTransactionId();
Bitmapset *hot_attrs;
Bitmapset *key_attrs;
+ Bitmapset *ckey_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
+ HeapTuple old_idx_tuple = NULL;
Page page;
BlockNumber block;
MultiXactStatus mxact_status;
@@ -2950,6 +3063,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool iscombo;
bool satisfies_hot;
bool satisfies_key;
+ bool satisfies_ckey;
bool use_hot_update = false;
bool key_intact;
bool all_visible_cleared = false;
@@ -2977,8 +3091,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* Note that we get a copy here, so we need not worry about relcache flush
* happening midway through.
*/
- hot_attrs = RelationGetIndexAttrBitmap(relation, false);
- key_attrs = RelationGetIndexAttrBitmap(relation, true);
+ hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
+ key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
+ ckey_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY);
block = ItemPointerGetBlockNumber(otid);
buffer = ReadBuffer(relation, block);
@@ -3036,9 +3152,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitiously arrive at the same key values.
*/
- HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs,
+ HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, ckey_attrs,
&satisfies_hot, &satisfies_key,
- &oldtup, newtup);
+ &satisfies_ckey, &oldtup, newtup);
if (satisfies_key)
{
*lockmode = LockTupleNoKeyExclusive;
@@ -3508,6 +3624,12 @@ l2:
PageSetFull(page);
}
+ /* compute tuple for logical logging */
+ if (!satisfies_ckey && RelationIsLogicallyLogged(relation))
+ {
+ old_idx_tuple = ExtractKeyTuple(relation, &oldtup);
+ }
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -3583,11 +3705,20 @@ l2:
/* XLOG stuff */
if (RelationNeedsWAL(relation))
{
- XLogRecPtr recptr = log_heap_update(relation, buffer,
- newbuf, &oldtup, heaptup,
- all_visible_cleared,
- all_visible_cleared_new);
+ XLogRecPtr recptr;
+
+ /* For logical decode we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ {
+ log_heap_new_cid(relation, &oldtup);
+ log_heap_new_cid(relation, heaptup);
+ }
+ recptr = log_heap_update(relation, buffer,
+ newbuf, &oldtup, heaptup,
+ old_idx_tuple,
+ all_visible_cleared,
+ all_visible_cleared_new);
if (newbuf != buffer)
{
PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3739,18 +3870,23 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* modify columns used in the key.
*/
static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
+HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
HeapTuple oldtup, HeapTuple newtup)
{
int next_hot_attnum;
int next_key_attnum;
+ int next_ckey_attnum;
bool hot_result = true;
bool key_result = true;
- bool key_done = false;
+ bool ckey_result = true;
bool hot_done = false;
+ Assert(bms_is_subset(ckey_attrs, key_attrs));
+ Assert(bms_is_subset(key_attrs, hot_attrs));
+
next_hot_attnum = bms_first_member(hot_attrs);
if (next_hot_attnum == -1)
hot_done = true;
@@ -3759,28 +3895,25 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+
for (;;)
{
int check_now;
bool changed;
- /* both bitmapsets are now empty */
- if (key_done && hot_done)
+ /* bitmapsets are now empty, hot includes others */
+ if (hot_done)
break;
- /* XXX there's probably an easier way ... */
- if (hot_done)
- check_now = next_key_attnum;
- if (key_done)
- check_now = next_hot_attnum;
- else
- check_now = Min(next_hot_attnum, next_key_attnum);
+ check_now = next_hot_attnum;
changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
check_now, oldtup, newtup);
@@ -3790,11 +3923,15 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
hot_result = false;
if (check_now == next_key_attnum)
key_result = false;
+ if (check_now == next_ckey_attnum)
+ ckey_result = false;
}
/* if both are false now, we can stop checking */
- if (!hot_result && !key_result)
+ if (!hot_result && !key_result && !ckey_result)
+ {
break;
+ }
if (check_now == next_hot_attnum)
{
@@ -3808,16 +3945,22 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
if (check_now == next_key_attnum)
{
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
}
+ if (check_now == next_ckey_attnum)
+ {
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+ }
}
*satisfies_hot = hot_result;
*satisfies_key = key_result;
+ *satisfies_ckey = ckey_result;
}
/*
@@ -5839,15 +5982,22 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
static XLogRecPtr
log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
+ HeapTuple idx_tuple,
bool all_visible_cleared, bool new_all_visible_cleared)
{
xl_heap_update xlrec;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
+ xl_heap_header_len xlhdr_idx;
uint8 info;
XLogRecPtr recptr;
- XLogRecData rdata[4];
+ XLogRecData rdata[7];
Page page = BufferGetPage(newbuf);
+ /*
+ * As for XLOG_HEAP_INSERT, log the tuple separately even if an FPW is done.
+ */
+ bool need_tuple_data = RelationIsLogicallyLogged(reln);
+
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
@@ -5862,9 +6012,12 @@ log_heap_update(Relation reln, Buffer oldbuf,
xlrec.old_infobits_set = compute_infobits(oldtup->t_data->t_infomask,
oldtup->t_data->t_infomask2);
xlrec.new_xmax = HeapTupleHeaderGetRawXmax(newtup->t_data);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = 0;
+ if (all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
xlrec.newtid = newtup->t_self;
- xlrec.new_all_visible_cleared = new_all_visible_cleared;
+ if (new_all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
rdata[0].data = (char *) &xlrec;
rdata[0].len = SizeOfHeapUpdate;
@@ -5877,33 +6030,78 @@ log_heap_update(Relation reln, Buffer oldbuf,
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
- xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
- xlhdr.t_infomask = newtup->t_data->t_infomask;
- xlhdr.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.header.t_infomask2 = newtup->t_data->t_infomask2;
+ xlhdr.header.t_infomask = newtup->t_data->t_infomask;
+ xlhdr.header.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.t_len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- /*
- * As with insert records, we need not store the rdata[2] segment if we
- * decide to store the whole buffer instead.
- */
rdata[2].data = (char *) &xlhdr;
- rdata[2].len = SizeOfHeapHeader;
- rdata[2].buffer = newbuf;
+ rdata[2].len = SizeOfHeapHeaderLen;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[2].buffer_std = true;
rdata[2].next = &(rdata[3]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
- rdata[3].data = (char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].data = (char *) newtup->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
rdata[3].len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[3].buffer = newbuf;
+ rdata[3].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[3].buffer_std = true;
rdata[3].next = NULL;
+ /*
+ * separate storage for the buffer reference of the new page in the
+ * wal_level >= logical case
+ */
+ if (need_tuple_data)
+ {
+ rdata[3].next = &(rdata[4]);
+
+ rdata[4].data = NULL;
+ rdata[4].len = 0;
+ rdata[4].buffer = newbuf;
+ rdata[4].buffer_std = true;
+ rdata[4].next = NULL;
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+
+ /* candidate key changed and we have a candidate key */
+ if (idx_tuple)
+ {
+ /* don't really need this, but it's more convenient */
+ xlhdr_idx.header.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr_idx.header.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr_idx.header.t_hoff = idx_tuple->t_data->t_hoff;
+ xlhdr_idx.t_len = idx_tuple->t_len;
+
+ rdata[4].next = &(rdata[5]);
+ rdata[5].data = (char *) &xlhdr_idx;
+ rdata[5].len = SizeOfHeapHeaderLen;
+ rdata[5].buffer = InvalidBuffer;
+ rdata[5].next = &(rdata[6]);
+
+ /* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
+ rdata[6].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[6].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata[6].buffer = InvalidBuffer;
+ rdata[6].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+ }
+
/* If new tuple is the single and first tuple on page... */
if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == FirstOffsetNumber &&
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
+ XLogRecData *rcur = &rdata[0];
info |= XLOG_HEAP_INIT_PAGE;
- rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
+ while (rcur != NULL)
+ {
+ rcur->buffer = InvalidBuffer;
+ rcur = rcur->next;
+ }
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -6010,6 +6208,112 @@ log_newpage_buffer(Buffer buffer)
}
/*
+ * Perform XLogInsert of a XLOG_HEAP2_NEW_CID record
+ *
+ * This is only used in wal_level >= WAL_LEVEL_LOGICAL
+ */
+static XLogRecPtr
+log_heap_new_cid(Relation relation, HeapTuple tup)
+{
+ xl_heap_new_cid xlrec;
+
+ XLogRecPtr recptr;
+ XLogRecData rdata[1];
+ HeapTupleHeader hdr = tup->t_data;
+
+ Assert(ItemPointerIsValid(&tup->t_self));
+ Assert(tup->t_tableOid != InvalidOid);
+
+ xlrec.top_xid = GetTopTransactionId();
+ xlrec.target.node = relation->rd_node;
+ xlrec.target.tid = tup->t_self;
+
+ /*
+ * If the tuple got inserted & deleted in the same TX, we definitely have a
+ * combocid; set both cmin and cmax.
+ */
+ if (hdr->t_infomask & HEAP_COMBOCID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetCmin(hdr);
+ xlrec.cmax = HeapTupleHeaderGetCmax(hdr);
+ xlrec.combocid = HeapTupleHeaderGetRawCommandId(hdr);
+ }
+ /* No combocid, so only cmin or cmax can be set by this TX */
+ else
+ {
+ /* tuple inserted */
+ if (hdr->t_infomask & HEAP_XMAX_INVALID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetRawCommandId(hdr);
+ xlrec.cmax = InvalidCommandId;
+ }
+ /* tuple from a different tx got updated or deleted */
+ else
+ {
+ xlrec.cmin = InvalidCommandId;
+ xlrec.cmax = HeapTupleHeaderGetRawCommandId(hdr);
+
+ }
+ xlrec.combocid = InvalidCommandId;
+ }
+
+ rdata[0].data = (char *) &xlrec;
+ rdata[0].len = SizeOfHeapNewCid;
+ rdata[0].buffer = InvalidBuffer;
+ rdata[0].next = NULL;
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_NEW_CID, rdata);
+
+ return recptr;
+}
+
+static HeapTuple
+ExtractKeyTuple(Relation relation, HeapTuple tp)
+{
+ HeapTuple idx_tuple = NULL;
+ TupleDesc desc = RelationGetDescr(relation);
+ Relation idx_rel;
+ TupleDesc idx_desc;
+ Datum idx_vals[INDEX_MAX_KEYS];
+ bool idx_isnull[INDEX_MAX_KEYS];
+ int natt;
+
+ /* needs to already have been fetched? */
+ if (relation->rd_indexvalid == 0)
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(DEBUG1, "Could not find primary key for table with oid %u",
+ RelationGetRelid(relation));
+ }
+ else
+ {
+ idx_rel = RelationIdGetRelation(relation->rd_primary);
+ idx_desc = RelationGetDescr(idx_rel);
+
+ for (natt = 0; natt < idx_desc->natts; natt++)
+ {
+ int attno = idx_rel->rd_index->indkey.values[natt];
+ if (attno == ObjectIdAttributeNumber)
+ {
+ idx_vals[natt] = HeapTupleGetOid(tp);
+ idx_isnull[natt] = false;
+ }
+ else
+ {
+ idx_vals[natt] =
+ fastgetattr(tp, attno, desc, &idx_isnull[natt]);
+ }
+ Assert(!idx_isnull[natt]);
+ }
+ idx_tuple = heap_form_tuple(idx_desc, idx_vals, idx_isnull);
+ RelationClose(idx_rel);
+ }
+ return idx_tuple;
+}
+
+/*
* Handles CLEANUP_INFO
*/
static void
@@ -6370,7 +6674,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6419,7 +6723,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/* Make sure there is no forward chain link in t_ctid */
@@ -6453,7 +6757,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6524,7 +6828,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6587,7 +6891,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->node);
Buffer vmbuffer = InvalidBuffer;
@@ -6670,7 +6974,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6709,7 +7013,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
HeapTupleHeaderData hdr;
char data[MaxHeapTupleSize];
} tbuf;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
int hsize;
uint32 newlen;
Size freespace;
@@ -6718,7 +7022,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
@@ -6796,7 +7100,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/*
@@ -6820,7 +7124,7 @@ newt:;
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->newtid);
@@ -6878,13 +7182,13 @@ newsame:;
if (PageGetMaxOffsetNumber(page) + 1 < offnum)
elog(PANIC, "heap_update_redo: invalid max offset number");
- hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
+ hsize = SizeOfHeapUpdate + SizeOfHeapHeaderLen;
- newlen = record->xl_len - hsize;
- Assert(newlen <= MaxHeapTupleSize);
memcpy((char *) &xlhdr,
(char *) xlrec + SizeOfHeapUpdate,
- SizeOfHeapHeader);
+ SizeOfHeapHeaderLen);
+ newlen = xlhdr.t_len;
+ Assert(newlen <= MaxHeapTupleSize);
htup = &tbuf.hdr;
MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
@@ -6892,9 +7196,9 @@ newsame:;
(char *) xlrec + hsize,
newlen);
newlen += offsetof(HeapTupleHeaderData, t_bits);
- htup->t_infomask2 = xlhdr.t_infomask2;
- htup->t_infomask = xlhdr.t_infomask;
- htup->t_hoff = xlhdr.t_hoff;
+ htup->t_infomask2 = xlhdr.header.t_infomask2;
+ htup->t_infomask = xlhdr.header.t_infomask;
+ htup->t_hoff = xlhdr.header.t_hoff;
HeapTupleHeaderSetXmin(htup, record->xl_xid);
HeapTupleHeaderSetCmin(htup, FirstCommandId);
@@ -6906,7 +7210,7 @@ newsame:;
if (offnum == InvalidOffsetNumber)
elog(PANIC, "heap_update_redo: failed to add tuple");
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
@@ -7157,6 +7461,9 @@ heap2_redo(XLogRecPtr lsn, XLogRecord *record)
case XLOG_HEAP2_LOCK_UPDATED:
heap_xlog_lock_updated(lsn, record);
break;
+ case XLOG_HEAP2_NEW_CID:
+ /* nothing to do on a real replay, only during logical decoding */
+ break;
default:
elog(PANIC, "heap2_redo: unknown op code %u", info);
}
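For readers following the log_heap_new_cid() hunk above, the choice of cmin, cmax and combocid can be sketched in isolation. This is only an illustration under stated assumptions, not the patch's code: DemoNewCid, demo_classify_cid() and DEMO_INVALID_CID are hypothetical stand-ins, and the two booleans replace the HEAP_COMBOCID and HEAP_XMAX_INVALID infomask tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for InvalidCommandId. */
#define DEMO_INVALID_CID UINT32_MAX

/* Hypothetical, reduced version of the command-id fields of xl_heap_new_cid. */
typedef struct DemoNewCid
{
    uint32_t cmin;
    uint32_t cmax;
    uint32_t combocid;
} DemoNewCid;

/*
 * raw_cid is the value stored in the tuple header. real_cmin/real_cmax are
 * only consulted when the header holds a combo command id.
 */
static DemoNewCid
demo_classify_cid(bool has_combocid, bool xmax_invalid,
                  uint32_t raw_cid, uint32_t real_cmin, uint32_t real_cmax)
{
    DemoNewCid rec;

    /* the sketch expects a valid raw command id */
    assert(raw_cid != DEMO_INVALID_CID);

    if (has_combocid)
    {
        /* inserted and deleted in the same transaction: both are known */
        rec.cmin = real_cmin;
        rec.cmax = real_cmax;
        rec.combocid = raw_cid;
    }
    else if (xmax_invalid)
    {
        /* tuple was inserted: the raw value is cmin */
        rec.cmin = raw_cid;
        rec.cmax = DEMO_INVALID_CID;
        rec.combocid = DEMO_INVALID_CID;
    }
    else
    {
        /* tuple from another transaction was updated or deleted */
        rec.cmin = DEMO_INVALID_CID;
        rec.cmax = raw_cid;
        rec.combocid = DEMO_INVALID_CID;
    }
    return rec;
}
```

Calling demo_classify_cid(false, true, 5, 0, 0) models a plain insert at command id 5: only cmin is filled in, the other two fields stay invalid.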
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3ec10a0..7fe9f32 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -75,6 +75,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
Page page = BufferGetPage(buffer);
Size minfree;
+ Assert(TransactionIdIsValid(OldestXmin));
+
/*
* Let's see if we really need pruning.
*
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..3bac4a5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -67,7 +67,10 @@
#include "access/relscan.h"
#include "access/transam.h"
+#include "access/xlog.h"
+
#include "catalog/index.h"
+#include "catalog/catalog.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -520,8 +523,15 @@ index_fetch_heap(IndexScanDesc scan)
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != scan->xs_cbuf)
- heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
- RecentGlobalXmin);
+ {
+ if (IsSystemRelation(scan->heapRelation)
+ || RelationIsDoingTimetravel(scan->heapRelation))
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalDataXmin);
+ }
}
/* Obtain share-lock on the buffer so we can examine visibility */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index bc8b985..c750fef 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -184,6 +184,15 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
xlrec->infobits_set);
out_target(buf, &(xlrec->target));
}
+ else if (info == XLOG_HEAP2_NEW_CID)
+ {
+ xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
+
+ appendStringInfo(buf, "new_cid: ");
+ out_target(buf, &(xlrec->target));
+ appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+ xlrec->cmin, xlrec->cmax, xlrec->combocid);
+ }
else
appendStringInfo(buf, "UNKNOWN");
}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1b36f9a..e0900e2 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -28,6 +28,7 @@ const struct config_enum_entry wal_level_options[] = {
{"minimal", WAL_LEVEL_MINIMAL, false},
{"archive", WAL_LEVEL_ARCHIVE, false},
{"hot_standby", WAL_LEVEL_HOT_STANDBY, false},
+ {"logical", WAL_LEVEL_LOGICAL, false},
{NULL, 0, false}
};
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e975f8d..d46a50e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -47,6 +47,7 @@
#include "access/twophase.h"
#include "access/twophase_rmgr.h"
#include "access/xact.h"
+#include "access/xlog.h"
#include "access/xlogutils.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
@@ -1920,7 +1921,8 @@ RecoverPreparedTransactions(void)
* the prepared transaction generated xid assignment records. Test
* here must match one used in AssignTransactionId().
*/
- if (InHotStandby && hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
+ XLogLogicalInfoActive()))
overwriteOK = true;
/*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0591f3f..b937ffe 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -146,6 +146,7 @@ typedef struct TransactionStateData
int prevSecContext; /* previous SecurityRestrictionContext */
bool prevXactReadOnly; /* entry-time xact r/o state */
bool startedInRecovery; /* did we start in recovery? */
+ bool guaranteedlyLogged; /* has xid been logged? */
struct TransactionStateData *parent; /* back link to parent */
} TransactionStateData;
@@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = {
0, /* previous SecurityRestrictionContext */
false, /* entry-time xact r/o state */
false, /* startedInRecovery */
+ false, /* guaranteedlyLogged */
NULL /* link to parent state block */
};
@@ -391,6 +393,21 @@ GetCurrentTransactionIdIfAny(void)
}
/*
+ * MarkCurrentTransactionIdLoggedIfAny
+ *
+ * Remember that the current xid - if it is assigned - has now been WAL-logged.
+ */
+void
+MarkCurrentTransactionIdLoggedIfAny(void)
+{
+ if (TransactionIdIsValid(CurrentTransactionState->transactionId))
+ {
+ CurrentTransactionState->guaranteedlyLogged = true;
+ }
+}
+
+
+/*
* GetStableLatestTransactionId
*
* Get the transaction's XID if it has one, else read the next-to-be-assigned
@@ -431,6 +448,7 @@ AssignTransactionId(TransactionState s)
{
bool isSubXact = (s->parent != NULL);
ResourceOwner currentOwner;
+ bool log_unknown_top = false;
/* Assert that caller didn't screw up */
Assert(!TransactionIdIsValid(s->transactionId));
@@ -438,7 +456,7 @@ AssignTransactionId(TransactionState s)
/*
* Ensure parent(s) have XIDs, so that a child always has an XID later
- * than its parent. Musn't recurse here, or we might get a stack overflow
+ * than its parent. May not recurse here, or we might get a stack overflow
* if we're at the bottom of a huge stack of subtransactions none of which
* have XIDs yet.
*/
@@ -455,6 +473,8 @@ AssignTransactionId(TransactionState s)
p = p->parent;
}
+ Assert(parentOffset);
+
/*
* This is technically a recursive call, but the recursion will never
* be more than one layer deep.
@@ -466,6 +486,21 @@ AssignTransactionId(TransactionState s)
}
/*
+ * When wal_level=logical, guarantee that a subtransaction's xid can only
+ * be seen in the WAL stream if its toplevel xid has been logged before. If
+ * necessary we log an xact_assignment record with fewer than
+ * PGPROC_MAX_CACHED_SUBXIDS entries. Note that it is fine if
+ * guaranteedlyLogged isn't set for a transaction even though it appears in
+ * a WAL record; we'll just superfluously log something.
+ */
+ if (isSubXact && XLogLogicalInfoActive() &&
+ !TopTransactionStateData.guaranteedlyLogged)
+ {
+ log_unknown_top = true;
+ }
+
+
+ /*
* Generate a new Xid and record it in PG_PROC and pg_subtrans.
*
* NB: we must make the subtrans entry BEFORE the Xid appears anywhere in
@@ -519,6 +554,9 @@ AssignTransactionId(TransactionState s)
* top-level transaction that each subxact belongs to. This is correct in
* recovery only because aborted subtransactions are separately WAL
* logged.
+ *
+ * This is correct even for the case where several levels above us didn't
+ * have an xid assigned as we recursed up to them beforehand.
*/
if (isSubXact && XLogStandbyInfoActive())
{
@@ -529,7 +567,8 @@ AssignTransactionId(TransactionState s)
* ensure this test matches similar one in
* RecoverPreparedTransactions()
*/
- if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
+ log_unknown_top)
{
XLogRecData rdata[2];
xl_xact_assignment xlrec;
@@ -548,13 +587,15 @@ AssignTransactionId(TransactionState s)
rdata[0].next = &rdata[1];
rdata[1].data = (char *) unreportedXids;
- rdata[1].len = PGPROC_MAX_CACHED_SUBXIDS * sizeof(TransactionId);
+ rdata[1].len = nUnreportedXids * sizeof(TransactionId);
rdata[1].buffer = InvalidBuffer;
rdata[1].next = NULL;
(void) XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT, rdata);
nUnreportedXids = 0;
+ /* mark top, not current xact as having been logged */
+ TopTransactionStateData.guaranteedlyLogged = true;
}
}
}
@@ -1733,6 +1774,7 @@ StartTransaction(void)
* initialize reported xid accounting
*/
nUnreportedXids = 0;
+ s->guaranteedlyLogged = false;
/*
* must initialize resource-management stuff first
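The AssignTransactionId() hunks above change when an xact_assignment record is written. A minimal sketch of that decision follows, assuming the function and parameter names below (all hypothetical) and assuming n_unreported already counts the subtransaction xid that was just assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Value of PGPROC_MAX_CACHED_SUBXIDS in PostgreSQL. */
#define DEMO_MAX_CACHED_SUBXIDS 64

/*
 * Returns true if an assignment record would be written now.
 *
 * standby_info_active / logical_info_active stand in for
 * XLogStandbyInfoActive() / XLogLogicalInfoActive(); top_xid_logged stands
 * in for TopTransactionStateData.guaranteedlyLogged.
 */
static bool
demo_must_log_assignment(bool is_subxact, bool standby_info_active,
                         bool logical_info_active, bool top_xid_logged,
                         int n_unreported)
{
    bool log_unknown_top;

    assert(n_unreported >= 0);

    /* assignment records are only written for subtransactions */
    if (!is_subxact || !standby_info_active)
        return false;

    /* logical decoding must see the toplevel xid before any of its children */
    log_unknown_top = logical_info_active && !top_xid_logged;

    return n_unreported >= DEMO_MAX_CACHED_SUBXIDS || log_unknown_top;
}
```

With wal_level=logical and a toplevel xid that has not appeared in WAL yet, demo_must_log_assignment(true, true, true, false, 1) is true, so a record with a single subxid is written early. This also explains why rdata[1].len now depends on nUnreportedXids rather than on the fixed maximum.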
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fc495d6..fbb505d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
+#include "replication/logical.h"
#include "storage/barrier.h"
#include "storage/bufmgr.h"
#include "storage/fd.h"
@@ -1191,6 +1192,8 @@ begin:;
*/
WALInsertSlotRelease();
+ MarkCurrentTransactionIdLoggedIfAny();
+
END_CRIT_SECTION();
/*
@@ -6332,6 +6335,13 @@ StartupXLOG(void)
XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
XLogCtl->ckptXid = checkPoint.nextXid;
+
+ /*
+ * Start up logical state; it needs to be set up now so we have proper data
+ * during restore. XXX
+ */
+ StartupLogicalReplication(checkPoint.redo);
+
/*
* Initialize unlogged LSN. On a clean shutdown, it's restored from the
* control file. On recovery, all unlogged relations are blown away, so
@@ -8312,7 +8322,7 @@ CreateCheckPoint(int flags)
* StartupSUBTRANS hasn't been called yet.
*/
if (!RecoveryInProgress())
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update stats before releasing lock. */
LogCheckpointEnd(false);
@@ -8672,7 +8682,7 @@ CreateRestartPoint(int flags)
* this because StartupSUBTRANS hasn't been called yet.
*/
if (EnableHotStandby)
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update before releasing lock. */
LogCheckpointEnd(true);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index c1287a7..0d4cfcb 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -106,7 +106,6 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
return path;
}
-
/*
* IsSystemRelation
* True iff the relation is a system catalog relation.
@@ -123,8 +122,17 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
bool
IsSystemRelation(Relation relation)
{
- return IsSystemNamespace(RelationGetNamespace(relation)) ||
- IsToastNamespace(RelationGetNamespace(relation));
+ return IsSystemRelationId(RelationGetRelid(relation));
+}
+
+/*
+ * IsSystemRelationId
+ * True iff the relation is a system catalog relation.
+ */
+bool
+IsSystemRelationId(Oid relid)
+{
+ return relid < FirstNormalObjectId;
}
/*
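The catalog.c hunk above replaces the namespace-based test in IsSystemRelation() with an OID range test. A small sketch of the new rule, using hypothetical Demo* names; 16384 is the value of FirstNormalObjectId in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoOid;           /* hypothetical stand-in for Oid */

#define DEMO_INVALID_OID 0u
#define DEMO_FIRST_NORMAL_OBJECT_ID 16384u  /* FirstNormalObjectId */

/* Mirrors the shape of IsSystemRelationId(): decide by OID range only. */
static bool
demo_is_system_relation_id(DemoOid relid)
{
    /* InvalidOid does not name a relation */
    assert(relid != DEMO_INVALID_OID);

    return relid < DEMO_FIRST_NORMAL_OBJECT_ID;
}
```

For example, pg_class (OID 1259) is classified as a system relation, while the first user-created object (OID 16384) is not. One consequence worth checking in review, as far as this hunk shows: a TOAST table belonging to a user table lives in the toast namespace but has an OID of 16384 or higher, so the old test treated it as a system relation and the new one does not.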
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b73ee4f..49ea38b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2174,9 +2174,20 @@ IndexBuildHeapScan(Relation heapRelation,
}
else
{
+ /*
+ * We can ignore a) pegged xmins b) shared relations if we don't scan
+ * something acting as a catalog.
+ */
+ bool include_systables =
+ IsSystemRelation(heapRelation) ||
+ RelationIsDoingTimetravel(heapRelation);
+
snapshot = SnapshotAny;
/* okay to ignore lazy VACUUMs here */
- OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared,
+ include_systables,
+ true,
+ false);
}
scan = heap_beginscan_strat(heapRelation, /* relation */
@@ -3340,7 +3351,7 @@ reindex_relation(Oid relid, int flags)
/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
if (is_pg_class)
- (void) RelationGetIndexAttrBitmap(rel, false);
+ (void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
PG_TRY();
{
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 575a40f..2acaf54 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -613,6 +613,16 @@ CREATE VIEW pg_stat_replication AS
WHERE S.usesysid = U.oid AND
S.pid = W.pid;
+CREATE VIEW pg_stat_logical_decoding AS
+ SELECT
+ L.slot_name,
+ L.plugin,
+ L.database,
+ L.active,
+ L.xmin,
+ L.restart_decoding_lsn
+ FROM pg_stat_get_logical_decoding_slots() AS L;
+
CREATE VIEW pg_stat_database AS
SELECT
D.oid AS datid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9845b0b..7a05cea 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, true, false);
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f6a5bfe..76b2904 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -859,6 +859,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
*/
vacuum_set_xid_limits(freeze_min_age, freeze_table_age,
OldHeap->rd_rel->relisshared,
+ IsSystemRelation(OldHeap)
+ || RelationIsDoingTimetravel(OldHeap),
&OldestXmin, &FreezeXid, NULL, &MultiXactCutoff);
/*
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d86e9ad..912f7a8 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2355,7 +2355,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
* concurrency.
*/
modifiedCols = GetModifiedColumns(relinfo, estate);
- keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc, true);
+ keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc,
+ INDEX_ATTR_BITMAP_KEY);
if (bms_overlap(keyCols, modifiedCols))
lockmode = LockTupleExclusive;
else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 27aea73..3528c27 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -381,6 +381,7 @@ void
vacuum_set_xid_limits(int freeze_min_age,
int freeze_table_age,
bool sharedRel,
+ bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
@@ -399,7 +400,7 @@ vacuum_set_xid_limits(int freeze_min_age,
* working on a particular table at any time, and that each vacuum is
* always an independent transaction.
*/
- *oldestXmin = GetOldestXmin(sharedRel, true);
+ *oldestXmin = GetOldestXmin(sharedRel, catalogRel, true, false);
Assert(TransactionIdIsNormal(*oldestXmin));
@@ -720,7 +721,7 @@ vac_update_datfrozenxid(void)
* committed pg_class entries for new tables; see AddNewRelationTuple().
* So we cannot produce a wrong minimum by starting with this.
*/
- newFrozenXid = GetOldestXmin(true, true);
+ newFrozenXid = GetOldestXmin(true, true, true, false);
/*
* Similarly, initialize the MultiXact "min" with the value that would be
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index bb4e03e..3e90a1a 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -44,6 +44,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
+#include "catalog/catalog.h"
#include "catalog/storage.h"
#include "commands/dbcommands.h"
#include "commands/vacuum.h"
@@ -202,6 +203,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
vacuum_set_xid_limits(vacstmt->freeze_min_age, vacstmt->freeze_table_age,
onerel->rd_rel->relisshared,
+ IsSystemRelation(onerel)
+ || RelationIsDoingTimetravel(onerel),
&OldestXmin, &FreezeLimit, &freezeTableLimit,
&MultiXactCutoff);
scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a31b01d..8a52cdc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -818,7 +818,7 @@ PostmasterMain(int argc, char *argv[])
(errmsg("WAL archival (archive_mode=on) requires wal_level \"archive\" or \"hot_standby\"")));
if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
ereport(ERROR,
- (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\" or \"hot_standby\"")));
+ (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\", \"logical\" or \"hot_standby\"")));
/*
* Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 2dde011..2e13e27 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
repl_gram.o syncrep.o
+SUBDIRS = logical
+
include $(top_srcdir)/src/backend/common.mk
# repl_scanner is compiled as part of repl_gram
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
new file mode 100644
index 0000000..310a45c
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+# src/backend/replication/logical/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
new file mode 100644
index 0000000..53043b9
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,687 @@
+/*-------------------------------------------------------------------------
+ *
+ * decode.c
+ * Decodes WAL records fed from xlogreader.h, reading them into a
+ * reorderbuffer while simultaneously letting snapbuild.c build the
+ * snapshots needed to decode them.
+ *
+ * NOTE:
+ * This basically tries to handle all low level xlog stuff for
+ * reorderbuffer.c and snapbuild.c. There's some minor leakage where a
+ * specific record's struct is used to pass data along, but that's just
+ * because those are convenient and uncomplicated to read.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/decode.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+
+#include "access/heapam.h"
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+
+#include "catalog/pg_control.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+
+#include "storage/standby.h"
+
+/* RMGR Handlers */
+static void DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+
+/* handlers for individual records (or groups of records) */
+static void DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ TransactionId xid, int nsubxacts, TransactionId *sub_xids,
+ int ninval_msgs, SharedInvalidationMessage *msg);
+static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecPtr lsn,
+ TransactionId xid, TransactionId *sub_xids, int nsubxacts,
+ bool was_commit);
+
+/* common function to decode tuples */
+static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
+
+void
+DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf)
+{
+ /* cast so we get a warning when new rmgrs are added */
+ switch ((RmgrIds) buf->record.xl_rmid)
+ {
+ case RM_XLOG_ID:
+ DecodeXLogOp(ctx, buf);
+ break;
+
+ case RM_XACT_ID:
+ DecodeXactOp(ctx, buf);
+ break;
+
+ case RM_STANDBY_ID:
+ DecodeStandbyOp(ctx, buf);
+ break;
+
+ case RM_HEAP_ID:
+ DecodeHeapOp(ctx, buf);
+ break;
+
+ case RM_HEAP2_ID:
+ DecodeHeap2Op(ctx, buf);
+ break;
+
+ /* irrelevant for changeset extraction */
+ case RM_SMGR_ID:
+ case RM_CLOG_ID:
+ case RM_DBASE_ID:
+ case RM_TBLSPC_ID:
+ case RM_MULTIXACT_ID:
+ case RM_RELMAP_ID:
+ case RM_BTREE_ID:
+ case RM_HASH_ID:
+ case RM_GIN_ID:
+ case RM_GIST_ID:
+ case RM_SEQ_ID:
+ case RM_SPGIST_ID:
+ break;
+ case RM_NEXT_ID:
+ elog(ERROR, "unexpected NEXT_ID record");
+ }
+}
+
+static void
+DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+ ReorderBuffer *reorder = ctx->reorder;
+ XLogRecord *r = &buf->record;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (r->xl_info & ~XLR_INFO_MASK)
+ {
+ case XLOG_XACT_COMMIT:
+ {
+ xl_xact_commit *xlrec;
+ TransactionId *subxacts = NULL;
+ SharedInvalidationMessage *invals = NULL;
+
+ xlrec = (xl_xact_commit *) buf->record_data;
+
+ subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+ invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
+
+ /* FIXME: skip if wrong db? */
+
+ DecodeCommit(ctx, buf, r->xl_xid, xlrec->nsubxacts, subxacts,
+ xlrec->nmsgs, invals);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit_prepared *prec;
+ xl_xact_commit *xlrec;
+ TransactionId *subxacts;
+ SharedInvalidationMessage *invals = NULL;
+
+
+ prec = (xl_xact_commit_prepared *) buf->record_data;
+ xlrec = &prec->crec;
+
+ subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+ invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
+
+ /* FIXME: skip if wrong db? */
+
+ DecodeCommit(ctx, buf, r->xl_xid, xlrec->nsubxacts, subxacts,
+ xlrec->nmsgs, invals);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_COMPACT:
+ {
+ xl_xact_commit_compact *xlrec;
+
+#if 0
+ /* FIXME: should we error out? */
+ elog(WARNING, "unexpectedly got compact commit");
+#endif
+ xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+ DecodeCommit(ctx, buf, r->xl_xid,
+ xlrec->nsubxacts, xlrec->subxacts,
+ 0, NULL);
+ break;
+ }
+ case XLOG_XACT_ABORT:
+ {
+ xl_xact_abort *xlrec;
+ TransactionId *sub_xids;
+
+ xlrec = (xl_xact_abort *) buf->record_data;
+
+ sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ DecodeAbort(ctx, buf->origptr, r->xl_xid,
+ sub_xids, xlrec->nsubxacts, false);
+ break;
+ }
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort_prepared *prec;
+ xl_xact_abort *xlrec;
+ TransactionId *sub_xids;
+
+ prec = (xl_xact_abort_prepared *) buf->record_data;
+ xlrec = &prec->arec;
+
+ sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ /* r->xl_xid is committed in a separate record */
+ DecodeAbort(ctx, buf->origptr, prec->xid,
+ sub_xids, xlrec->nsubxacts, false);
+ break;
+ }
+
+ case XLOG_XACT_ASSIGNMENT:
+ {
+ xl_xact_assignment *xlrec;
+ int i;
+ TransactionId *sub_xid;
+
+ xlrec = (xl_xact_assignment *) buf->record_data;
+
+ sub_xid = &xlrec->xsub[0];
+
+ for (i = 0; i < xlrec->nsubxacts; i++)
+ {
+ ReorderBufferAssignChild(reorder, xlrec->xtop,
+ *(sub_xid++), buf->origptr);
+ }
+ break;
+ }
+ case XLOG_XACT_PREPARE:
+
+ /*
+ * XXX: we could replay the transaction and prepare it
+ * as well.
+ */
+ break;
+ default:
+ break;
+ }
+}
+
+static void
+DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+ XLogRecord *r = &buf->record;
+
+ switch (r->xl_info & ~XLR_INFO_MASK)
+ {
+ case XLOG_RUNNING_XACTS:
+ SnapBuildProcessRunningXacts(builder, buf->origptr,
+ (xl_running_xacts *) buf->record_data);
+ break;
+ case XLOG_STANDBY_LOCK:
+ break;
+ default:
+ elog(ERROR, "unexpected standby record type");
+ }
+}
+static void
+DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ switch (buf->record.xl_info & ~XLR_INFO_MASK)
+ {
+ /* this is also used in END_OF_RECOVERY checkpoints */
+ case XLOG_CHECKPOINT_SHUTDOWN:
+ case XLOG_END_OF_RECOVERY:
+ SnapBuildSerializationPoint(builder, buf->origptr);
+
+ /*
+ * abort all transactions that are still deemed to be in progress, they
+ * aren't actually in progress anymore. Do not abort prepared
+ * transactions that have been prepared for commit.
+ *
+ * FIXME: implement.
+ */
+ break;
+ case XLOG_CHECKPOINT_ONLINE:
+ /*
+ * a RUNNING_XACTS record will have been logged close to this; we
+ * can restart from there.
+ */
+ break;
+ case XLOG_NOOP:
+ case XLOG_NEXTOID:
+ case XLOG_SWITCH:
+ case XLOG_BACKUP_END:
+ case XLOG_PARAMETER_CHANGE:
+ case XLOG_RESTORE_POINT:
+ case XLOG_FPW_CHANGE:
+ case XLOG_FPI:
+ break;
+ }
+}
+
+static void
+DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & XLOG_HEAP_OPMASK;
+ TransactionId xid = buf->record.xl_xid;
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (info)
+ {
+ case XLOG_HEAP_INSERT:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeInsert(ctx, buf);
+ break;
+
+ /*
+ * Treat HOT updates as normal updates, there is no useful
+ * information in the fact that we could make it a HOT update
+ * locally and the WAL layout is compatible.
+ */
+ case XLOG_HEAP_HOT_UPDATE:
+ case XLOG_HEAP_UPDATE:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeUpdate(ctx, buf);
+ break;
+
+ case XLOG_HEAP_DELETE:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeDelete(ctx, buf);
+ break;
+
+ case XLOG_HEAP_NEWPAGE:
+ /*
+ * XXX: There doesn't seem to be a use case for decoding
+ * HEAP_NEWPAGE records. They are only used in various indexams and
+ * CLUSTER, neither of which should be relevant for the logical
+ * changestream.
+ */
+ break;
+ case XLOG_HEAP_INPLACE:
+ /* cannot be important for our purposes, not part of transaction */
+ if (!TransactionIdIsValid(xid))
+ break;
+
+ SnapBuildProcessChange(builder, xid, buf->origptr);
+ /* heap_inplace is only done in catalog modifying txns */
+ ReorderBufferXidSetTimetravel(ctx->reorder, xid, buf->origptr);
+ break;
+ case XLOG_HEAP_LOCK:
+ break;
+ default:
+ elog(ERROR, "unexpected info value %u", info);
+ break;
+ }
+}
+
+static void
+DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & XLOG_HEAP_OPMASK;
+ TransactionId xid = buf->record.xl_xid;
+ SnapBuild *builder = ctx->snapshot_builder;
+
+ /* no point in doing anything yet */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ switch (info)
+ {
+ case XLOG_HEAP2_MULTI_INSERT:
+ if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ DecodeMultiInsert(ctx, buf);
+ break;
+ case XLOG_HEAP2_NEW_CID:
+ {
+ xl_heap_new_cid *xlrec;
+ xlrec = (xl_heap_new_cid *) buf->record_data;
+ SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
+
+ break;
+ }
+ /*
+ * everything else here is just low level stuff we're not
+ * interested in
+ */
+ case XLOG_HEAP2_FREEZE:
+ case XLOG_HEAP2_CLEAN:
+ case XLOG_HEAP2_CLEANUP_INFO:
+ case XLOG_HEAP2_VISIBLE:
+ case XLOG_HEAP2_LOCK_UPDATED:
+ break;
+ default:
+ elog(ERROR, "unexpected info value %u", info);
+ }
+}
+
+static void
+DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, TransactionId xid,
+ int nsubxacts, TransactionId *sub_xids,
+ int ninval_msgs, SharedInvalidationMessage *msgs)
+{
+ int i;
+
+ /* always need the invalidation messages */
+ if (ninval_msgs > 0)
+ {
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ ninval_msgs, msgs);
+ ReorderBufferXidSetTimetravel(ctx->reorder, xid, buf->origptr);
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ nsubxacts, sub_xids);
+
+ /*
+ * If we are not interested in anything up to this LSN, convert the commit
+ * into an ABORT to clean up.
+ *
+ * FIXME: this needs to replay invalidations anyway!
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr))
+ {
+ DecodeAbort(ctx, buf->origptr, xid, sub_xids, nsubxacts, true);
+ return;
+ }
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, *sub_xids,
+ buf->origptr, buf->endptr);
+ sub_xids++;
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr);
+}
+
+static void
+DecodeAbort(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
+ TransactionId *sub_xids, int nsubxacts, bool was_commit)
+{
+ int i;
+
+ /*
+ * this is a bit grotty, but if we're "faking" an abort we've already gone
+ * through SnapBuildCommitTxn() in DecodeCommit(), so skip the abort here
+ */
+ if (!was_commit)
+ SnapBuildAbortTxn(ctx->snapshot_builder, xid,
+ nsubxacts, sub_xids);
+
+
+ /* FIXME: process invalidations anyway if was_commit */
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, *sub_xids, lsn);
+ sub_xids++;
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, lsn);
+}
+
+static void
+DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_insert *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader));
+
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert,
+ r->xl_len - SizeOfHeapInsert,
+ change->newtuple);
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_update *xlrec;
+ xl_heap_header_len *xlhdr;
+ ReorderBufferChange *change;
+ char *data;
+
+ xlrec = (xl_heap_update *) buf->record_data;
+ xlhdr = (xl_heap_header_len *) (buf->record_data + SizeOfHeapUpdate);
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_UPDATE;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ data = (char *) &xlhdr->header;
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapUpdate + SizeOfHeapHeaderLen));
+
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple(data,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->newtuple);
+ /* skip over the rest of the tuple header */
+ data += SizeOfHeapHeader;
+ /* skip over the tuple data */
+ data += xlhdr->t_len;
+ }
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ xlhdr = (xl_heap_header_len *) data;
+ change->oldtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+ DecodeXLogTuple((char *) &xlhdr->header,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->oldtuple);
+ data = (char *) &xlhdr->header;
+ data += SizeOfHeapHeader;
+ data += xlhdr->t_len;
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_delete *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_delete *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != ctx->slot->database)
+ return;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ /* old primary key stored */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ Assert(r->xl_len > (SizeOfHeapDelete + SizeOfHeapHeader));
+
+ change->oldtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
+ r->xl_len - SizeOfHeapDelete,
+ change->oldtuple);
+ }
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+}
+
+/*
+ * Decode xl_heap_multi_insert record into multiple changes.
+ */
+static void
+DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_multi_insert *xlrec;
+ int i;
+ char *data;
+ bool isinit = (r->xl_info & XLOG_HEAP_INIT_PAGE) != 0;
+
+ xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->node.dbNode != ctx->slot->database)
+ return;
+
+ data = buf->record_data + SizeOfHeapMultiInsert;
+
+ /*
+ * OffsetNumbers (which are not of interest to us) are stored when
+ * XLOG_HEAP_INIT_PAGE is not set -- skip over them.
+ */
+ if (!isinit)
+ data += sizeof(OffsetNumber) * xlrec->ntuples;
+
+ for (i = 0; i < xlrec->ntuples; i++)
+ {
+ ReorderBufferChange *change;
+ xl_multi_insert_tuple *xlhdr;
+ int datalen;
+ ReorderBufferTupleBuf *tuple;
+
+ change = ReorderBufferGetChange(ctx->reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->node, sizeof(RelFileNode));
+
+ /*
+ * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
+ * isn't used for catalogs, but better be future proof.
+ *
+ * We decode the tuple in pretty much the same way as DecodeXLogTuple,
+ * but since the layout is slightly different, we can't use it here.
+ */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ change->newtuple = ReorderBufferGetTupleBuf(ctx->reorder);
+
+ tuple = change->newtuple;
+
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
+ data = ((char *) xlhdr) + SizeOfMultiInsertTuple;
+ datalen = xlhdr->datalen;
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ (char *) data,
+ datalen);
+ data += datalen;
+
+ tuple->header.t_infomask = xlhdr->t_infomask;
+ tuple->header.t_infomask2 = xlhdr->t_infomask2;
+ tuple->header.t_hoff = xlhdr->t_hoff;
+ }
+
+ ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change);
+ }
+}
+
+/*
+ * Read a tuple of size 'len' from 'data' into 'tuple'.
+ */
+static void
+DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
+{
+ xl_heap_header xlhdr;
+ int datalen = len - SizeOfHeapHeader;
+
+ Assert(datalen >= 0);
+ Assert(datalen <= MaxHeapTupleSize);
+
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+
+ /* data is not stored aligned, copy to aligned storage */
+ memcpy((char *) &xlhdr,
+ data,
+ SizeOfHeapHeader);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ data + SizeOfHeapHeader,
+ datalen);
+
+ tuple->header.t_infomask = xlhdr.t_infomask;
+ tuple->header.t_infomask2 = xlhdr.t_infomask2;
+ tuple->header.t_hoff = xlhdr.t_hoff;
+}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
new file mode 100644
index 0000000..656e995
--- /dev/null
+++ b/src/backend/replication/logical/logical.c
@@ -0,0 +1,1046 @@
+/*-------------------------------------------------------------------------
+ *
+ * logical.c
+ *
+ * Logical decoding shared memory management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logical.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "access/transam.h"
+
+#include "fmgr.h"
+#include "miscadmin.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/fd.h"
+
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/*
+ * logical replication on-disk data structures.
+ */
+typedef struct LogicalDecodingSlotOnDisk
+{
+ uint32 magic;
+ LogicalDecodingSlot slot;
+} LogicalDecodingSlotOnDisk;
+
+#define LOGICAL_MAGIC 0x1051CA1 /* format identifier */
+
+/* Control array for logical decoding */
+LogicalDecodingCtlData *LogicalDecodingCtl = NULL;
+
+/* My slot for logical rep in the shared memory array */
+LogicalDecodingSlot *MyLogicalDecodingSlot = NULL;
+
+/* user settable parameters */
+int max_logical_slots = 0; /* the maximum number of logical slots */
+
+static void LogicalSlotKill(int code, Datum arg);
+
+/* persistency functions */
+static void RestoreLogicalSlot(const char *name);
+static void CreateLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *path);
+static void DeleteLogicalSlot(LogicalDecodingSlot *slot);
+
+
+/* Report shared-memory space needed by LogicalDecodingShmemInit */
+Size
+LogicalDecodingShmemSize(void)
+{
+ Size size = 0;
+
+ if (max_logical_slots == 0)
+ return size;
+
+ size = offsetof(LogicalDecodingCtlData, logical_slots);
+ size = add_size(size,
+ mul_size(max_logical_slots, sizeof(LogicalDecodingSlot)));
+
+ return size;
+}
+
+/* Allocate and initialize logical decoding shared memory */
+void
+LogicalDecodingShmemInit(void)
+{
+ bool found;
+
+ if (max_logical_slots == 0)
+ return;
+
+ LogicalDecodingCtl = (LogicalDecodingCtlData *)
+ ShmemInitStruct("Logical Decoding Ctl", LogicalDecodingShmemSize(),
+ &found);
+
+ if (!found)
+ {
+ int i;
+
+ /* First time through, so initialize */
+ MemSet(LogicalDecodingCtl, 0, LogicalDecodingShmemSize());
+
+ LogicalDecodingCtl->xmin = InvalidTransactionId;
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot =
+ &LogicalDecodingCtl->logical_slots[i];
+
+ slot->xmin = InvalidTransactionId;
+ slot->effective_xmin = InvalidTransactionId;
+ SpinLockInit(&slot->mutex);
+ }
+ }
+}
+
+static void
+LogicalSlotKill(int code, Datum arg)
+{
+ /* LOCK? */
+ if (MyLogicalDecodingSlot && MyLogicalDecodingSlot->active)
+ {
+ MyLogicalDecodingSlot->active = false;
+ }
+ MyLogicalDecodingSlot = NULL;
+}
+
+/*
+ * Set the xmin required for catalog timetravel for the specific decoding slot.
+ */
+void
+IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ !TransactionIdIsValid(MyLogicalDecodingSlot->candidate_xmin)))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = lsn;
+ MyLogicalDecodingSlot->candidate_xmin = xmin;
+ elog(DEBUG1, "got new xmin %u at %X/%X", xmin,
+ (uint32) (lsn >> 32), (uint32) lsn);
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+ Assert(restart_lsn != InvalidXLogRecPtr);
+ Assert(current_lsn != InvalidXLogRecPtr);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly. A missed
+ * value here will just cause some extra effort after reconnecting.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (current_lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ MyLogicalDecodingSlot->candidate_restart_decoding == InvalidXLogRecPtr))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = current_lsn;
+ MyLogicalDecodingSlot->candidate_restart_decoding = restart_lsn;
+
+ elog(DEBUG1, "got new restart lsn %X/%X at %X/%X",
+ (uint32) (restart_lsn >> 32), (uint32) restart_lsn,
+ (uint32) (current_lsn >> 32), (uint32) current_lsn);
+
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+LogicalConfirmReceivedLocation(XLogRecPtr lsn)
+{
+ Assert(lsn != InvalidXLogRecPtr);
+
+ /* Do an unlocked check for candidate_lsn first. */
+ if (MyLogicalDecodingSlot->candidate_lsn != InvalidXLogRecPtr)
+ {
+ bool updated_xmin = false;
+ bool updated_restart = false;
+
+ /* use volatile pointer to prevent code rearrangement */
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+
+ slot->confirmed_flush = lsn;
+
+ /* if we're past the location required for bumping xmin, do so */
+ if (slot->candidate_lsn != InvalidXLogRecPtr &&
+ slot->candidate_lsn < lsn)
+ {
+ /*
+ * We have to write the changed xmin to disk *before* we change
+ * the in-memory value, otherwise after a crash we wouldn't know
+ * that some catalog tuples might have been removed already.
+ *
+ * Ensure that by first writing to ->xmin and only update
+ * ->effective_xmin once the new state is fsynced to disk. After a
+ * crash ->effective_xmin is set to ->xmin.
+ */
+ if (TransactionIdIsValid(slot->candidate_xmin) &&
+ slot->xmin != slot->candidate_xmin)
+ {
+ slot->xmin = slot->candidate_xmin;
+ updated_xmin = true;
+ }
+
+ if (slot->candidate_restart_decoding != InvalidXLogRecPtr &&
+ slot->restart_decoding != slot->candidate_restart_decoding)
+ {
+ slot->restart_decoding = slot->candidate_restart_decoding;
+ updated_restart = true;
+ }
+
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ }
+
+ SpinLockRelease(&slot->mutex);
+
+ /* first write new xmin to disk, so we know what's up after a crash */
+ if (updated_xmin || updated_restart)
+ /* cast away volatile, that's ok. */
+ SaveLogicalSlot((LogicalDecodingSlot *) slot);
+
+ /*
+ * now the new xmin is safely on disk, we can let the global value
+ * advance
+ */
+ if (updated_xmin)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_xmin = slot->xmin;
+ SpinLockRelease(&slot->mutex);
+
+ ComputeLogicalXmin();
+ }
+ }
+ else
+ {
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+ slot->confirmed_flush = lsn;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
+/*
+ * Compute the oldest xmin across all of the decoding slots and store it in
+ * LogicalDecodingCtlData.
+ */
+void
+ComputeLogicalXmin(void)
+{
+ int i;
+ TransactionId xmin = InvalidTransactionId;
+ LogicalDecodingSlot *slot;
+
+ Assert(LogicalDecodingCtl);
+
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use &&
+ TransactionIdIsValid(slot->effective_xmin) && (
+ !TransactionIdIsValid(xmin) ||
+ TransactionIdPrecedes(slot->effective_xmin, xmin))
+ )
+ {
+ xmin = slot->effective_xmin;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+ LogicalDecodingCtl->xmin = xmin;
+ LWLockRelease(ProcArrayLock);
+
+ elog(DEBUG1, "computed new global xmin for decoding: %u", xmin);
+}
+
+/*
+ * Make sure the current settings & environment are capable of doing logical
+ * replication.
+ */
+void
+CheckLogicalReplicationRequirements(void)
+{
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ /* XXX invent class 51 for code 51028? */
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires wal_level=logical")));
+
+ if (MyDatabaseId == InvalidOid)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires a database connection")));
+
+ if (max_logical_slots == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("logical replication requires max_logical_slots > 0"))));
+}
+
+/*
+ * Search for a free slot, mark it as used and acquire a valid xmin horizon
+ * value.
+ */
+void
+LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin)
+{
+ LogicalDecodingSlot *slot;
+ bool name_in_use;
+ int i;
+
+ Assert(!MyLogicalDecodingSlot);
+
+ CheckLogicalReplicationRequirements();
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+
+ /* First, make sure the requested name is not in use. */
+
+ name_in_use = false;
+ for (i = 0; i < max_logical_slots && !name_in_use; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (s->in_use && strcmp(name, NameStr(s->name)) == 0)
+ name_in_use = true;
+ SpinLockRelease(&s->mutex);
+ }
+
+ if (name_in_use)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("logical slot \"%s\" already exists", name)));
+
+ /* Find the first available (not in_use (=> not active)) slot. */
+
+ slot = NULL;
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (!s->in_use)
+ {
+ Assert(!s->active);
+ /* NOT releasing the lock yet */
+ slot = s;
+ break;
+ }
+ SpinLockRelease(&s->mutex);
+ }
+
+ LWLockRelease(LogicalReplicationCtlLock);
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ errmsg("couldn't find free logical slot. free one or increase max_logical_slots")));
+
+ MyLogicalDecodingSlot = slot;
+
+ /* Let's start with enough information if we can */
+ if (!RecoveryInProgress())
+ slot->restart_decoding = LogStandbySnapshot();
+ else
+ slot->restart_decoding = GetRedoRecPtr();
+
+ slot->in_use = true;
+ slot->active = true;
+ slot->database = MyDatabaseId;
+ /* XXX: do we want to use truncate_identifier() instead? */
+ strncpy(NameStr(slot->plugin), plugin, NAMEDATALEN);
+ NameStr(slot->plugin)[NAMEDATALEN - 1] = '\0';
+ strncpy(NameStr(slot->name), name, NAMEDATALEN);
+ NameStr(slot->name)[NAMEDATALEN - 1] = '\0';
+
+ /* Arrange to clean up at exit/error */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ /* release slot so it can be examined by others */
+ SpinLockRelease(&slot->mutex);
+
+ /* XXX: verify that the specified plugin is valid */
+
+ /*
+ * Acquire the current global xmin value and directly set the logical xmin
+ * before releasing the lock if necessary. We do this so wal decoding is
+ * guaranteed to have all catalog rows produced by xacts with an xid >
+ * slot->effective_xmin available.
+ *
+ * We can't use ComputeLogicalXmin here as that acquires ProcArrayLock
+ * separately which would open a short window for the global xmin to
+ * advance above slot->effective_xmin.
+ */
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ slot->effective_xmin = GetOldestXmin(true, true, true, true);
+ slot->xmin = slot->effective_xmin;
+
+ if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) ||
+ NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin))
+ LogicalDecodingCtl->xmin = slot->effective_xmin;
+ LWLockRelease(ProcArrayLock);
+
+ Assert(slot->effective_xmin <= GetOldestXmin(true, true, true, false));
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+ CreateLogicalSlot(slot);
+ LWLockRelease(LogicalReplicationCtlLock);
+}
+
+/*
+ * Find a previously initialized slot and mark it as used again.
+ */
+void
+LogicalDecodingReAcquireSlot(const char *name)
+{
+ LogicalDecodingSlot *slot;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ MyLogicalDecodingSlot = slot;
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+
+ if (!MyLogicalDecodingSlot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("couldn't find logical slot \"%s\"", name)));
+
+ slot = MyLogicalDecodingSlot;
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ MyLogicalDecodingSlot = NULL;
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("slot already active")));
+ }
+
+ slot->active = true;
+ /* now that we've marked it as active, we release our lock */
+ SpinLockRelease(&slot->mutex);
+
+ /* Don't let the user switch the database... */
+ if (slot->database != MyDatabaseId)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("START_LOGICAL_REPLICATION needs to be run in the same database as INIT_LOGICAL_REPLICATION"))));
+ }
+
+ /* Arrange to clean up at exit */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ SaveLogicalSlot(slot);
+}
+
+/*
+ * Temporarily release a logical decoding slot; this or another backend can
+ * reacquire it later.
+ */
+void
+LogicalDecodingReleaseSlot(void)
+{
+ LogicalDecodingSlot *slot;
+
+ CheckLogicalReplicationRequirements();
+
+ slot = MyLogicalDecodingSlot;
+
+ Assert(slot != NULL && slot->active);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ MyLogicalDecodingSlot = NULL;
+
+ SaveLogicalSlot(slot);
+
+ cancel_shmem_exit(LogicalSlotKill, 0);
+}
+
+/*
+ * Permanently remove a logical decoding slot.
+ */
+void
+LogicalDecodingFreeSlot(const char *name)
+{
+ LogicalDecodingSlot *slot = NULL;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ slot = NULL;
+ }
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("couldn't find logical slot \"%s\"", name)));
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("cannot free active logical slot \"%s\"", name)));
+ }
+
+ /*
+ * Mark it as active, so nobody can claim this slot while we are
+ * working on it. We don't want to hold the spinlock while doing stuff
+ * like fsyncing the state file to disk.
+ */
+ slot->active = true;
+
+ SpinLockRelease(&slot->mutex);
+
+ /*
+ * Start critical section; we can't be interrupted while the on-disk and
+ * in-memory state aren't coherent.
+ */
+ START_CRIT_SECTION();
+
+ DeleteLogicalSlot(slot);
+
+ /* ok, everything gone, after a crash we now wouldn't restore this slot */
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ slot->in_use = false;
+ SpinLockRelease(&slot->mutex);
+
+ END_CRIT_SECTION();
+
+ /* slot is dead and doesn't nail the xmin anymore */
+ ComputeLogicalXmin();
+}
+
+/*
+ * Load replication state from disk into memory at server startup.
+ */
+void
+StartupLogicalReplication(XLogRecPtr checkPointRedo)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ ereport(DEBUG1,
+ (errmsg("starting up logical decoding from %X/%X",
+ (uint32) (checkPointRedo >> 32), (uint32) checkPointRedo)));
+
+ /* restore all slots */
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /* we crashed while a slot was being set up or deleted, clean up */
+ if (strcmp(logical_de->d_name, "new") == 0 ||
+ strcmp(logical_de->d_name, "old") == 0)
+ {
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ if (!rmtree(path, true))
+ {
+ FreeDir(logical_dir);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ path)));
+ }
+ continue;
+ }
+
+ RestoreLogicalSlot(logical_de->d_name);
+ }
+ FreeDir(logical_dir);
+
+ if (max_logical_slots <= 0)
+ return;
+
+ /* Now that we have recovered all the data, compute logical xmin */
+ ComputeLogicalXmin();
+
+ ReorderBufferStartup();
+}
+
+/* ----
+ * Manipulation of ondisk state of logical slots
+ * ----
+ */
+static void
+CreateLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+
+ START_CRIT_SECTION();
+
+ sprintf(tmppath, "pg_llog/new");
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (mkdir(tmppath, S_IRWXU) < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m",
+ tmppath)));
+
+ fsync_fname(tmppath, true);
+
+ SaveLogicalSlotInternal(slot, tmppath);
+
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname(path, true);
+
+ END_CRIT_SECTION();
+}
+
+static void
+SaveLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+ SaveLogicalSlotInternal(slot, path);
+}
+
+/*
+ * Shared functionality between saving and creating a logical slot.
+ */
+static void
+SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *dir)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int fd;
+ LogicalDecodingSlotOnDisk cp;
+
+ /* silence valgrind :( */
+ memset(&cp, 0, sizeof(LogicalDecodingSlotOnDisk));
+
+ sprintf(tmppath, "%s/state.tmp", dir);
+ sprintf(path, "%s/state", dir);
+
+ START_CRIT_SECTION();
+
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create logical checkpoint file \"%s\": %m",
+ tmppath)));
+
+ cp.magic = LOGICAL_MAGIC;
+
+ SpinLockAcquire(&slot->mutex);
+
+ cp.slot.xmin = slot->xmin;
+ cp.slot.effective_xmin = slot->effective_xmin;
+
+ strcpy(NameStr(cp.slot.name), NameStr(slot->name));
+ strcpy(NameStr(cp.slot.plugin), NameStr(slot->plugin));
+
+ cp.slot.database = slot->database;
+ cp.slot.confirmed_flush = slot->confirmed_flush;
+ cp.slot.restart_decoding = slot->restart_decoding;
+ cp.slot.candidate_lsn = InvalidXLogRecPtr;
+ cp.slot.candidate_xmin = InvalidTransactionId;
+ cp.slot.candidate_restart_decoding = InvalidXLogRecPtr;
+ cp.slot.in_use = slot->in_use;
+ cp.slot.active = false;
+
+ SpinLockRelease(&slot->mutex);
+
+ if ((write(fd, &cp, sizeof(cp))) != sizeof(cp))
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not write logical checkpoint file \"%s\": %m",
+ tmppath)));
+ }
+
+ /* fsync the file */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not fsync logical checkpoint \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /* rename to permanent file, fsync file and directory */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname((char *) dir, true);
+ fsync_fname(path, false);
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+DeleteLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+ char tmppath[] = "pg_llog/old";
+
+ START_CRIT_SECTION();
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (rename(path, tmppath) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ path, tmppath)));
+ }
+
+ /* make sure no partial state is visible after a crash */
+ fsync_fname(tmppath, true);
+ fsync_fname("pg_llog", true);
+
+ if (!rmtree(tmppath, true))
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ tmppath)));
+ }
+
+ END_CRIT_SECTION();
+}
+
+/*
+ * Load a single on-disk slot into memory.
+ */
+static void
+RestoreLogicalSlot(const char *name)
+{
+ LogicalDecodingSlotOnDisk cp;
+ int i;
+ char path[MAXPGPATH];
+ int fd;
+ bool restored = false;
+ int readBytes;
+
+ START_CRIT_SECTION();
+
+ /* delete temp file if it exists */
+ sprintf(path, "pg_llog/%s/state.tmp", name);
+ if (unlink(path) < 0 && errno != ENOENT)
+ ereport(PANIC, (errcode_for_file_access(), errmsg("could not unlink file \"%s\": %m", path)));
+
+ sprintf(path, "pg_llog/%s/state", name);
+
+ elog(DEBUG1, "restoring logical slot from %s", path);
+
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ /*
+ * We do not need to handle this as we are rename()ing the directory into
+ * place only after we fsync()ed the state file.
+ */
+ if (fd < 0)
+ ereport(PANIC, (errcode_for_file_access(), errmsg("could not open state file \"%s\": %m", path)));
+
+ readBytes = read(fd, &cp, sizeof(cp));
+ if (readBytes != sizeof(cp))
+ {
+ int saved_errno = errno;
+
+ CloseTransientFile(fd);
+ errno = saved_errno;
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not read logical checkpoint file \"%s\": %m, read %d of %zu",
+ path, readBytes, sizeof(cp))));
+ }
+
+ CloseTransientFile(fd);
+
+ if (cp.magic != LOGICAL_MAGIC)
+ ereport(PANIC, (errmsg("Logical checkpoint has wrong magic %u instead of %u",
+ cp.magic, LOGICAL_MAGIC)));
+
+ /* nothing can be active yet, don't lock anything */
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot;
+
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ if (slot->in_use)
+ continue;
+
+ slot->xmin = cp.slot.xmin;
+ /* XXX: after a crash, always use xmin, not effective_xmin */
+ slot->effective_xmin = cp.slot.xmin;
+ strcpy(NameStr(slot->name), NameStr(cp.slot.name));
+ strcpy(NameStr(slot->plugin), NameStr(cp.slot.plugin));
+ slot->database = cp.slot.database;
+ slot->restart_decoding = cp.slot.restart_decoding;
+ slot->confirmed_flush = cp.slot.confirmed_flush;
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ slot->in_use = true;
+ slot->active = false;
+ restored = true;
+
+ /*
+ * FIXME: Do some validation here.
+ */
+ break;
+ }
+
+ if (!restored)
+ ereport(PANIC,
+ (errmsg("too many logical slots active before shutdown, increase max_logical_slots and try again")));
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+LoadOutputPlugin(OutputPluginCallbacks *callbacks, char *plugin)
+{
+ /* look up symbols in the shared library */
+
+ /* optional */
+ callbacks->init_cb = (LogicalDecodeInitCB)
+ load_external_function(plugin, "pg_decode_init", false, NULL);
+
+ /* required */
+ callbacks->begin_cb = (LogicalDecodeBeginCB)
+ load_external_function(plugin, "pg_decode_begin_txn", true, NULL);
+
+ /* required */
+ callbacks->change_cb = (LogicalDecodeChangeCB)
+ load_external_function(plugin, "pg_decode_change", true, NULL);
+
+ /* required */
+ callbacks->commit_cb = (LogicalDecodeCommitCB)
+ load_external_function(plugin, "pg_decode_commit_txn", true, NULL);
+
+ /* optional */
+ callbacks->cleanup_cb = (LogicalDecodeCleanupCB)
+ load_external_function(plugin, "pg_decode_clean", false, NULL);
+}
+
+/*
+ * Context management functions that coordinate between the different
+ * logical decoding pieces.
+ */
+
+/*
+ * Callbacks for ReorderBuffer which add in some more information and then call
+ * output_plugin.h plugins.
+ */
+static void
+begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.begin_cb(ctx, txn);
+}
+
+static void
+commit_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.commit_cb(ctx, txn, commit_lsn);
+}
+
+static void
+change_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.change_cb(ctx, txn, relation, change);
+}
+
+LogicalDecodingContext *
+CreateLogicalDecodingContext(LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write)
+{
+ MemoryContext context;
+ MemoryContext old_context;
+ TransactionId xmin_horizon;
+ LogicalDecodingContext *ctx;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "ReorderBuffer",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(context);
+ ctx = palloc0(sizeof(LogicalDecodingContext));
+
+
+ /* load output plugins first, so we detect a wrong output plugin early */
+ LoadOutputPlugin(&ctx->callbacks, NameStr(slot->plugin));
+
+ if (is_init && start_lsn != InvalidXLogRecPtr)
+ elog(ERROR, "cannot initially start at a specified lsn");
+
+ if (is_init)
+ xmin_horizon = slot->xmin;
+ else
+ xmin_horizon = InvalidTransactionId;
+
+ ctx->slot = slot;
+
+ ctx->reader = XLogReaderAllocate(read_page, ctx);
+ ctx->reader->private_data = ctx;
+
+ ctx->reorder = ReorderBufferAllocate();
+ ctx->snapshot_builder =
+ AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
+
+ ctx->reorder->private_data = ctx;
+
+ ctx->reorder->begin = begin_txn_wrapper;
+ ctx->reorder->apply_change = change_wrapper;
+ ctx->reorder->commit = commit_txn_wrapper;
+
+ ctx->out = makeStringInfo();
+ ctx->prepare_write = prepare_write;
+ ctx->write = do_write;
+
+ ctx->output_plugin_options = output_plugin_options;
+
+ if (is_init)
+ ctx->stop_after_consistent = true;
+ else
+ ctx->stop_after_consistent = false;
+
+ /* call output plugin initialization callback */
+ if (ctx->callbacks.init_cb != NULL)
+ ctx->callbacks.init_cb(ctx, is_init);
+
+ MemoryContextSwitchTo(old_context);
+
+ return ctx;
+}
+
+void
+FreeLogicalDecodingContext(LogicalDecodingContext *ctx)
+{
+ if (ctx->callbacks.cleanup_cb != NULL)
+ ctx->callbacks.cleanup_cb(ctx);
+}
+
+
+/* has the initial snapshot found a consistent state? */
+bool
+LogicalDecodingContextReady(LogicalDecodingContext *ctx)
+{
+ return SnapBuildCurrentState(ctx->snapshot_builder) == SNAPBUILD_CONSISTENT;
+}
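Review note on the slot persistence code above: SaveLogicalSlotInternal()
relies on the usual write-to-temp, fsync, rename, fsync-directory sequence
so that a crash leaves either the old or the new state file, never a
partial one. A minimal standalone sketch of that sequence with plain POSIX
calls follows. The names demo_fsync_path() and demo_save_state() are
hypothetical and not part of the patch; the patch itself uses
OpenTransientFile(), pg_fsync() and fsync_fname(), and PANICs instead of
returning an error code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): fsync a file or directory by path. */
int
demo_fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper (not in the patch): durably replace dir/state with
 * the given bytes. Returns 0 on success, -1 on failure.
 */
int
demo_save_state(const char *dir, const void *data, size_t len)
{
    char tmppath[1024];
    char path[1024];
    int  fd;

    assert(dir != NULL && data != NULL);

    snprintf(tmppath, sizeof(tmppath), "%s/state.tmp", dir);
    snprintf(path, sizeof(path), "%s/state", dir);

    /* O_EXCL: a leftover temp file means an earlier attempt was not cleaned up */
    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* the contents must be on disk before the file becomes visible as "state" */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* the rename is the commit point: readers see the old or the new file */
    if (rename(tmppath, path) != 0)
        return -1;

    /* make the new directory entry durable, then the file itself */
    if (demo_fsync_path(dir, 1) != 0)
        return -1;
    return demo_fsync_path(path, 0);
}
```

Calling demo_save_state() twice on the same directory overwrites the
previous state through the rename, which mirrors how SaveLogicalSlot()
reuses the same code path as CreateLogicalSlot().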
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
new file mode 100644
index 0000000..9837a95
--- /dev/null
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -0,0 +1,361 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalfuncs.c
+ *
+ * Support functions for using xlog decoding
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logicalfuncs.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "storage/fd.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+
+Datum init_logical_replication(PG_FUNCTION_ARGS);
+Datum stop_logical_replication(PG_FUNCTION_ARGS);
+Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+/* FIXME: duplicate code with pg_xlogdump, similar to walsender.c */
+static void
+XLogRead(char *buf, XLogRecPtr startptr, Size count)
+{
+ char *p;
+ XLogRecPtr recptr;
+ Size nbytes;
+
+ static int sendFile = -1;
+ static XLogSegNo sendSegNo = 0;
+ static uint32 sendOff = 0;
+
+ p = buf;
+ recptr = startptr;
+ nbytes = count;
+
+ while (nbytes > 0)
+ {
+ uint32 startoff;
+ int segbytes;
+ int readbytes;
+
+ startoff = recptr % XLogSegSize;
+
+ if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+ {
+ char path[MAXPGPATH];
+
+ /* Switch to another logfile segment */
+ if (sendFile >= 0)
+ close(sendFile);
+
+ XLByteToSeg(recptr, sendSegNo);
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
+
+ if (sendFile < 0)
+ {
+ if (errno == ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("requested WAL segment %s has already been removed",
+ path)));
+ else
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m",
+ path)));
+ }
+ sendOff = 0;
+ }
+
+ /* Need to seek in the file? */
+ if (sendOff != startoff)
+ {
+ if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not seek in log segment %s to offset %u: %m",
+ path, startoff)));
+ }
+ sendOff = startoff;
+ }
+
+ /* How many bytes are within this segment? */
+ if (nbytes > (XLogSegSize - startoff))
+ segbytes = XLogSegSize - startoff;
+ else
+ segbytes = nbytes;
+
+ readbytes = read(sendFile, p, segbytes);
+ if (readbytes <= 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from log segment %s, offset %u, length %lu: %m",
+ path, sendOff, (unsigned long) segbytes)));
+ }
+
+ /* Update state for read */
+ recptr += readbytes;
+
+ sendOff += readbytes;
+ nbytes -= readbytes;
+ p += readbytes;
+ }
+}
+
+int
+logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr,
+ loc;
+ int count;
+
+ loc = targetPagePtr + reqLen;
+ while (1)
+ {
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ break;
+ pg_usleep(1000L);
+ }
+
+ /* at least one full block available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
+
+static void
+DummyWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ elog(ERROR, "init_logical_replication shouldn't be writing anything");
+}
+
+Datum
+init_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ Name plugin = PG_GETARG_NAME(1);
+
+ char xpos[MAXFNAMELEN];
+
+ TupleDesc tupdesc;
+ HeapTuple tuple;
+ Datum result;
+ Datum values[2];
+ bool nulls[2];
+ LogicalDecodingContext *ctx = NULL;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ /* Acquire a logical replication slot */
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingAcquireFreeSlot(NameStr(*name), NameStr(*plugin));
+
+ /* make sure we don't end up with an unreleased slot */
+ PG_TRY();
+ {
+ XLogRecPtr startptr;
+
+ /*
+ * Use the same initial_snapshot_reader, but with our own read_page
+ * callback that does not depend on walsender.
+ */
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true,
+ InvalidXLogRecPtr, NIL,
+ logical_read_local_xlog_page,
+ DummyWrite, DummyWrite);
+
+ /* setup from where to read xlog */
+ startptr = ctx->slot->restart_decoding;
+
+ /* Wait for a consistent starting point */
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ if (err)
+ elog(ERROR, "%s", err);
+
+ Assert(record);
+
+ startptr = InvalidXLogRecPtr;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue until we have found a consistent spot */
+ if (LogicalDecodingContextReady(ctx))
+ break;
+ }
+
+ /* Extract the values we want */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ values[0] = CStringGetTextDatum(NameStr(MyLogicalDecodingSlot->name));
+ values[1] = CStringGetTextDatum(xpos);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ LogicalDecodingReleaseSlot();
+
+ PG_RETURN_DATUM(result);
+}
+
+Datum
+stop_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(NameStr(*name));
+
+ PG_RETURN_INT32(0);
+}
+
+/*
+ * Return one row for each logical replication slot currently in use.
+ */
+
+Datum
+pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS 6
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ int i;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not " \
+ "allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot = &LogicalDecodingCtl->logical_slots[i];
+ Datum values[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ bool nulls[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ char location[MAXFNAMELEN];
+ const char *slot_name;
+ const char *plugin;
+ TransactionId xmin;
+ XLogRecPtr last_req;
+ bool active;
+ Oid database;
+
+ SpinLockAcquire(&slot->mutex);
+ if (!slot->in_use)
+ {
+ SpinLockRelease(&slot->mutex);
+ continue;
+ }
+ else
+ {
+ xmin = slot->xmin;
+ active = slot->active;
+ database = slot->database;
+ last_req = slot->restart_decoding;
+ slot_name = pstrdup(NameStr(slot->name));
+ plugin = pstrdup(NameStr(slot->plugin));
+ }
+ SpinLockRelease(&slot->mutex);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ snprintf(location, sizeof(location), "%X/%X",
+ (uint32) (last_req >> 32), (uint32) last_req);
+
+ values[0] = CStringGetTextDatum(slot_name);
+ values[1] = CStringGetTextDatum(plugin);
+ values[2] = ObjectIdGetDatum(database);
+ values[3] = BoolGetDatum(active);
+ values[4] = TransactionIdGetDatum(xmin);
+ values[5] = CStringGetTextDatum(location);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ return (Datum) 0;
+}
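Review note on the "%X/%X" output used by init_logical_replication() and
pg_stat_get_logical_decoding_slots() above: the 64-bit WAL position is
printed as two 32-bit hex halves separated by a slash. A minimal standalone
sketch of that formatting and its inverse follows. DemoLsn,
demo_format_lsn() and demo_parse_lsn() are hypothetical names used for
illustration only; the patch works on XLogRecPtr directly and does not
contain a parse function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for XLogRecPtr, which is a 64-bit integer. */
typedef uint64_t DemoLsn;

/*
 * Format an LSN as "hi/lo" in hex, the same split the patch applies with
 * (uint32) (lsn >> 32) and (uint32) lsn. Returns the snprintf() result.
 */
int
demo_format_lsn(char *buf, size_t buflen, DemoLsn lsn)
{
    assert(buf != NULL);

    return snprintf(buf, buflen, "%X/%X",
                    (unsigned int) (lsn >> 32),
                    (unsigned int) (lsn & 0xFFFFFFFFu));
}

/*
 * Inverse of demo_format_lsn(). Returns 0 on success and stores the value
 * in *lsn, or -1 if the string does not have the "hi/lo" form.
 */
int
demo_parse_lsn(const char *str, DemoLsn *lsn)
{
    unsigned int hi;
    unsigned int lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;

    *lsn = ((DemoLsn) hi << 32) | lo;
    return 0;
}
```

For example, the value 0x16B374D848 is formatted as "16/B374D848", and
parsing that string yields the original value again.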
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index 0000000..b6df411
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -0,0 +1,2548 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer.c
+ *
+ * PostgreSQL logical replay buffer management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/reorderbuffer.c
+ *
+ * NOTES
+ * This module gets handed individual pieces of transactions in the order
+ * they are written to the WAL and is responsible for reassembling them into
+ * toplevel transaction sized pieces. When a transaction is completely
+ * reassembled - signalled by reading the transaction commit record - it
+ * will then call the output plugin (c.f. ReorderBufferCommit()) with the
+ * individual changes. The output plugins rely on snapshots built by
+ * snapbuild.c which hands them to us.
+ *
+ * Transactions and subtransactions/savepoints in postgres are not
+ * immediately linked to each other from outside the performing
+ * backend. Only at commit/abort (or via special xact_assignment records) are
+ * they linked together, which means that we will have to splice together a
+ * toplevel transaction from its subtransactions. To do that efficiently we
+ * build a binary heap indexed by the smallest current lsn of the individual
+ * subtransactions' changestreams. As the individual streams are inherently
+ * ordered by LSN - since that is where we build them from - the transaction
+ * can easily be reassembled by always using the subtransaction with the
+ * smallest current LSN from the heap.
+ *
+ * In order to cope with large transactions - which can be several times as
+ * big as the available memory - this module supports spooling the contents
+ * of large transactions to disk. When the transaction is replayed the
+ * contents of individual (sub-)transactions will be read from disk in
+ * chunks.
+ *
+ * This module also has to deal with reassembling toast records from the
+ * individual chunks stored in WAL. When a new (or initial) version of a
+ * tuple is stored in WAL it will always be preceded by the toast chunks
+ * emitted for the columns stored out of line. Within a single toplevel
+ * transaction there will be no other data carrying records between a row's
+ * toast chunks and the row data itself. See ReorderBufferToast* for
+ * details.
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/transam.h"
+#include "access/xact.h"
+
+#include "catalog/catalog.h"
+
+#include "common/relpath.h"
+
+#include "lib/binaryheap.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h" /* just for SnapBuildSnapDecRefcount */
+#include "replication/logical.h"
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/sinval.h"
+
+#include "utils/builtins.h"
+#include "utils/combocid.h"
+#include "utils/memdebug.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+#include "utils/tqual.h"
+
+/*
+ * For efficiency and simplicity reasons we want to keep Snapshots, CommandIds
+ * and ComboCids in the same list with the user visible INSERT/UPDATE/DELETE
+ * changes. We don't want to leak those internal values to external users
+ * though (they would just use switch()...default:) because that would make it
+ * harder to add new user visible values.
+ *
+ * This needs to be synchronized with ReorderBufferChangeType! Adjust the
+ * StaticAssertExpr's in ReorderBufferAllocate if you add anything!
+ */
+typedef enum
+{
+ REORDER_BUFFER_CHANGE_INTERNAL_INSERT,
+ REORDER_BUFFER_CHANGE_INTERNAL_UPDATE,
+ REORDER_BUFFER_CHANGE_INTERNAL_DELETE,
+ REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT,
+ REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
+ REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID
+} ReorderBufferChangeTypeInternal;
+
+/* entry for a hash table we use to map from xid to our transaction state */
+typedef struct ReorderBufferTXNByIdEnt
+{
+ TransactionId xid;
+ ReorderBufferTXN *txn;
+} ReorderBufferTXNByIdEnt;
+
+/* data structures for (relfilenode, ctid) => (cmin, cmax) mapping */
+typedef struct ReorderBufferTupleCidKey
+{
+ RelFileNode relnode;
+ ItemPointerData tid;
+} ReorderBufferTupleCidKey;
+
+typedef struct ReorderBufferTupleCidEnt
+{
+ ReorderBufferTupleCidKey key;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid; /* just for debugging */
+} ReorderBufferTupleCidEnt;
+
+/* k-way in-order change iteration support structures */
+typedef struct ReorderBufferIterTXNEntry
+{
+ XLogRecPtr lsn;
+ ReorderBufferChange *change;
+ ReorderBufferTXN *txn;
+ int fd;
+ XLogSegNo segno;
+} ReorderBufferIterTXNEntry;
+
+typedef struct ReorderBufferIterTXNState
+{
+ binaryheap *heap;
+ Size nr_txns;
+ dlist_head old_change;
+ ReorderBufferIterTXNEntry entries[FLEXIBLE_ARRAY_MEMBER];
+} ReorderBufferIterTXNState;
+
+/* toast datastructures */
+typedef struct ReorderBufferToastEnt
+{
+ Oid chunk_id; /* toast_table.chunk_id */
+ int32 last_chunk_seq; /* toast_table.chunk_seq of the last chunk we
+ * have seen */
+ Size num_chunks; /* number of chunks we've already seen */
+ Size size; /* combined size of chunks seen */
+ dlist_head chunks; /* linked list of chunks */
+ struct varlena *reconstructed; /* reconstructed varlena now pointed
+ * to in main tup */
+} ReorderBufferToastEnt;
+
+
+/* number of changes kept in memory, per transaction */
+const Size max_memtries = 4096;
+
+/* Size of the slab caches used for frequently allocated objects */
+const Size max_cached_changes = 4096 * 2;
+const Size max_cached_tuplebufs = 1024; /* ~8MB */
+const Size max_cached_transactions = 512;
+
+
+/* ---------------------------------------
+ * primary reorderbuffer support routines
+ * ---------------------------------------
+ */
+static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *rb);
+static void ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *rb,
+ TransactionId xid, bool create, bool *is_new,
+ XLogRecPtr lsn, bool create_as_top);
+
+static void AssertTXNLsnOrder(ReorderBuffer *rb);
+
+/* ---------------------------------------
+ * support functions for lsn-order iterating over the ->changes of a
+ * transaction and its subtransactions
+ *
+ * used for iteration over the k-way heap merge of a transaction and its
+ * subtransactions
+ * ---------------------------------------
+ */
+static ReorderBufferIterTXNState *ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static ReorderBufferChange *
+ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state);
+static void ReorderBufferIterTXNFinish(ReorderBuffer *rb,
+ ReorderBufferIterTXNState *state);
+static void ReorderBufferExecuteInvalidations(ReorderBuffer *rb, ReorderBufferTXN *txn);
+
+/*
+ * ---------------------------------------
+ * Disk serialization support functions
+ * ---------------------------------------
+ */
+static void ReorderBufferCheckSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change);
+static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno);
+static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ char *change);
+static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
+
+static void ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap);
+static Snapshot ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid);
+
+/* ---------------------------------------
+ * toast reassembly support
+ * ---------------------------------------
+ */
+/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
+#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
+
+/* Size of an indirect datum that contains a standard TOAST pointer */
+#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect))
+
+static void ReorderBufferToastInitHash(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferToastReset(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+static void ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+
+
+/*
+ * Allocate a new ReorderBuffer
+ */
+ReorderBuffer *
+ReorderBufferAllocate(void)
+{
+ ReorderBuffer *buffer;
+ HASHCTL hash_ctl;
+ MemoryContext new_ctx;
+
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_INSERT == (int) REORDER_BUFFER_CHANGE_INSERT, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_UPDATE == (int) REORDER_BUFFER_CHANGE_UPDATE, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_DELETE == (int) REORDER_BUFFER_CHANGE_DELETE, "out of sync enums");
+
+ new_ctx = AllocSetContextCreate(TopMemoryContext,
+ "ReorderBuffer",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ buffer = (ReorderBuffer *) MemoryContextAlloc(new_ctx, sizeof(ReorderBuffer));
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ buffer->context = new_ctx;
+
+ hash_ctl.keysize = sizeof(TransactionId);
+ hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = buffer->context;
+
+ buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ buffer->by_txn_last_xid = InvalidTransactionId;
+ buffer->by_txn_last_txn = NULL;
+
+ buffer->nr_cached_transactions = 0;
+ buffer->nr_cached_changes = 0;
+ buffer->nr_cached_tuplebufs = 0;
+
+ buffer->outbuf = NULL;
+ buffer->outbufsize = 0;
+
+ buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
+
+ dlist_init(&buffer->toplevel_by_lsn);
+ dlist_init(&buffer->cached_transactions);
+ dlist_init(&buffer->cached_changes);
+ slist_init(&buffer->cached_tuplebufs);
+
+ return buffer;
+}
+
+/*
+ * Free a ReorderBuffer
+ */
+void
+ReorderBufferFree(ReorderBuffer *rb)
+{
+ MemoryContext context = rb->context;
+
+ /*
+ * We free separately allocated data by entirely scrapping our personal
+ * memory context.
+ */
+ MemoryContextDelete(context);
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTXN.
+ */
+static ReorderBufferTXN *
+ReorderBufferGetTXN(ReorderBuffer *rb)
+{
+ ReorderBufferTXN *txn;
+
+ if (rb->nr_cached_transactions > 0)
+ {
+ rb->nr_cached_transactions--;
+ txn = (ReorderBufferTXN *)
+ dlist_container(ReorderBufferTXN, node,
+ dlist_pop_head_node(&rb->cached_transactions));
+ }
+ else
+ {
+ txn = (ReorderBufferTXN *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferTXN));
+ }
+
+ memset(txn, 0, sizeof(ReorderBufferTXN));
+
+ dlist_init(&txn->changes);
+ dlist_init(&txn->tuplecids);
+ dlist_init(&txn->subtxns);
+
+ return txn;
+}
+
+/*
+ * Free a ReorderBufferTXN. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+static void
+ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /* clean the lookup cache if we were cached (quite likely) */
+ if (rb->by_txn_last_xid == txn->xid)
+ {
+ rb->by_txn_last_xid = InvalidTransactionId;
+ rb->by_txn_last_txn = NULL;
+ }
+
+ if (txn->tuplecid_hash != NULL)
+ {
+ hash_destroy(txn->tuplecid_hash);
+ txn->tuplecid_hash = NULL;
+ }
+
+ if (txn->invalidations)
+ {
+ pfree(txn->invalidations);
+ txn->invalidations = NULL;
+ }
+
+ if (rb->nr_cached_transactions < max_cached_transactions)
+ {
+ rb->nr_cached_transactions++;
+ dlist_push_head(&rb->cached_transactions, &txn->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(txn, sizeof(ReorderBufferTXN));
+ VALGRIND_MAKE_MEM_DEFINED(&txn->node, sizeof(txn->node));
+ }
+ else
+ {
+ pfree(txn);
+ }
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferChange.
+ */
+ReorderBufferChange *
+ReorderBufferGetChange(ReorderBuffer *rb)
+{
+ ReorderBufferChange *change;
+
+ if (rb->nr_cached_changes)
+ {
+ rb->nr_cached_changes--;
+ change = (ReorderBufferChange *)
+ dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&rb->cached_changes));
+ }
+ else
+ {
+ change = (ReorderBufferChange *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferChange));
+ }
+
+ memset(change, 0, sizeof(ReorderBufferChange));
+ return change;
+}
+
+/*
+ * Free a ReorderBufferChange. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
+{
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ if (change->newtuple)
+ {
+ ReorderBufferReturnTupleBuf(rb, change->newtuple);
+ change->newtuple = NULL;
+ }
+
+ if (change->oldtuple)
+ {
+ ReorderBufferReturnTupleBuf(rb, change->oldtuple);
+ change->oldtuple = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ if (change->snapshot)
+ {
+ ReorderBufferFreeSnap(rb, change->snapshot);
+ change->snapshot = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ if (rb->nr_cached_changes < max_cached_changes)
+ {
+ rb->nr_cached_changes++;
+ dlist_push_head(&rb->cached_changes, &change->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(change, sizeof(ReorderBufferChange));
+ VALGRIND_MAKE_MEM_DEFINED(&change->node, sizeof(change->node));
+ }
+ else
+ {
+ pfree(change);
+ }
+}
+
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTupleBuf
+ */
+ReorderBufferTupleBuf *
+ReorderBufferGetTupleBuf(ReorderBuffer *rb)
+{
+ ReorderBufferTupleBuf *tuple;
+
+ if (rb->nr_cached_tuplebufs)
+ {
+ rb->nr_cached_tuplebufs--;
+ tuple = slist_container(ReorderBufferTupleBuf, node,
+ slist_pop_head_node(&rb->cached_tuplebufs));
+#ifdef USE_ASSERT_CHECKING
+ memset(tuple, 0xEF, sizeof(ReorderBufferTupleBuf));
+#endif
+ }
+ else
+ {
+ tuple = (ReorderBufferTupleBuf *)
+ MemoryContextAlloc(rb->context, sizeof(ReorderBufferTupleBuf));
+ }
+
+ return tuple;
+}
+
+/*
+ * Free a ReorderBufferTupleBuf. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple)
+{
+ if (rb->nr_cached_tuplebufs < max_cached_tuplebufs)
+ {
+ rb->nr_cached_tuplebufs++;
+ slist_push_head(&rb->cached_tuplebufs, &tuple->node);
+ VALGRIND_MAKE_MEM_UNDEFINED(tuple, sizeof(ReorderBufferTupleBuf));
+ VALGRIND_MAKE_MEM_DEFINED(&tuple->node, sizeof(tuple->node));
+ }
+ else
+ {
+ pfree(tuple);
+ }
+}
+
+/*
+ * Return the ReorderBufferTXN from the given buffer, specified by Xid.
+ * If create is true, and a transaction doesn't already exist, create it
+ * (with the given LSN, and as top transaction if that's specified);
+ * when this happens, is_new is set to true.
+ */
+static ReorderBufferTXN *
+ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
+ bool *is_new, XLogRecPtr lsn, bool create_as_top)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXNByIdEnt *ent;
+ bool found;
+
+ Assert(TransactionIdIsValid(xid));
+ Assert(!create || lsn != InvalidXLogRecPtr);
+
+ /*
+ * Check the one-entry lookup cache first
+ */
+ if (TransactionIdIsValid(rb->by_txn_last_xid) &&
+ rb->by_txn_last_xid == xid)
+ {
+ txn = rb->by_txn_last_txn;
+
+ if (txn != NULL)
+ {
+ /* found it, and it's valid */
+ if (is_new)
+ *is_new = false;
+ return txn;
+ }
+
+ /*
+ * cached as nonexistent, and asked not to create? Then nothing else
+ * to do.
+ */
+ if (!create)
+ return NULL;
+ /* otherwise fall through to create it */
+ }
+
+ /*
+ * The cache wasn't hit, or it yielded a "does-not-exist" entry and we want
+ * to create one, so consult the hash table.
+ */
+
+ /* search the lookup table */
+ ent = (ReorderBufferTXNByIdEnt *)
+ hash_search(rb->by_txn,
+ (void *) &xid,
+ create ? HASH_ENTER : HASH_FIND,
+ &found);
+ if (found)
+ txn = ent->txn;
+ else if (create)
+ {
+ /* initialize the new entry, if creation was requested */
+ Assert(ent != NULL);
+
+ ent->txn = ReorderBufferGetTXN(rb);
+ ent->txn->xid = xid;
+ txn = ent->txn;
+ txn->first_lsn = lsn;
+ txn->restart_decoding_lsn = rb->current_restart_decoding_lsn;
+
+ if (create_as_top)
+ {
+ dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
+ AssertTXNLsnOrder(rb);
+ }
+ }
+ else
+ txn = NULL; /* not found and not asked to create */
+
+ /* update cache */
+ rb->by_txn_last_xid = xid;
+ rb->by_txn_last_txn = txn;
+
+ if (is_new)
+ *is_new = create && !found;
+
+ Assert(!create || !!txn);
+ return txn;
+}
+
+/*
+ * Queue a change into a transaction so it can be replayed upon commit.
+ */
+void
+ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
+ ReorderBufferChange *change)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ change->lsn = lsn;
+ Assert(InvalidXLogRecPtr != lsn);
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries++;
+ txn->nentries_mem++;
+
+ ReorderBufferCheckSerializeTXN(rb, txn);
+}
+
+static void
+AssertTXNLsnOrder(ReorderBuffer *rb)
+{
+#ifdef USE_ASSERT_CHECKING
+ dlist_iter iter;
+ XLogRecPtr prev_first_lsn = InvalidXLogRecPtr;
+
+ dlist_foreach(iter, &rb->toplevel_by_lsn)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(cur_txn->first_lsn != InvalidXLogRecPtr);
+
+ if (cur_txn->end_lsn != InvalidXLogRecPtr)
+ Assert(cur_txn->first_lsn <= cur_txn->end_lsn);
+
+ if (prev_first_lsn != InvalidXLogRecPtr)
+ Assert(prev_first_lsn < cur_txn->first_lsn);
+
+ Assert(!cur_txn->is_known_as_subxact);
+ prev_first_lsn = cur_txn->first_lsn;
+ }
+#endif
+}
+
+ReorderBufferTXN *
+ReorderBufferGetOldestTXN(ReorderBuffer *rb)
+{
+ ReorderBufferTXN *txn;
+
+ if (dlist_is_empty(&rb->toplevel_by_lsn))
+ return NULL;
+
+ AssertTXNLsnOrder(rb);
+
+ txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
+
+ Assert(!txn->is_known_as_subxact);
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ return txn;
+}
+
+void
+ReorderBufferSetRestartPoint(ReorderBuffer *rb, XLogRecPtr ptr)
+{
+ rb->current_restart_decoding_lsn = ptr;
+}
+
+void
+ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
+ TransactionId subxid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+ bool new_top;
+ bool new_sub;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true);
+ subtxn = ReorderBufferTXNByXid(rb, subxid, true, &new_sub, lsn, false);
+
+ if (new_sub)
+ {
+ /*
+ * Assign the subxact to its toplevel xact even without data for it yet:
+ * assignment records often reference xids that have produced no records.
+ * Knowing these aren't toplevel xids makes processing cheaper elsewhere.
+ */
+ subtxn->is_known_as_subxact = true;
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+ else if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+ Assert(subtxn->nsubtxns == 0);
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to toplevel transaction */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+ else if (new_top)
+ {
+ elog(ERROR, "existing subxact assigned to unknown toplevel xact");
+ }
+}
+
+/*
+ * Associate a subtransaction with its toplevel transaction at commit
+ * time. There may be no further changes added after this.
+ */
+void
+ReorderBufferCommitChild(ReorderBuffer *rb, TransactionId xid,
+ TransactionId subxid, XLogRecPtr commit_lsn,
+ XLogRecPtr end_lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+
+ subtxn = ReorderBufferTXNByXid(rb, subxid, false, NULL,
+ InvalidXLogRecPtr, false);
+
+ /*
+ * No need to do anything if that subtxn didn't contain any changes
+ */
+ if (!subtxn)
+ return;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, commit_lsn, true);
+
+ if (txn == NULL)
+ elog(ERROR, "subxact logged without previous toplevel record");
+
+ subtxn->final_lsn = commit_lsn;
+ subtxn->end_lsn = end_lsn;
+
+ if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+ Assert(subtxn->nsubtxns == 0);
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to subtransaction list */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+}
+
+
+/*
+ * Support for efficiently iterating over a transaction's and its
+ * subtransactions' changes.
+ *
+ * We do this with a k-way merge across the transaction and its
+ * subtransactions. The current head change of each one is kept in a binary
+ * heap, so we can cheaply find which (sub-)transaction holds the change with
+ * the smallest LSN next.
+ *
+ * We assume the changes in individual transactions are already sorted by LSN.
+ */
+
+/*
+ * Binary heap comparison function.
+ */
+static int
+ReorderBufferIterCompare(Datum a, Datum b, void *arg)
+{
+ ReorderBufferIterTXNState *state = (ReorderBufferIterTXNState *) arg;
+ XLogRecPtr pos_a = state->entries[DatumGetInt32(a)].lsn;
+ XLogRecPtr pos_b = state->entries[DatumGetInt32(b)].lsn;
+
+ if (pos_a < pos_b)
+ return 1;
+ else if (pos_a == pos_b)
+ return 0;
+ return -1;
+}
+
+/*
+ * Allocate & initialize an iterator which iterates in lsn order over a
+ * transaction and all its subtransactions.
+ */
+static ReorderBufferIterTXNState *
+ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ Size nr_txns = 0;
+ ReorderBufferIterTXNState *state;
+ dlist_iter cur_txn_i;
+ int32 off;
+
+ /*
+ * Calculate the size of our heap: one element for every transaction that
+ * contains changes. (Besides the transactions already in the reorder
+ * buffer, we count the one we were directly passed.)
+ */
+ if (txn->nentries > 0)
+ nr_txns++;
+
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ nr_txns++;
+ }
+
+ /*
+ * XXX: Add fastpath for the rather common nr_txns=1 case, no need to
+ * allocate/build a heap in that case.
+ */
+
+ /* allocate iteration state */
+ state = (ReorderBufferIterTXNState *)
+ MemoryContextAllocZero(rb->context,
+ sizeof(ReorderBufferIterTXNState) +
+ sizeof(ReorderBufferIterTXNEntry) * nr_txns);
+
+ state->nr_txns = nr_txns;
+ dlist_init(&state->old_change);
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ state->entries[off].fd = -1;
+ state->entries[off].segno = 0;
+ }
+
+ /* allocate heap */
+ state->heap = binaryheap_allocate(state->nr_txns, ReorderBufferIterCompare,
+ state);
+
+ /*
+ * Now insert items into the binary heap, unordered. (We will run a heap
+ * assembly step at the end; this is more efficient.)
+ */
+
+ off = 0;
+
+ /* add toplevel transaction if it contains changes */
+ if (txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreChanges(rb, txn, &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+
+ /* add subtransactions if they contain changes */
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (cur_txn->nentries != cur_txn->nentries_mem)
+ ReorderBufferRestoreChanges(rb, cur_txn,
+ &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &cur_txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = cur_txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+ }
+
+ /* assemble a valid binary heap */
+ binaryheap_build(state->heap);
+
+ return state;
+}
+
+/*
+ * Remove a transaction's on-disk spill files (FIXME: better name).
+ */
+static void
+ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ XLogSegNo first;
+ XLogSegNo cur;
+ XLogSegNo last;
+
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ Assert(txn->final_lsn != InvalidXLogRecPtr);
+
+ XLByteToSeg(txn->first_lsn, first);
+ XLByteToSeg(txn->final_lsn, last);
+
+ for (cur = first; cur <= last; cur++)
+ {
+ char path[MAXPGPATH];
+ XLogRecPtr recptr;
+
+ XLogSegNoOffsetToRecPtr(cur, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+ if (unlink(path) != 0 && errno != ENOENT)
+ elog(FATAL, "could not unlink file \"%s\": %m", path);
+ }
+}
+
+/*
+ * Return the next change when iterating over a transaction and its
+ * subtransactions.
+ *
+ * Returns NULL when no further changes exist.
+ */
+static ReorderBufferChange *
+ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
+{
+ ReorderBufferChange *change;
+ ReorderBufferIterTXNEntry *entry;
+ int32 off;
+
+ /* nothing there anymore */
+ if (state->heap->bh_size == 0)
+ return NULL;
+
+ off = DatumGetInt32(binaryheap_first(state->heap));
+ entry = &state->entries[off];
+
+ if (!dlist_is_empty(&entry->txn->subtxns))
+ elog(LOG, "tx with subtxn %u", entry->txn->xid);
+
+ /* free memory we might have "leaked" in the previous *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(rb, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ change = entry->change;
+
+ /*
+ * update heap with information about which transaction has the next
+ * relevant change in LSN order
+ */
+
+ /* there are in-memory changes */
+ if (dlist_has_next(&entry->txn->changes, &entry->change->node))
+ {
+ dlist_node *next = dlist_next_node(&entry->txn->changes, &change->node);
+ ReorderBufferChange *next_change =
+ dlist_container(ReorderBufferChange, node, next);
+
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+ return change;
+ }
+
+ /* try to load changes from disk */
+ if (entry->txn->nentries != entry->txn->nentries_mem)
+ {
+ /*
+ * Ugly: restoring changes will reuse *Change records, thus delete the
+ * current one from the per-tx list and only free in the next call.
+ */
+ dlist_delete(&change->node);
+ dlist_push_tail(&state->old_change, &change->node);
+
+ if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->fd,
+ &state->entries[off].segno))
+ {
+ /* successfully restored changes from disk */
+ ReorderBufferChange *next_change =
+ dlist_head_element(ReorderBufferChange, node,
+ &entry->txn->changes);
+
+ elog(DEBUG2, "restored %zu/%zu changes from disk",
+ entry->txn->nentries_mem, entry->txn->nentries);
+ Assert(entry->txn->nentries_mem);
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+
+ return change;
+ }
+ }
+
+ /* ok, no changes there anymore, remove */
+ binaryheap_remove_first(state->heap);
+
+ return change;
+}
+
+/*
+ * Deallocate the iterator
+ */
+static void
+ReorderBufferIterTXNFinish(ReorderBuffer *rb,
+ ReorderBufferIterTXNState *state)
+{
+ int32 off;
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ if (state->entries[off].fd != -1)
+ CloseTransientFile(state->entries[off].fd);
+ }
+
+ /* free memory we might have "leaked" in the last *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(rb, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ binaryheap_free(state->heap);
+ pfree(state);
+}
+
+/*
+ * Cleanup the contents of a transaction, usually after the transaction
+ * committed or aborted.
+ */
+static void
+ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ bool found;
+ dlist_mutable_iter iter;
+
+ /* cleanup subtransactions & their changes */
+ dlist_foreach_modify(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(subtxn->is_known_as_subxact);
+ Assert(subtxn->nsubtxns == 0);
+
+ /*
+ * subtransactions are always associated with the toplevel TXN, even if
+ * they originally were happening inside another subtxn, so we won't
+ * ever recurse more than one level here.
+ */
+ ReorderBufferCleanupTXN(rb, subtxn);
+ }
+
+ /* cleanup changes in the toplevel txn */
+ dlist_foreach_modify(iter, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ ReorderBufferReturnChange(rb, change);
+ }
+
+ /*
+ * cleanup the tuplecids we stored for timetravel access. They are always
+ * stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+ ReorderBufferReturnChange(rb, change);
+ }
+
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ txn->base_snapshot = NULL;
+ }
+
+ /* delete from list of known subxacts */
+ if (txn->is_known_as_subxact)
+ {
+ /* FIXME: adjust nsubtxns count of parent */
+ dlist_delete(&txn->node);
+ }
+ /* delete from LSN ordered list of toplevel TXNs */
+ else
+ {
+ dlist_delete(&txn->node);
+ }
+
+ /* now remove reference from buffer */
+ hash_search(rb->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreCleanup(rb, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(rb, txn);
+}
+
+/*
+ * Build a hash with a (relfilenode, ctid) -> (cmin, cmax) mapping for use by
+ * tqual.c's HeapTupleSatisfiesMVCCDuringDecoding.
+ */
+static void
+ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ dlist_iter iter;
+ HASHCTL hash_ctl;
+
+ if (!txn->does_timetravel || dlist_is_empty(&txn->tuplecids))
+ return;
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
+ hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = rb->context;
+
+ /*
+ * create the hash with the exact number of to-be-stored tuplecids from
+ * the start
+ */
+ txn->tuplecid_hash =
+ hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ dlist_foreach(iter, &txn->tuplecids)
+ {
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ bool found;
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(ReorderBufferTupleCidKey));
+
+ key.relnode = change->tuplecid.node;
+
+ ItemPointerCopy(&change->tuplecid.tid,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(txn->tuplecid_hash,
+ (void *) &key,
+ HASH_ENTER,
+ &found);
+ if (!found)
+ {
+ ent->cmin = change->tuplecid.cmin;
+ ent->cmax = change->tuplecid.cmax;
+ ent->combocid = change->tuplecid.combocid;
+ }
+ else
+ {
+ Assert(ent->cmin == change->tuplecid.cmin);
+ Assert(ent->cmax == InvalidCommandId ||
+ ent->cmax == change->tuplecid.cmax);
+
+ /*
+ * if the tuple became valid in this transaction and has now been
+ * deleted, we already have a valid cmin stored. The cmax will be
+ * InvalidCommandId though.
+ */
+ ent->cmax = change->tuplecid.cmax;
+ }
+ }
+}
+
+/*
+ * Copy a provided snapshot so we can modify it privately. This is needed so
+ * that catalog modifying transactions can look into intermediate catalog
+ * states.
+ */
+static Snapshot
+ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid)
+{
+ Snapshot snap;
+ dlist_iter iter;
+ int i = 0;
+ Size size;
+
+ size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * orig_snap->xcnt +
+ sizeof(TransactionId) * (txn->nsubtxns + 1);
+
+ elog(DEBUG1, "copying a non-transaction-specific snapshot into timetravel tx %u", txn->xid);
+
+ snap = MemoryContextAllocZero(rb->context, size);
+ memcpy(snap, orig_snap, sizeof(SnapshotData));
+
+ snap->copied = true;
+ snap->active_count = 0;
+ snap->regd_count = 0;
+ snap->xip = (TransactionId *) (snap + 1);
+
+ memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+
+ /*
+ * ->subxip contains all txids that belong to our transaction which we
+ * need to check via cmin/cmax. That's why we store the toplevel
+ * transaction in there as well.
+ */
+ snap->subxip = snap->xip + snap->xcnt;
+ snap->subxip[i++] = txn->xid;
+ snap->subxcnt = txn->nsubtxns + 1;
+
+ dlist_foreach(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *sub_txn;
+
+ sub_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ snap->subxip[i++] = sub_txn->xid;
+ }
+
+ /* sort so we can bsearch() later */
+ qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);
+
+ /* store the specified current CommandId */
+ snap->curcid = cid;
+
+ return snap;
+}
+
+/*
+ * Free a previously ReorderBufferCopySnap'ed snapshot
+ */
+static void
+ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
+{
+ if (snap->copied)
+ pfree(snap);
+ else
+ SnapBuildSnapDecRefcount(snap);
+}
+
+/*
+ * Commit a transaction and replay all actions that previously have been
+ * ReorderBufferQueueChange'd in the toplevel TX or any of the subtransactions
+ * assigned via ReorderBufferCommitChild.
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid, XLogRecPtr commit_lsn,
+ XLogRecPtr end_lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferIterTXNState *volatile iterstate = NULL;
+ ReorderBufferChange *change;
+ CommandId command_id = FirstCommandId;
+ volatile Snapshot snapshot_now;
+ Relation relation = NULL;
+ Oid reloid;
+ bool is_transaction_state = IsTransactionOrTransactionBlock();
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* empty transaction */
+ if (txn == NULL)
+ return;
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+
+ /* part of this txn was already spilled to disk, so serialize the rest too */
+ if (txn->nentries_mem != txn->nentries)
+ ReorderBufferSerializeTXN(rb, txn);
+
+ /*
+ * If this transaction didn't have any real changes in our database, it's
+ * OK not to have a snapshot.
+ */
+ if (txn->base_snapshot == NULL)
+ return;
+
+ snapshot_now = txn->base_snapshot;
+
+ ReorderBufferBuildTupleCidHash(rb, txn);
+
+ /* setup initial snapshot */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+
+ PG_TRY();
+ {
+ /*
+ * Decoding needs access to syscaches et al., which in turn use
+ * heavyweight locks and such. Thus we need to have enough state around
+ * to keep track of those. The easiest way is to simply use a
+ * transaction internally. That also allows us to easily enforce that
+ * nothing writes to the database by checking for xid assignments.
+ *
+ * When we're called via the SQL SRF there's already a transaction
+ * started, so start an explicit subtransaction there.
+ */
+ if (is_transaction_state)
+ BeginInternalSubTransaction("replay");
+ else
+ StartTransactionCommand();
+
+ rb->begin(rb, txn);
+
+ iterstate = ReorderBufferIterTXNInit(rb, txn);
+ while ((change = ReorderBufferIterTXNNext(rb, iterstate)))
+ {
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ Assert(snapshot_now);
+
+ reloid = RelidByRelfilenode(change->relnode.spcNode,
+ change->relnode.relNode);
+
+ /*
+ * catalog tuple without data, while catalog has been
+ * rewritten
+ */
+ if (reloid == InvalidOid &&
+ change->newtuple == NULL && change->oldtuple == NULL)
+ continue;
+ else if (reloid == InvalidOid)
+ elog(ERROR, "could not look up relation %s",
+ relpathperm(change->relnode, MAIN_FORKNUM));
+
+ relation = RelationIdGetRelation(reloid);
+
+ if (relation == NULL)
+ elog(ERROR, "could not open relation descriptor %s",
+ relpathperm(change->relnode, MAIN_FORKNUM));
+
+ if (RelationIsLogicallyLogged(relation))
+ {
+ /* user-triggered change */
+ if (relation->rd_rel->relkind == RELKIND_SEQUENCE)
+ {
+ }
+ else if (!IsToastRelation(relation))
+ {
+ ReorderBufferToastReplace(rb, txn, relation, change);
+ rb->apply_change(rb, txn, relation, change);
+ ReorderBufferToastReset(rb, txn);
+ }
+ /* we're not interested in toast deletions */
+ else if (change->action == REORDER_BUFFER_CHANGE_INSERT)
+ {
+ /*
+ * need to reassemble change in memory, ensure it
+ * doesn't get reused till we're done.
+ */
+ dlist_delete(&change->node);
+ ReorderBufferToastAppendChunk(rb, txn, relation,
+ change);
+ }
+
+ }
+ RelationClose(relation);
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ /* XXX: we could skip snapshots in non toplevel txns */
+
+ /* get rid of the old */
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ {
+ ReorderBufferFreeSnap(rb, snapshot_now);
+ snapshot_now =
+ ReorderBufferCopySnap(rb, change->snapshot,
+ txn, command_id);
+ }
+
+ /*
+ * restored from disk, we need to be careful not to double
+ * free. We could introduce refcounting for that, but for
+ * now this seems infrequent enough not to care.
+ */
+ else if (change->snapshot->copied)
+ {
+ snapshot_now =
+ ReorderBufferCopySnap(rb, change->snapshot,
+ txn, command_id);
+ }
+ else
+ {
+ snapshot_now = change->snapshot;
+ }
+
+
+ /* and start with the new one */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ if (!snapshot_now->copied)
+ {
+ /* we don't use the global one anymore */
+ snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
+ txn, command_id);
+ }
+
+ command_id = Max(command_id, change->command_id);
+
+ if (command_id != InvalidCommandId)
+ {
+ snapshot_now->curcid = command_id;
+
+ RevertFromDecodingSnapshots();
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ }
+
+ /*
+ * every time the CommandId is incremented, we could see
+ * new catalog contents
+ */
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ elog(ERROR, "tuplecid value in normal queue");
+ break;
+ }
+ }
+
+ ReorderBufferIterTXNFinish(rb, iterstate);
+
+ /* call commit callback */
+ rb->commit(rb, txn, commit_lsn);
+
+ /* make sure nothing has written anything */
+ if (GetTopTransactionIdIfAny() != InvalidTransactionId)
+ elog(ERROR, "cannot write during replay");
+
+ /*
+ * Aborting the subtransaction, or the transaction as a whole, has the
+ * right semantics. We want all locks acquired in here to be released,
+ * not reassigned to the parent, and we do not want any database access
+ * to have persistent effects.
+ */
+ if (is_transaction_state)
+ RollbackAndReleaseCurrentSubTransaction();
+ else
+ AbortCurrentTransaction();
+
+ /* make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ /* cleanup */
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(rb, snapshot_now);
+
+ ReorderBufferCleanupTXN(rb, txn);
+ }
+ PG_CATCH();
+ {
+ /* TODO: Encapsulate cleanup from the PG_TRY and PG_CATCH blocks */
+ if (iterstate)
+ ReorderBufferIterTXNFinish(rb, iterstate);
+
+ if (is_transaction_state)
+ RollbackAndReleaseCurrentSubTransaction();
+ else
+ AbortCurrentTransaction();
+
+ ReorderBufferExecuteInvalidations(rb, txn);
+
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(rb, snapshot_now);
+
+ /*
+ * don't do a ReorderBufferCleanupTXN here, with the vague idea of
+ * allowing decoding to be retried.
+ */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Abort a transaction that possibly has previous changes. Needs to be done
+ * independently for toplevel and subtransactions.
+ */
+void
+ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* no changes in this transaction */
+ if (txn == NULL)
+ return;
+
+ txn->final_lsn = lsn;
+
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
+ * Check whether a transaction is already known in this module
+ */
+bool
+ReorderBufferIsXidKnown(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ return txn != NULL;
+}
+
+/*
+ * Add a new snapshot to this transaction that is only used after lsn 'lsn'.
+ */
+void
+ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+
+ change->snapshot = snap;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT;
+
+ ReorderBufferQueueChange(rb, xid, lsn, change);
+}
+
+/*
+ * Setup the base snapshot of a transaction. That is the snapshot that is used
+ * to decode all changes until either this transaction modifies the catalog or
+ * another catalog modifying transaction commits.
+ */
+void
+ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferTXN *txn;
+ bool is_new;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
+ Assert(txn->base_snapshot == NULL);
+
+ txn->base_snapshot = snap;
+}
+
+/*
+ * Access the catalog with this CommandId at this point in the changestream.
+ *
+ * May only be called for command ids > 1
+ */
+void
+ReorderBufferAddNewCommandId(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, CommandId cid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+
+ change->command_id = cid;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID;
+
+ ReorderBufferQueueChange(rb, xid, lsn, change);
+}
+
+
+/*
+ * Add new (relfilenode, tid) -> (cmin, cmax) mappings.
+ */
+void
+ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, RelFileNode node,
+ ItemPointerData tid, CommandId cmin,
+ CommandId cmax, CommandId combocid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(rb);
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ change->tuplecid.node = node;
+ change->tuplecid.tid = tid;
+ change->tuplecid.cmin = cmin;
+ change->tuplecid.cmax = cmax;
+ change->tuplecid.combocid = combocid;
+ change->lsn = lsn;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
+
+ dlist_push_tail(&txn->tuplecids, &change->node);
+ txn->ntuplecids++;
+}
+
+/*
+ * Setup the invalidation of the toplevel transaction.
+ *
+ * This needs to be done before ReorderBufferCommit is called!
+ */
+void
+ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Size nmsgs,
+ SharedInvalidationMessage *msgs)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ if (txn->ninvalidations != 0)
+ elog(ERROR, "only ever add one set of invalidations");
+
+ Assert(nmsgs > 0);
+
+ txn->ninvalidations = nmsgs;
+ txn->invalidations = (SharedInvalidationMessage *)
+ MemoryContextAlloc(rb->context,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+ memcpy(txn->invalidations, msgs, sizeof(SharedInvalidationMessage) * nmsgs);
+}
+
+/*
+ * Apply all invalidations we know. Possibly we only need parts at this point
+ * in the changestream but we don't know which those are.
+ */
+static void
+ReorderBufferExecuteInvalidations(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ int i;
+
+ for (i = 0; i < txn->ninvalidations; i++)
+ LocalExecuteInvalidationMessage(&txn->invalidations[i]);
+}
+
+/*
+ * Mark a transaction as doing timetravel.
+ */
+void
+ReorderBufferXidSetTimetravel(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+
+ txn->does_timetravel = true;
+}
+
+/*
+ * Query whether a transaction is already *known* to be doing timetravel. This
+ * can be wrong until directly before the commit!
+ */
+bool
+ReorderBufferXidDoesTimetravel(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ if (txn == NULL)
+ return false;
+
+ return txn->does_timetravel;
+}
+
+/*
+ * Have we already added the first snapshot?
+ */
+bool
+ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* transaction isn't known yet, ergo no snapshot */
+ if (txn == NULL)
+ return false;
+
+ return txn->base_snapshot != NULL;
+}
+
+static void
+ReorderBufferSerializeReserve(ReorderBuffer *rb, Size sz)
+{
+ if (!rb->outbufsize)
+ {
+ rb->outbuf = MemoryContextAlloc(rb->context, sz);
+ rb->outbufsize = sz;
+ }
+ else if (rb->outbufsize < sz)
+ {
+ rb->outbuf = repalloc(rb->outbuf, sz);
+ rb->outbufsize = sz;
+ }
+}
+
+typedef struct ReorderBufferDiskChange
+{
+ Size size;
+ ReorderBufferChange change;
+ /* data follows */
+} ReorderBufferDiskChange;
+
+/*
+ * Persistency support
+ */
+static void
+ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change)
+{
+ ReorderBufferDiskChange *ondisk;
+ Size sz = sizeof(ReorderBufferDiskChange);
+
+ ReorderBufferSerializeReserve(rb, sz);
+
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+ memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ {
+ char *data;
+ Size oldlen = 0;
+ Size newlen = 0;
+
+ if (change->oldtuple)
+ oldlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->oldtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ if (change->newtuple)
+ newlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->newtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ sz += oldlen;
+ sz += newlen;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(rb, sz);
+
+ data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ if (oldlen)
+ {
+ memcpy(data, change->oldtuple, oldlen);
+ data += oldlen;
+ Assert(&change->oldtuple->header == change->oldtuple->tuple.t_data);
+ }
+
+ if (newlen)
+ {
+ memcpy(data, change->newtuple, newlen);
+ data += newlen;
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ char *data;
+
+ sz += sizeof(SnapshotData) +
+ sizeof(TransactionId) * change->snapshot->xcnt +
+ sizeof(TransactionId) * change->snapshot->subxcnt
+ ;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(rb, sz);
+ data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ memcpy(data, change->snapshot, sizeof(SnapshotData));
+ data += sizeof(SnapshotData);
+
+ if (change->snapshot->xcnt)
+ {
+ memcpy(data, change->snapshot->xip,
+ sizeof(TransactionId) * change->snapshot->xcnt);
+ data += sizeof(TransactionId) * change->snapshot->xcnt;
+ }
+
+ if (change->snapshot->subxcnt)
+ {
+ memcpy(data, change->snapshot->subxip,
+ sizeof(TransactionId) * change->snapshot->subxcnt);
+ data += sizeof(TransactionId) * change->snapshot->subxcnt;
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ }
+
+ ondisk->size = sz;
+
+ if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to xid data file \"%u\": %m",
+ txn->xid)));
+ }
+
+ Assert(ondisk->change.action_internal == change->action_internal);
+}
+
+static void
+ReorderBufferCheckSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /* FIXME subtxn handling? */
+ if (txn->nentries_mem >= max_memtries)
+ {
+ ReorderBufferSerializeTXN(rb, txn);
+ Assert(txn->nentries_mem == 0);
+ }
+}
+
+static void
+ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ dlist_iter subtxn_i;
+ dlist_mutable_iter change_i;
+ int fd = -1;
+ XLogSegNo curOpenSegNo = 0;
+ Size spilled = 0;
+ char path[MAXPGPATH];
+
+ elog(DEBUG2, "spill %zu changes in tx %u to disk",
+ txn->nentries_mem, txn->xid);
+
+ /* do the same to all child TXs */
+ dlist_foreach(subtxn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, subtxn_i.cur);
+ ReorderBufferSerializeTXN(rb, subtxn);
+ }
+
+ /* serialize changestream */
+ dlist_foreach_modify(change_i, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, change_i.cur);
+
+ /*
+ * store in the segment it belongs to by start lsn, don't split over
+ * multiple segments though
+ */
+ if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))
+ {
+ XLogRecPtr recptr;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ XLByteToSeg(change->lsn, curOpenSegNo);
+ XLogSegNoOffsetToRecPtr(curOpenSegNo, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ /* open segment, create it if necessary */
+ fd = OpenTransientFile(path,
+ O_CREAT | O_WRONLY | O_APPEND | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+
+ if (fd < 0)
+ ereport(ERROR, (errmsg("could not open reorderbuffer file %s for writing: %m", path)));
+ }
+
+ ReorderBufferSerializeChange(rb, txn, fd, change);
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(rb, change);
+
+ spilled++;
+ }
+
+ Assert(spilled == txn->nentries_mem);
+ Assert(dlist_is_empty(&txn->changes));
+ txn->nentries_mem = 0;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ /* issue write barrier */
+ /* serialize main transaction state */
+}
+
+static Size
+ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno)
+{
+ Size restored = 0;
+ XLogSegNo last_segno;
+ dlist_mutable_iter cleanup_iter;
+
+ Assert(txn->first_lsn != InvalidXLogRecPtr);
+ Assert(txn->final_lsn != InvalidXLogRecPtr);
+
+ /* free current entries, so we have memory for more */
+ dlist_foreach_modify(cleanup_iter, &txn->changes)
+ {
+ ReorderBufferChange *cleanup =
+ dlist_container(ReorderBufferChange, node, cleanup_iter.cur);
+
+ dlist_delete(&cleanup->node);
+ ReorderBufferReturnChange(rb, cleanup);
+ }
+ txn->nentries_mem = 0;
+ Assert(dlist_is_empty(&txn->changes));
+
+ XLByteToSeg(txn->final_lsn, last_segno);
+
+ while (restored < max_memtries && *segno <= last_segno)
+ {
+ int readBytes;
+ ReorderBufferDiskChange *ondisk;
+
+ if (*fd == -1)
+ {
+ XLogRecPtr recptr;
+ char path[MAXPGPATH];
+
+ /* first time in */
+ if (*segno == 0)
+ {
+ XLByteToSeg(txn->first_lsn, *segno);
+ elog(LOG, "initial restoring from %zu to %zu",
+ *segno, last_segno);
+ }
+
+ Assert(*segno != 0 || dlist_is_empty(&txn->changes));
+ XLogSegNoOffsetToRecPtr(*segno, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ elog(LOG, "opening file %s", path);
+
+ *fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+ if (*fd < 0 && errno == ENOENT)
+ {
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (*fd < 0)
+ ereport(ERROR, (errmsg("could not open reorderbuffer file %s for reading: %m", path)));
+
+ }
+
+ ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
+
+
+ /*
+ * read the statically sized part of a change which has information
+ * about the total size. If we couldn't read a record, we're at the
+ * end of this file.
+ */
+
+ readBytes = read(*fd, rb->outbuf, sizeof(ReorderBufferDiskChange));
+
+ /* eof */
+ if (readBytes == 0)
+ {
+ CloseTransientFile(*fd);
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (readBytes < 0)
+ elog(ERROR, "read failed: %m");
+ else if (readBytes != sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "incomplete read, read %d instead of %zu",
+ readBytes, sizeof(ReorderBufferDiskChange));
+
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ ReorderBufferSerializeReserve(rb,
+ sizeof(ReorderBufferDiskChange) + ondisk->size);
+ ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+ readBytes = read(*fd, rb->outbuf + sizeof(ReorderBufferDiskChange),
+ ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ if (readBytes < 0)
+ elog(ERROR, "read2 failed: %m");
+ else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "incomplete read2, read %d instead of %zu",
+ readBytes, ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ /*
+ * ok, read a full change from disk, now restore it into proper
+ * in-memory format
+ */
+ ReorderBufferRestoreChange(rb, txn, rb->outbuf);
+ restored++;
+ }
+
+ return restored;
+}
+
+/*
+ * Convert change from its on-disk format to in-memory format and queue it onto
+ * the TXN's ->changes list.
+ */
+static void
+ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ char *data)
+{
+ ReorderBufferDiskChange *ondisk;
+ ReorderBufferChange *change;
+
+ ondisk = (ReorderBufferDiskChange *) data;
+
+ change = ReorderBufferGetChange(rb);
+
+ /* copy static part */
+ memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+
+ data += sizeof(ReorderBufferDiskChange);
+
+ /* restore individual stuff */
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ /* restore in the order the tuples were serialized: old, then new */
+ if (change->oldtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->oldtuple = ReorderBufferGetTupleBuf(rb);
+ memcpy(change->oldtuple, data, len);
+ change->oldtuple->tuple.t_data = &change->oldtuple->header;
+ data += len;
+ }
+
+ if (change->newtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->newtuple = ReorderBufferGetTupleBuf(rb);
+ memcpy(change->newtuple, data, len);
+ change->newtuple->tuple.t_data = &change->newtuple->header;
+ data += len;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ Snapshot oldsnap = (Snapshot) data;
+ Size size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * oldsnap->xcnt +
+ sizeof(TransactionId) * (oldsnap->subxcnt + 0)
+ ;
+
+ Assert(change->snapshot != NULL);
+
+ change->snapshot = MemoryContextAllocZero(rb->context, size);
+
+ memcpy(change->snapshot, data, size);
+ change->snapshot->xip = (TransactionId *)
+ (((char *) change->snapshot) + sizeof(SnapshotData));
+ change->snapshot->subxip =
+ change->snapshot->xip + change->snapshot->xcnt + 0;
+ change->snapshot->copied = true;
+ break;
+ }
+ /* nothing needs to be done */
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries_mem++;
+}
+
+/*
+ * Delete all data spilled to disk after we've restarted/crashed. It will be
+ * recreated when the respective slots are reused.
+ */
+void
+ReorderBufferStartup(void)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ DIR *spill_dir;
+ struct dirent *spill_de;
+
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ char path[MAXPGPATH];
+
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /*
+ * ok, has to be a surviving logical slot, iterate and delete
+ * everything starting with xid-*
+ */
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ spill_dir = AllocateDir(path);
+ while ((spill_de = ReadDir(spill_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(spill_de->d_name, ".") == 0 ||
+ strcmp(spill_de->d_name, "..") == 0)
+ continue;
+
+ if (strncmp(spill_de->d_name, "xid", 3) == 0)
+ {
+ sprintf(path, "pg_llog/%s/%s", logical_de->d_name,
+ spill_de->d_name);
+
+ if (unlink(path) != 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove xid data file \"%s\": %m",
+ path)));
+ }
+ /* XXX: WARN? */
+ }
+ FreeDir(spill_dir);
+ }
+ FreeDir(logical_dir);
+}
+
+/*
+ * toast support
+ */
+
+/*
+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?
+ */
+#define VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr) \
+do { \
+ varattrib_1b_e *attre = (varattrib_1b_e *) (attr); \
+ Assert(VARATT_IS_EXTERNAL(attre)); \
+ Assert(VARSIZE_EXTERNAL(attre) == sizeof(toast_pointer) + VARHDRSZ_EXTERNAL); \
+ memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
+} while (0)
+
+#define VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) \
+ ((toast_pointer).va_extsize < (toast_pointer).va_rawsize - VARHDRSZ)
+
+/*
+ * Initialize per tuple toast reconstruction support.
+ */
+static void
+ReorderBufferToastInitHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ HASHCTL hash_ctl;
+
+ Assert(txn->toast_hash == NULL);
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+ hash_ctl.keysize = sizeof(Oid);
+ hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = rb->context;
+ txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+}
+
+/*
+ * Per toast-chunk handling for toast reconstruction
+ *
+ * Appends a toast chunk so we can reconstruct it when the tuple "owning" the
+ * toasted Datum comes along.
+ */
+static void
+ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ ReorderBufferToastEnt *ent;
+ bool found;
+ int32 chunksize;
+ bool isnull;
+ Pointer chunk;
+ TupleDesc desc = RelationGetDescr(relation);
+ Oid chunk_id;
+ int32 chunk_seq;
+
+ if (txn->toast_hash == NULL)
+ ReorderBufferToastInitHash(rb, txn);
+
+ Assert(IsToastRelation(relation));
+
+ chunk_id = DatumGetObjectId(fastgetattr(&change->newtuple->tuple, 1, desc, &isnull));
+ Assert(!isnull);
+ chunk_seq = DatumGetInt32(fastgetattr(&change->newtuple->tuple, 2, desc, &isnull));
+ Assert(!isnull);
+
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &chunk_id,
+ HASH_ENTER,
+ &found);
+
+ if (!found)
+ {
+ Assert(ent->chunk_id == chunk_id);
+ ent->num_chunks = 0;
+ ent->last_chunk_seq = 0;
+ ent->size = 0;
+ ent->reconstructed = NULL;
+ dlist_init(&ent->chunks);
+
+ if (chunk_seq != 0)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq 0",
+ chunk_seq, chunk_id);
+ }
+ else if (found && chunk_seq != ent->last_chunk_seq + 1)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq %d",
+ chunk_seq, chunk_id, ent->last_chunk_seq + 1);
+
+ chunk = DatumGetPointer(fastgetattr(&change->newtuple->tuple, 3, desc, &isnull));
+ Assert(!isnull);
+
+ /* calculate size so we can allocate the right size at once later */
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ /* could happen due to heap_form_tuple doing its thing */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ else
+ elog(ERROR, "unexpected type of toast chunk");
+
+ ent->size += chunksize;
+ ent->last_chunk_seq = chunk_seq;
+ ent->num_chunks++;
+ dlist_push_tail(&ent->chunks, &change->node);
+}
+
+/*
+ * Rejigger change->newtuple to point to in-memory toast tuples instead of
+ * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM).
+ *
+ * We cannot replace unchanged toast tuples though, so those will still point
+ * to on-disk toast data.
+ */
+static void
+ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TupleDesc desc;
+ int natt;
+ Datum *attrs;
+ bool *isnull;
+ bool *free;
+ HeapTuple newtup;
+ Relation toast_rel;
+ TupleDesc toast_desc;
+ MemoryContext oldcontext;
+
+ /* no toast tuples changed */
+ if (txn->toast_hash == NULL)
+ return;
+
+ oldcontext = MemoryContextSwitchTo(rb->context);
+
+ /* we should only have toast tuples in an INSERT or UPDATE */
+ Assert(change->newtuple);
+
+ desc = RelationGetDescr(relation);
+
+ toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+ toast_desc = RelationGetDescr(toast_rel);
+
+ /* should we allocate from stack instead? */
+ attrs = palloc0(sizeof(Datum) * desc->natts);
+ isnull = palloc0(sizeof(bool) * desc->natts);
+ free = palloc0(sizeof(bool) * desc->natts);
+
+ heap_deform_tuple(&change->newtuple->tuple, desc,
+ attrs, isnull);
+
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ Form_pg_attribute attr = desc->attrs[natt];
+ ReorderBufferToastEnt *ent;
+ struct varlena *varlena;
+
+ /* va_rawsize is the size of the original datum -- including header */
+ struct varatt_external toast_pointer;
+ struct varatt_indirect redirect_pointer;
+ struct varlena *new_datum = NULL;
+ struct varlena *reconstructed;
+ dlist_iter it;
+ Size data_done = 0;
+
+ /* system columns aren't toasted */
+ if (attr->attnum < 0)
+ continue;
+
+ if (attr->attisdropped)
+ continue;
+
+ /* not a varlena datatype */
+ if (attr->attlen != -1)
+ continue;
+
+ /* no data */
+ if (isnull[natt])
+ continue;
+
+ /* ok, we know we have a toast datum */
+ varlena = (struct varlena *) DatumGetPointer(attrs[natt]);
+
+ /* no need to do anything if the datum isn't external */
+ if (!VARATT_IS_EXTERNAL(varlena))
+ continue;
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, varlena);
+
+ /*
+ * check whether the toast tuple changed, replace if so.
+ */
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &toast_pointer.va_valueid,
+ HASH_FIND,
+ NULL);
+ if (ent == NULL)
+ continue;
+
+ new_datum =
+ (struct varlena *) palloc0(INDIRECT_POINTER_SIZE);
+
+ free[natt] = true;
+
+ reconstructed = palloc0(toast_pointer.va_rawsize);
+
+ ent->reconstructed = reconstructed;
+
+ /* stitch toast tuple back together from its parts */
+ dlist_foreach(it, &ent->chunks)
+ {
+ bool isnull;
+ ReorderBufferTupleBuf *tup =
+ dlist_container(ReorderBufferChange, node, it.cur)->newtuple;
+ Pointer chunk =
+ DatumGetPointer(fastgetattr(&tup->tuple, 3, toast_desc, &isnull));
+
+ Assert(!isnull);
+ Assert(!VARATT_IS_EXTERNAL(chunk));
+ Assert(!VARATT_IS_SHORT(chunk));
+
+ memcpy(VARDATA(reconstructed) + data_done,
+ VARDATA(chunk),
+ VARSIZE(chunk) - VARHDRSZ);
+ data_done += VARSIZE(chunk) - VARHDRSZ;
+ }
+ Assert(data_done == toast_pointer.va_extsize);
+
+ /* make sure it's marked as compressed or not */
+ if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+ SET_VARSIZE_COMPRESSED(reconstructed, data_done + VARHDRSZ);
+ else
+ SET_VARSIZE(reconstructed, data_done + VARHDRSZ);
+
+ memset(&redirect_pointer, 0, sizeof(redirect_pointer));
+ redirect_pointer.pointer = reconstructed;
+
+ SET_VARTAG_EXTERNAL(new_datum, VARTAG_INDIRECT);
+ memcpy(VARDATA_EXTERNAL(new_datum), &redirect_pointer,
+ sizeof(redirect_pointer));
+
+ attrs[natt] = PointerGetDatum(new_datum);
+ }
+
+ /*
+ * Build tuple in separate memory & copy tuple back into the tuplebuf
+ * passed to the output plugin. We can't directly heap_fill_tuple() into
+ * the tuplebuf because attrs[] will point back into the current content.
+ */
+ newtup = heap_form_tuple(desc, attrs, isnull);
+ Assert(change->newtuple->tuple.t_len <= MaxHeapTupleSize);
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+
+ memcpy(change->newtuple->tuple.t_data,
+ newtup->t_data,
+ newtup->t_len);
+ change->newtuple->tuple.t_len = newtup->t_len;
+
+ /*
+ * free resources we won't further need, more persistent stuff will be
+ * free'd in ReorderBufferToastReset().
+ */
+ RelationClose(toast_rel);
+ pfree(newtup);
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ if (free[natt])
+ pfree(DatumGetPointer(attrs[natt]));
+ }
+ pfree(attrs);
+ pfree(free);
+ pfree(isnull);
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Free all resources allocated for toast reconstruction.
+ */
+static void
+ReorderBufferToastReset(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ HASH_SEQ_STATUS hstat;
+ ReorderBufferToastEnt *ent;
+
+ if (txn->toast_hash == NULL)
+ return;
+
+ /* sequentially walk over the hash and free everything */
+ hash_seq_init(&hstat, txn->toast_hash);
+ while ((ent = (ReorderBufferToastEnt *) hash_seq_search(&hstat)) != NULL)
+ {
+ dlist_mutable_iter it;
+
+ if (ent->reconstructed != NULL)
+ pfree(ent->reconstructed);
+
+ dlist_foreach_modify(it, &ent->chunks)
+ {
+ ReorderBufferChange *change =
+ dlist_container(ReorderBufferChange, node, it.cur);
+
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(rb, change);
+ }
+ }
+
+ hash_destroy(txn->toast_hash);
+ txn->toast_hash = NULL;
+}
+
+
+/*
+ * Visibility support routines
+ */
+
+/*-------------------------------------------------------------------------
+ * Lookup actual cmin/cmax values during timetravel access. We can't always
+ * rely on stored cmin/cmax values because of two scenarios:
+ *
+ * * A tuple got changed multiple times during a single transaction and thus
+ * has got a combocid. Combocids are only valid for the duration of a single
+ * transaction.
+ * * A tuple with a cmin but no cmax (and thus no combocid) got deleted/updated
+ * in another transaction than the one which created it which we are looking
+ * at right now. As only one of cmin, cmax or combocid is actually stored in
+ * the heap we don't have access to the value we need anymore.
+ *
+ * To resolve those problems we have a per-transaction hash of (cmin, cmax)
+ * tuples keyed by (relfilenode, ctid) which contains the actual (cmin, cmax)
+ * values. That also takes care of combocids by simply not caring about them at
+ * all. As we have the real cmin/cmax values that's enough.
+ *
+ * As we only care about catalog tuples here the overhead of this hashtable
+ * should be acceptable.
+ * -------------------------------------------------------------------------
+ */
+extern bool
+ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
+ HeapTuple htup, Buffer buffer,
+ CommandId *cmin, CommandId *cmax)
+{
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ ForkNumber forkno;
+ BlockNumber blockno;
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(key));
+
+ Assert(!BufferIsLocal(buffer));
+
+ /*
+ * get relfilenode from the buffer, no convenient way to access it other
+ * than that.
+ */
+ BufferGetTag(buffer, &key.relnode, &forkno, &blockno);
+
+ /* tuples can only be in the main fork */
+ Assert(forkno == MAIN_FORKNUM);
+ Assert(blockno == ItemPointerGetBlockNumber(&htup->t_self));
+
+ ItemPointerCopy(&htup->t_self,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(tuplecid_data,
+ (void *) &key,
+ HASH_FIND,
+ NULL);
+
+ if (ent == NULL)
+ return false;
+
+ if (cmin)
+ *cmin = ent->cmin;
+ if (cmax)
+ *cmax = ent->cmax;
+ return true;
+}
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
new file mode 100644
index 0000000..6547e3f
--- /dev/null
+++ b/src/backend/replication/logical/snapbuild.c
@@ -0,0 +1,1581 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.c
+ *
+ * Support for building timetravel snapshots based on the contents of the
+ * WAL which then can be used to decode the contents of the WAL.
+ *
+ * NOTES:
+ *
+ * We build snapshots which can *only* be used to read catalog contents by
+ * reading and interpreting the WAL stream. The aim is to build a snapshot that
+ * behaves the same as a freshly taken MVCC snapshot would have at the time the
+ * XLogRecord was generated.
+ *
+ * To build the snapshots we reuse the infrastructure built for hot
+ * standby. The snapshots we build look different than HS' because we have
+ * different needs. To successfully decode data from the WAL we only need to
+ * access catalogs/(sys|rel|cat)cache, not the actual user tables since the
+ * data we decode is contained in the WAL records. Also, our snapshots need to
+ * be different in comparison to normal MVCC ones because in contrast to those
+ * we cannot fully rely on the clog and pg_subtrans for information about
+ * committed transactions because they might commit in the future from the POV
+ * of the wal entry we're currently decoding.
+ *
+ * As the percentage of transactions modifying the catalog normally is fairly
+ * small in comparison to ones only manipulating user data we keep track of
+ * the committed catalog modifying ones inside (xmin, xmax) instead of keeping
+ * track of all running transactions like it's done in a normal snapshot. Note
+ * that we're generally only looking at transactions that have acquired an
+ * xid. That is we keep a list of transactions between snapshot->(xmin, xmax)
+ * that we consider committed, everything else is considered aborted/in
+ * progress. That also allows us not to care about subtransactions before they
+ * have committed which means this module, in contrast to HS, doesn't have to
+ * care about suboverflowed subtransactions and similar.
+ *
+ * One complexity of doing this is that to e.g. handle mixed DDL/DML
+ * transactions we need Snapshots that see intermediate versions of the catalog
+ * in a transaction. During normal operation this is achieved by using
+ * CommandIds/cmin/cmax. The problem with that however is that for space
+ * efficiency reasons only one value of that is stored (cf. combocid.c). Since
+ * Combocids are only available in memory we log additional information which
+ * allows us to get the original (cmin, cmax) pair during visibility
+ * checks. See the comment above ResolveCminCmaxDuringDecoding() in
+ * reorderbuffer.c for details.
+ *
+ * To facilitate all this we need our own visibility routine, as the normal
+ * ones are optimized for different usecases. To make sure no unexpected
+ * database access bypassing our special snapshot is possible - which would
+ * possibly load invalid data into caches - we temporarily overload the
+ * .satisfies methods of the usual snapshots while doing timetravel.
+ *
+ * To replace the normal catalog snapshots with timetravel ones use the
+ * SetupDecodingSnapshots and RevertFromDecodingSnapshots functions.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/snapbuild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h" /* FIXME: Use */
+#include "utils/memutils.h"
+#include "utils/snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/tqual.h"
+
+#include "storage/block.h" /* debugging output */
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/standby.h"
+
+typedef struct SnapBuild
+{
+ /* how far are we along building our first full snapshot */
+ SnapBuildState state;
+
+ /* private memory context used to allocate memory for this module. */
+ MemoryContext context;
+
+ /* all transactions < this have committed/aborted */
+ TransactionId xmin;
+
+ /* all transactions >= this are uncommitted */
+ TransactionId xmax;
+
+ /*
+ * Don't replay commits from an LSN <= this LSN. This can be set
+ * externally but it will also be advanced (never retreat) from within
+ * snapbuild.c.
+ */
+ XLogRecPtr transactions_after;
+
+ /*
+ * Don't start decoding WAL until the "xl_running_xacts" information
+ * indicates there are no running xids with an xid smaller than this.
+ */
+ TransactionId initial_xmin_horizon;
+
+ /*
+ * Snapshot that's valid to see all currently committed transactions that
+ * see catalog modifications.
+ */
+ Snapshot snapshot;
+
+ /*
+ * LSN of the last location we are sure a snapshot has been serialized to.
+ */
+ XLogRecPtr last_serialized_snapshot;
+
+ ReorderBuffer *reorder;
+
+ /*
+ * Information about initially running transactions
+ *
+ * When we start building a snapshot there already may be transactions in
+ * progress. Those are stored in running.xip. We don't have enough
+ * information about those to decode their contents, so until they are
+ * finished (xcnt=0) we cannot switch to a CONSISTENT state.
+ */
+ struct
+ {
+ /*
+ * As long as running.xcnt is nonzero, all XIDs between running.xmin and
+ * running.xmax have to be checked for whether they are still running.
+ */
+ TransactionId xmin;
+ TransactionId xmax;
+
+ size_t xcnt; /* number of used xip entries */
+ size_t xcnt_space; /* allocated size of xip */
+ TransactionId *xip; /* running xacts array, xidComparator-sorted */
+ } running;
+
+ /*
+ * Array of transactions which could have catalog changes that committed
+ * between xmin and xmax
+ */
+ struct
+ {
+ /* number of committed transactions */
+ size_t xcnt;
+
+ /* available space for committed transactions */
+ size_t xcnt_space;
+
+ /*
+ * Until we reach a CONSISTENT state, we record commits of all
+ * transactions, not just the catalog changing ones. Record when that
+ * changes so we know we cannot export a snapshot safely anymore.
+ */
+ bool includes_all_transactions;
+
+ /*
+ * Array of committed transactions that have modified the catalog.
+ *
+ * As this array is frequently modified we do *not* keep it in
+ * xidComparator order. Instead we sort the array when building &
+ * distributing a snapshot.
+ *
+ * XXX: That doesn't seem to be good reasoning anymore. Every time we
+ * add something here after becoming consistent will also require
+ * distributing a snapshot. Storing them sorted would potentially make
+ * it easier to purge as well (but more complicated wrt wraparound?).
+ */
+ TransactionId *xip;
+ } committed;
+} SnapBuild;
+
+/*
+ * Starting a transaction -- which we need to do while exporting a snapshot --
+ * removes knowledge about the previously used resowner, so we save it here.
+ */
+ResourceOwner SavedResourceOwnerDuringExport = NULL;
+
+/* transaction state manipulation functions */
+static void SnapBuildEndTxn(SnapBuild *builder, TransactionId xid);
+
+/* ->running manipulation */
+static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid);
+
+/* ->committed manipulation */
+static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
+
+/* snapshot building/manipulation/distribution functions */
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildFreeSnapshot(Snapshot snap);
+
+static void SnapBuildSnapIncRefcount(Snapshot snap);
+
+static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn);
+
+/* xlog reading helper functions for SnapBuildProcessRecord */
+static bool SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running);
+
+/* serialization functions */
+static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
+static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
+
+
+/*
+ * Allocate a new snapshot builder.
+ */
+SnapBuild *
+AllocateSnapshotBuilder(ReorderBuffer *reorder,
+ TransactionId xmin_horizon,
+ XLogRecPtr start_lsn)
+{
+ MemoryContext context;
+ MemoryContext oldcontext;
+ SnapBuild *builder;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "snapshot builder context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldcontext = MemoryContextSwitchTo(context);
+
+ builder = palloc0(sizeof(SnapBuild));
+
+ builder->state = SNAPBUILD_START;
+ builder->context = context;
+ builder->reorder = reorder;
+ /* Other struct members initialized by zeroing, above */
+
+ /* builder->running is initialized by zeroing, above */
+
+ builder->committed.xcnt = 0;
+ builder->committed.xcnt_space = 128; /* arbitrary number */
+ builder->committed.xip =
+ palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
+ builder->committed.includes_all_transactions = true;
+
+ /* limits on what gets decoded, supplied by the caller */
+ builder->initial_xmin_horizon = xmin_horizon;
+ builder->transactions_after = start_lsn;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return builder;
+}
+
+/*
+ * Free a snapshot builder.
+ */
+void
+FreeSnapshotBuilder(SnapBuild *builder)
+{
+ MemoryContext context = builder->context;
+
+ if (builder->snapshot)
+ SnapBuildSnapDecRefcount(builder->snapshot);
+
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+
+ if (builder->committed.xip)
+ pfree(builder->committed.xip);
+
+ pfree(builder);
+
+ MemoryContextDelete(context);
+}
+
+/*
+ * Free an unreferenced snapshot that has previously been built by us.
+ */
+static void
+SnapBuildFreeSnapshot(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ /* slightly more likely, so it's checked even without c-asserts */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ if (snap->active_count)
+ elog(ERROR, "can't free an active snapshot");
+
+ pfree(snap);
+}
+
+/*
+ * In which state of snapshot building are we?
+ */
+SnapBuildState
+SnapBuildCurrentState(SnapBuild *builder)
+{
+ return builder->state;
+}
+
+/*
+ * Should decoding skip the contents of the transaction ending at 'ptr'?
+ */
+bool
+SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
+{
+ return ptr <= builder->transactions_after;
+}
+
+/*
+ * Increase refcount of a snapshot.
+ *
+ * This is used when handing out a snapshot to some external resource or when
+ * adding a Snapshot as builder->snapshot.
+ */
+static void
+SnapBuildSnapIncRefcount(Snapshot snap)
+{
+ snap->active_count++;
+}
+
+/*
+ * Decrease refcount of a snapshot and free if the refcount reaches zero.
+ *
+ * Externally visible so external resources that have been handed an IncRef'ed
+ * Snapshot can free it easily.
+ */
+void
+SnapBuildSnapDecRefcount(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ Assert(snap->active_count);
+
+ /* slightly more likely, so it's checked even without c-asserts */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ snap->active_count--;
+ if (!snap->active_count)
+ SnapBuildFreeSnapshot(snap);
+}
+
+/*
+ * Build a new snapshot, based on currently committed catalog-modifying
+ * transactions.
+ *
+ * In-progress transactions with catalog access are *not* allowed to modify
+ * these snapshots; they have to copy them and fill in appropriate ->curcid and
+ * ->subxip/subxcnt values.
+ */
+static Snapshot
+SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+{
+ Snapshot snapshot;
+ Size ssize;
+
+ Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
+
+ ssize = sizeof(SnapshotData)
+ + sizeof(TransactionId) * builder->committed.xcnt
+ + sizeof(TransactionId) * 1 /* toplevel xid */ ;
+
+ snapshot = MemoryContextAllocZero(builder->context, ssize);
+
+ snapshot->satisfies = HeapTupleSatisfiesMVCCDuringDecoding;
+
+ /*
+ * We misuse the original meaning of SnapshotData's xip and subxip fields
+ * to make them more fitting for our needs.
+ *
+ * In the 'xip' array we store transactions that have to be treated as
+ * committed. Since we will only ever look at tuples from transactions
+ * that have modified the catalog it's more efficient to store those few
+ * that exist between xmin and xmax (frequently there are none).
+ *
+ * Snapshots that are used in transactions that have modified the catalog
+ * also use the 'subxip' array to store their toplevel xid and all the
+ * subtransaction xids so we can recognize when we need to treat rows as
+ * visible that are not in xip but still need to be visible. Subxip only
+ * gets filled when the snapshot is copied into the context of a
+ * catalog modifying transaction since we otherwise share a snapshot
+ * between transactions. As long as a txn hasn't modified the catalog it
+ * doesn't need to treat any uncommitted rows as visible, so there is no
+ * need for those xids.
+ *
+ * Both arrays are qsort'ed so that we can use bsearch() on them.
+ *
+ * XXX: Do we want extra fields instead of misusing existing ones?
+ */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ snapshot->xmin = builder->xmin;
+ snapshot->xmax = builder->xmax;
+
+ /* store all transactions to be treated as committed by this snapshot */
+ snapshot->xip =
+ (TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+ snapshot->xcnt = builder->committed.xcnt;
+ memcpy(snapshot->xip,
+ builder->committed.xip,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* sort so we can bsearch() */
+ qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+
+ /*
+ * Initially, subxip is empty, i.e. it's a snapshot to be used by
+ * transactions that don't modify the catalog. Will be filled by
+ * ReorderBufferCopySnap() if necessary.
+ */
+ snapshot->subxcnt = 0;
+ snapshot->subxip = NULL;
+
+ snapshot->suboverflowed = false;
+ snapshot->takenDuringRecovery = false;
+ snapshot->copied = false;
+ snapshot->curcid = FirstCommandId;
+ snapshot->active_count = 0;
+ snapshot->regd_count = 0;
+
+ return snapshot;
+}
+
+/*
+ * Export a snapshot so it can be set in another session with SET TRANSACTION
+ * SNAPSHOT.
+ *
+ * For that we need to start a transaction in the current backend as the
+ * importing side checks whether the source transaction is still open to make
+ * sure the xmin horizon hasn't advanced since then.
+ *
+ * After that we convert a locally built snapshot into the normal variant
+ * understood by HeapTupleSatisfiesMVCC et al.
+ */
+const char *
+SnapBuildExportSnapshot(SnapBuild *builder)
+{
+ Snapshot snap;
+ char *snapname;
+ TransactionId xid;
+ TransactionId *newxip;
+ int newxcnt = 0;
+
+ elog(LOG, "building snapshot");
+
+ if (builder->state != SNAPBUILD_CONSISTENT)
+ elog(ERROR, "cannot export a snapshot before reaching a consistent state");
+
+ if (!builder->committed.includes_all_transactions)
+ elog(ERROR, "cannot export a snapshot, not all transactions are monitored anymore");
+
+ /* so we don't overwrite the existing value */
+ if (TransactionIdIsValid(MyPgXact->xmin))
+ elog(ERROR, "cannot export a snapshot when MyPgXact->xmin already is valid");
+
+ if (IsTransactionOrTransactionBlock())
+ elog(ERROR, "cannot export a snapshot from within a transaction");
+
+ if (SavedResourceOwnerDuringExport)
+ elog(ERROR, "can only export one snapshot at a time");
+
+ SavedResourceOwnerDuringExport = CurrentResourceOwner;
+
+ StartTransactionCommand();
+
+ Assert(!FirstSnapshotSet);
+
+ /* There doesn't seem to be a nice API to set these */
+ XactIsoLevel = XACT_REPEATABLE_READ;
+ XactReadOnly = true;
+
+ snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+
+ /*
+ * We know that snap->xmin is alive, enforced by the logical xmin
+ * mechanism. Due to that we can do this without locks, we're only
+ * changing our own value.
+ */
+ MyPgXact->xmin = snap->xmin;
+
+ /* allocate in transaction context */
+ newxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+
+ /*
+ * snapbuild.c builds snapshots in an "inverted" manner, which means it
+ * stores committed transactions in ->xip, not ones in progress. Build a
+ * classical snapshot by marking all non-committed transactions as
+ * in-progress. This can be expensive.
+ */
+ for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+ {
+ void *test;
+
+ /*
+ * check whether transaction committed using the timetravel meaning of
+ * ->xip
+ */
+ test = bsearch(&xid, snap->xip, snap->xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ elog(DEBUG2, "checking xid %u.. %d (xmin %u, xmax %u)",
+ xid, test == NULL, snap->xmin, snap->xmax);
+
+ if (test == NULL)
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ elog(ERROR, "snapshot too large");
+
+ newxip[newxcnt++] = xid;
+
+ elog(DEBUG2, "treat %u as in-progress", xid);
+ }
+
+ TransactionIdAdvance(xid);
+ }
+
+ snap->xcnt = newxcnt;
+ snap->xip = newxip;
+
+ /*
+ * now that we've built a plain snapshot, use the normal mechanisms for
+ * exporting it
+ */
+ snapname = ExportSnapshot(snap);
+
+ elog(LOG, "exported snapbuild snapshot: %s xcnt %u", snapname, snap->xcnt);
+ return snapname;
+}
+
+/*
+ * Reset a previously SnapBuildExportSnapshot()'ed snapshot if there is
+ * any. Aborts the previously started transaction and resets the resource owner
+ * back to its original value.
+ */
+void
+SnapBuildClearExportedSnapshot()
+{
+ /* nothing exported, that's the usual case */
+ if (SavedResourceOwnerDuringExport == NULL)
+ return;
+
+ Assert(IsTransactionState());
+
+ /* make sure nothing could have ever happened */
+ AbortCurrentTransaction();
+
+ CurrentResourceOwner = SavedResourceOwnerDuringExport;
+ SavedResourceOwnerDuringExport = NULL;
+}
+
+/*
+ * Handle the effects of a single heap change, appropriate to the current state
+ * of the snapshot builder, and return whether changes made at (xid, lsn) may
+ * be decoded.
+ */
+bool
+SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
+{
+ bool is_old_tx;
+
+ /*
+ * We can't handle data in transactions if we haven't built a snapshot
+ * yet, so don't store them.
+ */
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ return false;
+
+ /*
+ * No point in keeping track of changes in transactions that we don't have
+ * enough information about to decode.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT &&
+ SnapBuildTxnIsRunning(builder, xid))
+ return false;
+
+ is_old_tx = ReorderBufferIsXidKnown(builder->reorder, xid);
+
+ if (!is_old_tx || !ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
+ {
+ /* only build a new snapshot if we don't have a prebuilt one */
+ if (builder->snapshot == NULL)
+ {
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+ /* increase refcount for the snapshot builder */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ }
+
+ /* increase refcount for the transaction */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
+ builder->snapshot);
+ }
+
+ return true;
+}
+
+/*
+ * Do CommandId/ComboCid handling after reading a xl_heap_new_cid record. This
+ * implies that a transaction has done some form of write to system catalogs.
+ */
+void
+SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn, xl_heap_new_cid *xlrec)
+{
+ CommandId cid;
+
+ /*
+ * we only log new_cid's if a catalog tuple was modified, so
+ * set transaction to timetravelling.
+ */
+ ReorderBufferXidSetTimetravel(builder->reorder, xid, lsn);
+
+ ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ xlrec->target.node, xlrec->target.tid,
+ xlrec->cmin, xlrec->cmax,
+ xlrec->combocid);
+
+ /* figure out new command id */
+ if (xlrec->cmin != InvalidCommandId &&
+ xlrec->cmax != InvalidCommandId)
+ cid = Max(xlrec->cmin, xlrec->cmax);
+ else if (xlrec->cmax != InvalidCommandId)
+ cid = xlrec->cmax;
+ else if (xlrec->cmin != InvalidCommandId)
+ cid = xlrec->cmin;
+ else
+ {
+ cid = InvalidCommandId; /* silence compiler */
+ elog(ERROR, "broken arrow, no cid?");
+ }
+
+ /*
+ * FIXME: potential race condition here: if multiple snapshots were running
+ * & generating changes in the same transaction on the source side this
+ * could be problematic. But this cannot happen for system catalogs, right?
+ */
+ ReorderBufferAddNewCommandId(builder->reorder, xid, lsn, cid + 1);
+}
+
+/*
+ * Check whether `xid` is currently 'running'. Running transactions in our
+ * parlance are transactions which we didn't observe from the start so we can't
+ * properly decode them. They only exist after we freshly started from a
+ * snapshot that is not yet CONSISTENT.
+ */
+static bool
+SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
+{
+ Assert(builder->state < SNAPBUILD_CONSISTENT);
+ Assert(TransactionIdIsValid(builder->running.xmin));
+ Assert(TransactionIdIsValid(builder->running.xmax));
+
+ if (builder->running.xcnt &&
+ NormalTransactionIdFollows(xid, builder->running.xmin) &&
+ NormalTransactionIdPrecedes(xid, builder->running.xmax))
+ {
+ TransactionId *search =
+ bsearch(&xid, builder->running.xip, builder->running.xcnt_space,
+ sizeof(TransactionId), xidComparator);
+
+ if (search != NULL)
+ {
+ Assert(*search == xid);
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/*
+ * Add a new Snapshot to all transactions we're decoding that currently are
+ * in-progress so they can see new catalog contents made by the transaction
+ * that just committed. This is necessary because those in-progress
+ * transactions will use the new catalog's contents from here on (at the very
+ * least everything they do needs to be compatible with newer catalog contents).
+ */
+static void
+SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn)
+{
+ dlist_iter txn_i;
+ ReorderBufferTXN *txn;
+
+ /*
+ * Iterate through all toplevel transactions. This can include
+ * subtransactions which we just don't yet know to be that, but that's
+ * fine, they will just get an unnecessary snapshot queued.
+ */
+ dlist_foreach(txn_i, &builder->reorder->toplevel_by_lsn)
+ {
+ txn = dlist_container(ReorderBufferTXN, node, txn_i.cur);
+
+ Assert(TransactionIdIsValid(txn->xid));
+
+ /*
+ * If we don't have a base snapshot yet, there are no changes in this
+ * transaction which in turn implies we don't yet need a snapshot at
+ * all. We'll add a snapshot when the first change gets queued.
+ *
+ * XXX: is that fine if only a subtransaction has a base snapshot so
+ * far?
+ */
+ if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
+ continue;
+
+ elog(DEBUG2, "adding a new snapshot to %u at %X/%X",
+ txn->xid, (uint32) (lsn >> 32), (uint32) lsn);
+
+ /* increase refcount for the transaction */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn,
+ builder->snapshot);
+ }
+}
+
+/*
+ * Keep track of a new catalog changing transaction that has committed.
+ */
+static void
+SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+{
+ Assert(TransactionIdIsValid(xid));
+
+ if (builder->committed.xcnt == builder->committed.xcnt_space)
+ {
+ builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
+ /* XXX: put in a limit here as a defense against bugs? */
+
+ elog(DEBUG1, "increasing space for committed transactions to %zu",
+ builder->committed.xcnt_space);
+
+ builder->committed.xip = repalloc(builder->committed.xip,
+ builder->committed.xcnt_space * sizeof(TransactionId));
+ }
+
+ /*
+ * XXX: It might make sense to keep the array sorted here instead of doing
+ * it every time we build a new snapshot. On the other hand this gets called
+ * repeatedly when a transaction with subtransactions commits.
+ */
+ builder->committed.xip[builder->committed.xcnt++] = xid;
+}
+
+/*
+ * Remove knowledge about transactions we treat as committed that are smaller
+ * than ->xmin. Those won't ever get checked via the ->committed array but via
+ * the clog machinery, so we don't need to waste memory on them.
+ */
+static void
+SnapBuildPurgeCommittedTxn(SnapBuild *builder)
+{
+ int off;
+ TransactionId *workspace;
+ int surviving_xids = 0;
+
+ /* not ready yet */
+ if (!TransactionIdIsNormal(builder->xmin))
+ return;
+
+ /* XXX: Neater algorithm? */
+ workspace =
+ MemoryContextAlloc(builder->context,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* copy xids that still are interesting to workspace */
+ for (off = 0; off < builder->committed.xcnt; off++)
+ {
+ if (NormalTransactionIdPrecedes(builder->committed.xip[off],
+ builder->xmin))
+ ; /* remove */
+ else
+ workspace[surviving_xids++] = builder->committed.xip[off];
+ }
+
+ /* copy workspace back to persistent state */
+ memcpy(builder->committed.xip, workspace,
+ surviving_xids * sizeof(TransactionId));
+
+ elog(DEBUG1, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+ (uint32) builder->committed.xcnt, (uint32) surviving_xids,
+ builder->xmin, builder->xmax);
+ builder->committed.xcnt = surviving_xids;
+
+ pfree(workspace);
+}
+
+/*
+ * Common logic for SnapBuildAbortTxn and SnapBuildCommitTxn dealing with
+ * keeping track of the number of running transactions.
+ */
+static void
+SnapBuildEndTxn(SnapBuild *builder, TransactionId xid)
+{
+ if (builder->state == SNAPBUILD_CONSISTENT)
+ return;
+
+ if (SnapBuildTxnIsRunning(builder, xid))
+ {
+ Assert(builder->running.xcnt > 0);
+
+ if (!--builder->running.xcnt)
+ {
+ /*
+ * None of the originally running transactions is running anymore.
+ * Due to that our incrementally built snapshot now is complete.
+ */
+ elog(LOG, "found consistent point due to SnapBuildEndTxn + running: %u", xid);
+ builder->state = SNAPBUILD_CONSISTENT;
+ }
+ }
+}
+
+/*
+ * Abort a transaction, throw away all state we kept
+ */
+void
+SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int i;
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ TransactionId subxid = subxacts[i];
+
+ SnapBuildEndTxn(builder, subxid);
+ }
+
+ SnapBuildEndTxn(builder, xid);
+}
+
+/*
+ * Handle everything that needs to be done when a transaction commits
+ */
+void
+SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int nxact;
+
+ bool forced_timetravel = false;
+ bool sub_does_timetravel = false;
+ bool top_does_timetravel = false;
+
+ TransactionId xmax = xid;
+
+ /*
+ * If we couldn't observe every change of a transaction because it was
+ * already running at the point we started to observe we have to assume it
+ * made catalog changes.
+ *
+ * This has the positive benefit that we afterwards have enough
+ * information to build an exportable snapshot that's usable by pg_dump et
+ * al.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /* ensure that only commits after this are getting replayed */
+ if (builder->transactions_after < lsn)
+ builder->transactions_after = lsn;
+
+ /*
+ * we could avoid treating !SnapBuildTxnIsRunning transactions as
+ * timetravel ones, but we want to be able to export a snapshot when
+ * we reached consistency.
+ */
+ forced_timetravel = true;
+ elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+ }
+
+ for (nxact = 0; nxact < nsubxacts; nxact++)
+ {
+ TransactionId subxid = subxacts[nxact];
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, subxid);
+
+ /*
+ * If we're forcing timetravel we also need accurate subtransaction
+ * status.
+ */
+ if (forced_timetravel)
+ {
+ SnapBuildAddCommittedTxn(builder, subxid);
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+
+ /*
+ * add subtransaction to base snapshot, we don't distinguish them from
+ * toplevel transactions there.
+ */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, subxid))
+ {
+ sub_does_timetravel = true;
+
+ elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
+ xid, subxid);
+
+ SnapBuildAddCommittedTxn(builder, subxid);
+
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+ }
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, xid);
+
+ if (forced_timetravel)
+ {
+ elog(DEBUG1, "forced transaction %u to do timetravel.", xid);
+
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ /* add toplevel transaction to base snapshot */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, xid))
+ {
+ elog(DEBUG1, "found top level transaction %u, with catalog changes!",
+ xid);
+
+ top_does_timetravel = true;
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ else if (sub_does_timetravel)
+ {
+ /* mark toplevel txn as timetravel as well */
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+
+ if (forced_timetravel || top_does_timetravel || sub_does_timetravel)
+ {
+ if (!TransactionIdIsValid(builder->xmax) ||
+ TransactionIdFollowsOrEquals(xmax, builder->xmax))
+ {
+ builder->xmax = xmax;
+ TransactionIdAdvance(builder->xmax);
+ }
+
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ /* decrease the snapshot builder's refcount of the old snapshot */
+ if (builder->snapshot)
+ SnapBuildSnapDecRefcount(builder->snapshot);
+
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+ /* refcount of the snapshot builder for the new snapshot */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ /* add a new snapshot to all currently running transactions */
+ SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
+ }
+ else
+ {
+ /* record that we cannot export a general snapshot anymore */
+ builder->committed.includes_all_transactions = false;
+ }
+}
+
+
+/* -----------------------------------
+ * Snapshot building functions dealing with xlog records
+ * -----------------------------------
+ */
+void
+SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running)
+{
+ ReorderBufferTXN *txn;
+
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /* returns false if there's no point in performing cleanup just yet */
+ if (!SnapBuildFindSnapshot(builder, lsn, running))
+ return;
+ }
+ else
+ {
+ SnapBuildSerialize(builder, lsn);
+ }
+
+ /*
+ * Update range of interesting xids. We don't increase ->xmax because once
+ * we are in a consistent state we can do that ourselves, and much more
+ * efficiently, because we only need to do it for catalog transactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+
+ /*
+ * xmax can be lower than xmin here because we only increase xmax when we
+ * hit a transaction with catalog changes. While odd looking, it's correct
+ * and actually more efficient this way since we hit fast paths in tqual.c.
+ */
+
+ /* Remove transactions we don't need to keep track of anymore */
+ SnapBuildPurgeCommittedTxn(builder);
+
+ elog(DEBUG1, "xmin: %u, xmax: %u, oldestrunning: %u",
+ builder->xmin, builder->xmax,
+ running->oldestRunningXid);
+
+ /*
+ * Increase the xmin in shared memory, so vacuum can work on tuples we
+ * prevented from being pruned until now.
+ */
+ IncreaseLogicalXminForSlot(lsn, running->oldestRunningXid);
+
+ /*
+ * Also tell the slot where we can restart decoding from. We don't want to
+ * do that after every commit because changing that implies an fsync of the
+ * logical slot's state file, so we only do it every time we see a running
+ * xacts record.
+ *
+ * Do so by looking for the oldest in progress transaction (determined by
+ * the first LSN of any of its relevant records). Every transaction
+ * remembers the last location we stored the snapshot to disk before its
+ * beginning. That point is where we can restart from.
+ */
+
+ /*
+ * Can't know about a serialized snapshot's location if we're not
+ * consistent
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ return;
+
+ txn = ReorderBufferGetOldestTXN(builder->reorder);
+
+ /*
+ * oldest ongoing txn might have started when we didn't yet serialize
+ * anything because we hadn't reached a consistent state yet.
+ */
+ if (txn != NULL && txn->restart_decoding_lsn != InvalidXLogRecPtr)
+ IncreaseRestartDecodingForSlot(lsn, txn->restart_decoding_lsn);
+
+ /*
+ * No in-progress transaction, can reuse the last serialized snapshot if we
+ * have one.
+ */
+ else if (txn == NULL &&
+ builder->reorder->current_restart_decoding_lsn != InvalidXLogRecPtr &&
+ builder->last_serialized_snapshot != InvalidXLogRecPtr)
+ IncreaseRestartDecodingForSlot(lsn, builder->last_serialized_snapshot);
+}
+
+
+/*
+ * Build the start of a snapshot that's capable of decoding the catalog. Helper
+ * function for SnapBuildProcessRunningXacts() while we're not yet consistent.
+ *
+ * Returns true if there is a point in performing internal maintenance/cleanup
+ * using the xl_running_xacts record.
+ */
+static bool
+SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running)
+{
+ /* ---
+ * Build catalog decoding snapshot incrementally using information about
+ * the currently running transactions. There are several ways to do that:
+
+ * a) There were no running transactions when the xl_running_xacts record
+ * was inserted, jump to CONSISTENT immediately. We might find such a
+ * state while we were waiting for b) or c).
+
+ * b) Wait for all toplevel transactions that were running to end. We
+ * simply track the number of in-progress toplevel transactions and
+ * lower it whenever one commits or aborts. When that number
+ * (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT to
+ * CONSISTENT.
+ * NB: We need to search running.xip when seeing a transaction's end to
+ * make sure it's a toplevel transaction and it's been one of the
+ * initially running ones.
+ * Interestingly, in contrast to HS this allows us not to care about
+ * subtransactions - and by extension suboverflowed xl_running_xacts -
+ * at all.
+ *
+ * c) This (in a previous run) or another decoding slot serialized a
+ * snapshot to disk that we can use.
+ * ---
+ */
+
+ /*
+ * xl_running_xacts record is older than what we can use, we might not have
+ * all necessary catalog rows anymore.
+ */
+ if (TransactionIdIsNormal(builder->initial_xmin_horizon) &&
+ NormalTransactionIdPrecedes(running->oldestRunningXid,
+ builder->initial_xmin_horizon))
+ {
+ elog(LOG, "skipping snapshot at %X/%X due to initial xmin horizon of %u vs the snapshot's %u",
+ (uint32) (lsn >> 32), (uint32) lsn,
+ builder->initial_xmin_horizon, running->oldestRunningXid);
+ return true;
+ }
+
+ /*
+ * a) No transactions were running, we can jump to consistent.
+ *
+ * NB: We might have already started to incrementally assemble a snapshot,
+ * so we need to be careful to deal with that.
+ */
+ if (running->xcnt == 0)
+ {
+ if (builder->transactions_after == InvalidXLogRecPtr ||
+ builder->transactions_after < lsn)
+ builder->transactions_after = lsn;
+
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ /* no transactions running now */
+ builder->running.xcnt = 0;
+ builder->running.xmin = InvalidTransactionId;
+ builder->running.xmax = InvalidTransactionId;
+
+ /*
+ * FIXME: abort everything we have stored about running transactions,
+ * relevant e.g. after a crash.
+ */
+ builder->state = SNAPBUILD_CONSISTENT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts with xcnt == 0",
+ builder->xmin);
+
+ return false;
+ }
+ /* c) valid on disk state */
+ else if (SnapBuildRestore(builder, lsn))
+ {
+ /* there won't be any state to cleanup */
+ return false;
+ }
+
+ /*
+ * b) first encounter of a useable xl_running_xacts record. If we had found
+ * one earlier we would either track running transactions
+ * (i.e. builder->running.xcnt != 0) or be consistent (this function
+ * wouldn't get called).
+ */
+ else if (!builder->running.xcnt)
+ {
+ /*
+ * We only care about toplevel xids as those are the ones we definitely
+ * see in the wal stream. As snapbuild.c tracks committed instead of
+ * running transactions we don't need to know anything about
+ * uncommitted subtransactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ /* so we can safely use the faster comparisons */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ builder->running.xcnt = running->xcnt;
+ builder->running.xcnt_space = running->xcnt;
+ builder->running.xip =
+ MemoryContextAlloc(builder->context,
+ builder->running.xcnt * sizeof(TransactionId));
+ memcpy(builder->running.xip, running->xids,
+ builder->running.xcnt * sizeof(TransactionId));
+
+ /* sort so we can do a binary search */
+ qsort(builder->running.xip, builder->running.xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ builder->running.xmin = builder->running.xip[0];
+ builder->running.xmax = builder->running.xip[running->xcnt - 1];
+
+ /* makes comparisons cheaper later */
+ TransactionIdRetreat(builder->running.xmin);
+ TransactionIdAdvance(builder->running.xmax);
+
+ builder->state = SNAPBUILD_FULL_SNAPSHOT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts, %u xacts need to finish",
+ builder->xmin, (uint32) builder->running.xcnt);
+
+ /* nothing could have built up so far */
+ return false;
+ }
+
+ /*
+ * We already started to track running xacts and need to wait for all
+ * in-progress ones to finish. We fall through to the normal processing of
+ * records so incremental cleanup can be performed.
+ */
+ return true;
+}
+
+
+/* -----------------------------------
+ * Snapshot serialization support
+ * -----------------------------------
+ */
+
+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuildOnDisk;
+ * TransactionId * running.xcnt_space;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ *
+ */
+typedef struct SnapBuildOnDisk
+{
+ uint32 magic;
+ /* how large is the SnapBuildOnDisk including all data in state */
+ Size size;
+ SnapBuild builder;
+
+ /* XXX: Should we store a CRC32? */
+
+ /* variable amount of TransactionId's */
+} SnapBuildOnDisk;
+
+#define SNAPBUILD_MAGIC 0x51A1E001
+
+/*
+ * Store/Load a snapshot from disk, depending on the snapshot builder's state.
+ *
+ * Supposed to be used by external (i.e. not snapbuild.c) code that just read a
+ * record that's a potential location for a serialized snapshot.
+ */
+void
+SnapBuildSerializationPoint(SnapBuild *builder, XLogRecPtr lsn)
+{
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ SnapBuildRestore(builder, lsn);
+ else
+ SnapBuildSerialize(builder, lsn);
+}
+
+/*
+ * Serialize the snapshot 'builder' at the location 'lsn' if it hasn't already
+ * been done by another decoding process.
+ */
+static void
+SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
+{
+ Size needed_size;
+ SnapBuildOnDisk *ondisk;
+ char *ondisk_c;
+ int fd;
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int ret;
+ struct stat stat_buf;
+
+ needed_size = sizeof(SnapBuildOnDisk) +
+ sizeof(TransactionId) * builder->running.xcnt_space +
+ sizeof(TransactionId) * builder->committed.xcnt;
+
+ Assert(lsn != InvalidXLogRecPtr);
+ Assert(builder->last_serialized_snapshot == InvalidXLogRecPtr ||
+ builder->last_serialized_snapshot <= lsn);
+
+ /*
+ * no point in serializing if we cannot continue to work immediately after
+ * restoring the snapshot
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ return;
+
+ /*
+ * FIXME: Timeline handling/naming.
+ */
+
+ /*
+ * first check whether some other backend already has written the snapshot
+ * for this LSN. It's perfectly fine if there's none, so we accept ENOENT
+ * as a valid state. Everything else is an unexpected error.
+ */
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ ret = stat(path, &stat_buf);
+
+ if (ret != 0 && errno != ENOENT)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat snapbuild state file %s: %m", path)));
+ else if (ret == 0)
+ {
+ /*
+ * somebody else has already serialized to this point, don't overwrite
+ * but remember location, so we don't need to read old data again.
+ *
+ * FIXME: Is it safe to set this as restartpoint below? While we can
+ * see the file it's not guaranteed to persist after a crash...
+ */
+ builder->last_serialized_snapshot = lsn;
+ goto out;
+ }
+
+ /*
+ * there is an obvious race condition here between the time we stat(2) the
+ * file and us writing the file. But we rename the file into place
+ * atomically and all files created need to contain the same data anyway,
+ * so this is perfectly fine, although a bit of a resource waste. Locking
+ * seems like pointless complication.
+ */
+ elog(DEBUG1, "serializing snapshot to %s", path);
+
+ /* to make sure only we will write to this tempfile, include pid */
+ sprintf(tmppath, "pg_llog/snapshots/%X-%X.snap.%u.tmp",
+ (uint32) (lsn >> 32), (uint32) lsn, MyProcPid);
+
+ /*
+ * Unlink the temporary file if it already exists. It can only be a
+ * leftover from before a crash/error, since we won't enter this function
+ * twice from within a single decoding slot/backend and the temporary file
+ * name contains the pid of the current process.
+ */
+ if (unlink(tmppath) != 0 && errno != ENOENT)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not unlink old snapbuild state file \"%s\": %m", tmppath)));
+
+ ondisk = MemoryContextAllocZero(builder->context, needed_size);
+ ondisk_c = ((char *) ondisk) + sizeof(SnapBuildOnDisk);
+ ondisk->magic = SNAPBUILD_MAGIC;
+ ondisk->size = needed_size;
+
+ /* copy state per struct assignment, lalala lazy. */
+ ondisk->builder = *builder;
+
+ /* NULL-ify memory-only data */
+ ondisk->builder.context = NULL;
+ ondisk->builder.snapshot = NULL;
+ ondisk->builder.reorder = NULL;
+
+ /* copy running xacts */
+ memcpy(ondisk_c, builder->running.xip,
+ sizeof(TransactionId) * builder->running.xcnt_space);
+ ondisk_c += sizeof(TransactionId) * builder->running.xcnt_space;
+
+ /* copy committed xacts */
+ memcpy(ondisk_c, builder->committed.xip,
+ sizeof(TransactionId) * builder->committed.xcnt);
+ ondisk_c += sizeof(TransactionId) * builder->committed.xcnt;
+
+ /* we have valid data now, open tempfile and write it there */
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not open snapbuild state file \"%s\" for writing: %m", tmppath)));
+
+ if ((write(fd, ondisk, needed_size)) != needed_size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ /*
+ * fsync the file before renaming so that even if we crash after this we
+ * have either a fully valid file or nothing.
+ *
+ * TODO: Do the fsync() via checkpoints/restartpoints, doing it here has
+ * some noticeable overhead since it's performed synchronously during
+ * decoding?
+ */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * We may overwrite the work from some other backend, but that's ok, our
+ * snapshot is valid as well.
+ */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename snapbuild state file from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ /* make sure we persist */
+ fsync_fname(path, false);
+ fsync_fname("pg_llog/snapshots", true);
+
+ /*
+ * now there's no way we can lose the dumped state anymore, remember
+ * serialization point.
+ */
+ builder->last_serialized_snapshot = lsn;
+
+out:
+ ReorderBufferSetRestartPoint(builder->reorder,
+ builder->last_serialized_snapshot);
+}
+
+/*
+ * Restore a snapshot into 'builder' if previously one has been stored at the
+ * location indicated by 'lsn'. Returns true if successful, false otherwise.
+ */
+static bool
+SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
+{
+ SnapBuildOnDisk ondisk;
+ int fd;
+ char path[MAXPGPATH];
+ Size sz;
+
+ /* no point in loading a snapshot if we're already there */
+ if (builder->state == SNAPBUILD_CONSISTENT)
+ return false;
+
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ elog(LOG, "restoring snapbuild state from %s", path);
+
+ if (fd < 0 && errno == ENOENT)
+ return false;
+ else if (fd < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open snapbuild state file \"%s\": %m", path)));
+
+ elog(LOG, "really restoring from %s", path);
+
+ /* read statically sized portion of snapshot */
+ if (read(fd, &ondisk, sizeof(ondisk)) != sizeof(ondisk))
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ if (ondisk.magic != SNAPBUILD_MAGIC)
+ ereport(ERROR, (errmsg("snapbuild state file has wrong magic %u instead of %u",
+ ondisk.magic, SNAPBUILD_MAGIC)));
+
+ /* restore running xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
+ ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.running.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read running xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ /* restore committed xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
+ ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.committed.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read committed xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * ok, we now have a sensible snapshot here, figure out if it has more
+ * information than we have.
+ */
+
+ /*
+ * We are only interested in consistent snapshots for now, comparing
+ * whether one incomplete snapshot is more "advanced" seems to be
+ * unnecessarily complex.
+ */
+ if (ondisk.builder.state < SNAPBUILD_CONSISTENT)
+ goto snapshot_not_interesting;
+
+ /*
+ * Don't use a snapshot that requires an xmin that we cannot guarantee to
+ * be available.
+ */
+ if (TransactionIdPrecedes(ondisk.builder.xmin, builder->initial_xmin_horizon))
+ goto snapshot_not_interesting;
+
+ /*
+ * XXX: transactions_after needs to be updated differently, to be checked
+ * here
+ */
+
+ /* ok, we think the snapshot is sensible, copy over everything important */
+ builder->xmin = ondisk.builder.xmin;
+ builder->xmax = ondisk.builder.xmax;
+ builder->state = ondisk.builder.state;
+
+ builder->committed.xcnt = ondisk.builder.committed.xcnt;
+ /* We only allocated/stored xcnt, not xcnt_space xids ! */
+ /* don't overwrite preallocated xip, if we don't have anything here */
+ if (builder->committed.xcnt > 0)
+ {
+ pfree(builder->committed.xip);
+ builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
+ builder->committed.xip = ondisk.builder.committed.xip;
+ }
+ ondisk.builder.committed.xip = NULL;
+
+ builder->running.xcnt = ondisk.builder.running.xcnt;
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+ builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
+ builder->running.xip = ondisk.builder.running.xip;
+
+ /* our snapshot is not interesting anymore, build a new one */
+ if (builder->snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(builder->snapshot);
+ }
+ builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ ReorderBufferSetRestartPoint(builder->reorder, lsn);
+
+ Assert(builder->state == SNAPBUILD_CONSISTENT);
+ elog(LOG, "recovered initial snapshot (xmin %u) from disk", builder->xmin);
+
+ return true;
+
+snapshot_not_interesting:
+ if (ondisk.builder.running.xip != NULL)
+ pfree(ondisk.builder.running.xip);
+ if (ondisk.builder.committed.xip != NULL)
+ pfree(ondisk.builder.committed.xip);
+ return false;
+}
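
A note on the serialization pattern used by SnapBuildSerialize() above: it
writes to a pid-suffixed temporary file, fsyncs it, and only then renames it
into place, so a reader sees either a complete snapshot file or none at all.
The standalone sketch below shows the same pattern with plain POSIX calls
instead of the backend's OpenTransientFile()/pg_fsync()/fsync_fname()
wrappers. The helper names snap_path() and write_file_atomically() are made
up for illustration and are not part of the patch, and the directory fsync
that the patch performs after the rename is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format "<dir>/<hi>-<lo>.snap" from a 64-bit LSN. */
static void
snap_path(char *dst, size_t dstlen, const char *dir, uint64_t lsn)
{
    snprintf(dst, dstlen, "%s/%X-%X.snap", dir,
             (unsigned int) (lsn >> 32), (unsigned int) lsn);
}

/*
 * Hypothetical helper: write 'len' bytes to 'path' so that the final name
 * only ever refers to a completely written file.
 * Returns 0 on success, -1 on failure.
 */
static int
write_file_atomically(const char *path, const void *data, size_t len)
{
    char        tmppath[1024];
    int         fd;
    ssize_t     written;

    assert(path != NULL && data != NULL);

    /* include our pid so concurrent writers never share a temp file */
    snprintf(tmppath, sizeof(tmppath), "%s.%ld.tmp", path, (long) getpid());

    /* remove a stale temp file left behind by an earlier failed attempt */
    if (unlink(tmppath) != 0 && errno != ENOENT)
        return -1;

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* treat a short write like an error; flush data before the rename */
    written = write(fd, data, len);
    if (written < 0 || (size_t) written != len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }

    if (close(fd) != 0)
        return -1;

    /* rename() atomically replaces whatever currently exists at 'path' */
    return rename(tmppath, path) == 0 ? 0 : -1;
}
```

Two processes racing to write the same LSN each use their own temporary file,
and whichever rename() lands last wins. That is harmless as long as both
files carry the same contents, which is the argument the patch makes for
skipping locking.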
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 8c83780..0d64156 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -65,7 +65,7 @@ Node *replication_parse_result;
}
/* Non-keyword tokens */
-%token <str> SCONST
+%token <str> SCONST IDENT
%token <uintval> UCONST
%token <recptr> RECPTR
@@ -73,6 +73,9 @@ Node *replication_parse_result;
%token K_BASE_BACKUP
%token K_IDENTIFY_SYSTEM
%token K_START_REPLICATION
+%token K_INIT_LOGICAL_REPLICATION
+%token K_START_LOGICAL_REPLICATION
+%token K_FREE_LOGICAL_REPLICATION
%token K_TIMELINE_HISTORY
%token K_LABEL
%token K_PROGRESS
@@ -82,10 +85,13 @@ Node *replication_parse_result;
%token K_TIMELINE
%type <node> command
-%type <node> base_backup start_replication identify_system timeline_history
+%type <node> base_backup start_replication start_logical_replication init_logical_replication free_logical_replication identify_system timeline_history
%type <list> base_backup_opt_list
%type <defelt> base_backup_opt
%type <uintval> opt_timeline
+%type <list> plugin_options plugin_opt_list
+%type <defelt> plugin_opt_elem
+%type <node> plugin_opt_arg
%%
firstcmd: command opt_semicolon
@@ -102,6 +108,9 @@ command:
identify_system
| base_backup
| start_replication
+ | init_logical_replication
+ | start_logical_replication
+ | free_logical_replication
| timeline_history
;
@@ -186,6 +195,67 @@ opt_timeline:
| /* nothing */ { $$ = 0; }
;
+init_logical_replication:
+ K_INIT_LOGICAL_REPLICATION IDENT IDENT
+ {
+ InitLogicalReplicationCmd *cmd;
+ cmd = makeNode(InitLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->plugin = $3;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+start_logical_replication:
+ K_START_LOGICAL_REPLICATION IDENT RECPTR plugin_options
+ {
+ StartLogicalReplicationCmd *cmd;
+ cmd = makeNode(StartLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->startpoint = $3;
+ cmd->options = $4;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+plugin_options:
+ '(' plugin_opt_list ')' { $$ = $2; }
+ | /* EMPTY */ { $$ = NIL; }
+ ;
+
+plugin_opt_list:
+ plugin_opt_elem
+ {
+ $$ = list_make1($1);
+ }
+ | plugin_opt_list ',' plugin_opt_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+plugin_opt_elem:
+ IDENT plugin_opt_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+plugin_opt_arg:
+ SCONST { $$ = (Node *) makeString($1); }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
+free_logical_replication:
+ K_FREE_LOGICAL_REPLICATION IDENT
+ {
+ FreeLogicalReplicationCmd *cmd;
+ cmd = makeNode(FreeLogicalReplicationCmd);
+ cmd->name = $2;
+ $$ = (Node *) cmd;
+ }
+ ;
+
/*
* TIMELINE_HISTORY %d
*/
@@ -205,6 +275,7 @@ timeline_history:
$$ = (Node *) cmd;
}
;
+
%%
#include "repl_scanner.c"
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 3d930f1..2b0f2ff 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "utils/builtins.h"
+#include "parser/scansup.h"
/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
#undef fprintf
@@ -48,7 +49,7 @@ static void addlitchar(unsigned char ychar);
%option warn
%option prefix="replication_yy"
-%x xq
+%x xq xd
/* Extended quote
* xqdouble implements embedded quote, ''''
@@ -57,12 +58,26 @@ xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+
+/* Double quote
+ * Allows embedded spaces and other special characters in identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
digit [0-9]+
hexdigit [0-9A-Za-z]+
quote '
quotestop {quote}
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
%%
BASE_BACKUP { return K_BASE_BACKUP; }
@@ -74,9 +89,14 @@ PROGRESS { return K_PROGRESS; }
WAL { return K_WAL; }
TIMELINE { return K_TIMELINE; }
START_REPLICATION { return K_START_REPLICATION; }
+INIT_LOGICAL_REPLICATION { return K_INIT_LOGICAL_REPLICATION; }
+START_LOGICAL_REPLICATION { return K_START_LOGICAL_REPLICATION; }
+FREE_LOGICAL_REPLICATION { return K_FREE_LOGICAL_REPLICATION; }
TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
"," { return ','; }
";" { return ';'; }
+"(" { return '('; }
+")" { return ')'; }
[\n] ;
[\t] ;
@@ -100,20 +120,49 @@ TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
BEGIN(xq);
startlit();
}
+
<xq>{quotestop} {
yyless(1);
BEGIN(INITIAL);
yylval.str = litbufdup();
return SCONST;
}
-<xq>{xqdouble} {
+
+<xq>{xqdouble} {
addlitchar('\'');
}
+
<xq>{xqinside} {
addlit(yytext, yyleng);
}
-<xq><<EOF>> { yyerror("unterminated quoted string"); }
+{xdstart} {
+ BEGIN(xd);
+ startlit();
+ }
+
+<xd>{xdstop} {
+ int len;
+ yyless(1);
+ BEGIN(INITIAL);
+ yylval.str = litbufdup();
+ len = strlen(yylval.str);
+ truncate_identifier(yylval.str, len, true);
+ return IDENT;
+ }
+
+<xd>{xddouble} {
+ addlitchar('"');
+ }
+
+<xd>{xdinside} {
+ addlit(yytext, yyleng);
+ }
+
+{identifier} {
+ int len = strlen(yytext);
+
+ yylval.str = downcase_truncate_identifier(yytext, len, true);
+ return IDENT;
+ }
+
+<xq,xd><<EOF>> { yyerror("unterminated quoted string"); }
<<EOF>> {
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 413f0b9..e73f566 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
* everything else has been checked.
*/
if (hot_standby_feedback)
- xmin = GetOldestXmin(true, false);
+ xmin = GetOldestXmin(true, true, false, false);
else
xmin = InvalidTransactionId;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b00a91a..2187d96 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -45,9 +45,8 @@
#include "access/timeline.h"
#include "access/transam.h"
-#include "access/xlog_internal.h"
#include "access/xact.h"
-
+#include "access/xlog_internal.h"
#include "catalog/pg_type.h"
#include "commands/dbcommands.h"
#include "funcapi.h"
@@ -56,6 +55,10 @@
#include "miscadmin.h"
#include "nodes/replnodes.h"
#include "replication/basebackup.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "replication/snapbuild.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -157,6 +160,9 @@ static bool ping_sent = false;
static bool streamingDoneSending;
static bool streamingDoneReceiving;
+/* Are we there yet? */
+static bool WalSndCaughtUp = false;
+
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t walsender_ready_to_stop = false;
@@ -169,24 +175,42 @@ static volatile sig_atomic_t walsender_ready_to_stop = false;
*/
static volatile sig_atomic_t replication_active = false;
+/* XXX reader */
+static MemoryContext decoding_ctx = NULL;
+static MemoryContext old_decoding_ctx = NULL;
+
+static LogicalDecodingContext *logical_decoding_ctx = NULL;
+static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
+
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
-static void WalSndLoop(void);
+typedef void (*WalSndSendData)(void);
+static void WalSndLoop(WalSndSendData send_data);
static void InitWalSenderSlot(void);
static void WalSndKill(int code, Datum arg);
-static void XLogSend(bool *caughtup);
+static void XLogSendPhysical(void);
+static void XLogSendLogical(void);
+static void WalSndDone(WalSndSendData send_data);
static XLogRecPtr GetStandbyFlushRecPtr(void);
static void IdentifySystem(void);
static void StartReplication(StartReplicationCmd *cmd);
+static void InitLogicalReplication(InitLogicalReplicationCmd *cmd);
+static void StartLogicalReplication(StartLogicalReplicationCmd *cmd);
+static void FreeLogicalReplication(FreeLogicalReplicationCmd *cmd);
static void ProcessStandbyMessage(void);
static void ProcessStandbyReplyMessage(void);
static void ProcessStandbyHSFeedbackMessage(void);
static void ProcessRepliesIfAny(void);
static void WalSndKeepalive(bool requestReply);
+static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+
+
/* Initialize walsender process before entering the main command loop */
@@ -247,14 +271,13 @@ IdentifySystem(void)
char tli[11];
char xpos[MAXFNAMELEN];
XLogRecPtr logptr;
- char* dbname = NULL;
+ char *dbname = NULL;
/*
* Reply with a result set with one row, four columns. First col is system
* ID, second is timeline ID, third is current xlog location and the fourth
* contains the database name if we are connected to one.
*/
-
snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
GetSystemIdentifier());
@@ -308,22 +331,22 @@ IdentifySystem(void)
pq_sendint(&buf, 0, 2); /* format code */
/* third field */
- pq_sendstring(&buf, "xlogpos");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "xlogpos"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
/* fourth field */
- pq_sendstring(&buf, "dbname");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "dbname"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
pq_endmessage(&buf);
/* Send a DataRow message */
@@ -335,9 +358,16 @@ IdentifySystem(void)
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
- pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
- pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
-
+ /* send NULL if not connected to a database */
+ if (dbname)
+ {
+ pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
+ pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
+ }
+ else
+ {
+ pq_sendint(&buf, -1, 4); /* col4 len */
+ }
pq_endmessage(&buf);
}
@@ -586,7 +616,7 @@ StartReplication(StartReplicationCmd *cmd)
/* Main loop of walsender */
replication_active = true;
- WalSndLoop();
+ WalSndLoop(XLogSendPhysical);
replication_active = false;
if (walsender_ready_to_stop)
@@ -653,6 +683,497 @@ StartReplication(StartReplicationCmd *cmd)
pq_puttextmessage('C', "START_STREAMING");
}
+static int
+replay_read_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr;
+ int count;
+
+ flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+ /* at least one full block available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
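
For readers following the three-way branch in replay_read_page() above, here
is the same decision in isolation: given the start of the requested page, the
minimum number of bytes the caller needs, and the flush position, it returns
how many bytes of the page may be used. This is only a sketch. The name
readable_page_bytes() and the WAL_PAGE_SIZE constant (standing in for
XLOG_BLCKSZ, assumed to be 8192 here) are illustrative and not part of the
patch, and WAL positions are modelled as plain uint64_t values.

```c
#include <assert.h>
#include <stdint.h>

#define WAL_PAGE_SIZE 8192      /* assumption: stands in for XLOG_BLCKSZ */

/*
 * Return the number of valid bytes in the page starting at 'page_start',
 * or -1 if fewer than 'req_len' bytes of it have been flushed so far.
 */
static int
readable_page_bytes(uint64_t page_start, int req_len, uint64_t flushed_upto)
{
    assert(req_len >= 0 && req_len <= WAL_PAGE_SIZE);

    /* the whole page lies at or below the flush position */
    if (page_start + WAL_PAGE_SIZE <= flushed_upto)
        return WAL_PAGE_SIZE;

    /* not even the requested prefix has been flushed yet */
    if (page_start + (uint64_t) req_len > flushed_upto)
        return -1;

    /* only the part of the page up to the flush position is valid */
    return (int) (flushed_upto - page_start);
}
```

With a flush position of 9000, a request for 100 bytes of the page starting
at 8192 yields 808 usable bytes, while the same request against a flush
position of 8200 yields -1.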
+
+/*
+ * Initialize logical replication and wait for an initial consistent point to
+ * start sending changes from.
+ */
+static void
+InitLogicalReplication(InitLogicalReplicationCmd *cmd)
+{
+ const char *slot_name;
+ StringInfoData buf;
+ char xpos[MAXFNAMELEN];
+ const char *snapshot_name = NULL;
+ LogicalDecodingContext *ctx;
+ XLogRecPtr startptr;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ /* XXX apply sanity checking to slot name? */
+ LogicalDecodingAcquireFreeSlot(cmd->name, cmd->plugin);
+
+ Assert(MyLogicalDecodingSlot);
+
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ initStringInfo(&output_message);
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+ NIL, replay_read_page,
+ WalSndPrepareWrite, WalSndWriteData);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(WARNING, "Initiating logical rep from %X/%X",
+ (uint32)(startptr >> 32), (uint32)startptr);
+
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ /* xlog record was invalid */
+ if (err)
+ elog(ERROR, "%s", err);
+
+ /* read up from last position next time round */
+ startptr = InvalidXLogRecPtr;
+
+ Assert(record);
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.endptr = ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue till we found a consistent spot */
+ if (LogicalDecodingContextReady(ctx))
+ {
+ /* export plain, importable, snapshot to the user */
+ snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder);
+ break;
+ }
+ }
+
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ slot_name = NameStr(MyLogicalDecodingSlot->name);
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+
+ pq_beginmessage(&buf, 'T');
+ pq_sendint(&buf, 4, 2); /* 4 fields */
+
+ /* first field */
+ pq_sendstring(&buf, "replication_id"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "consistent_point"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "snapshot_name"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "plugin"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_endmessage(&buf);
+
+ /* Send a DataRow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint(&buf, 4, 2); /* # of columns */
+
+ /* replication_id */
+ pq_sendint(&buf, strlen(slot_name), 4); /* col1 len */
+ pq_sendbytes(&buf, slot_name, strlen(slot_name));
+
+ /* consistent wal location */
+ pq_sendint(&buf, strlen(xpos), 4); /* col2 len */
+ pq_sendbytes(&buf, xpos, strlen(xpos));
+
+ /* snapshot name */
+ pq_sendint(&buf, strlen(snapshot_name), 4); /* col3 len */
+ pq_sendbytes(&buf, snapshot_name, strlen(snapshot_name));
+
+ /* plugin */
+ pq_sendint(&buf, strlen(cmd->plugin), 4); /* col4 len */
+ pq_sendbytes(&buf, cmd->plugin, strlen(cmd->plugin));
+
+ pq_endmessage(&buf);
+
+ /*
+ * release active status again, START_LOGICAL_REPLICATION will reacquire it
+ */
+ LogicalDecodingReleaseSlot();
+}
+
+/*
+ * Load previously initiated logical slot and prepare for sending data (via
+ * WalSndLoop).
+ */
+static void
+StartLogicalReplication(StartLogicalReplicationCmd *cmd)
+{
+ StringInfoData buf;
+ XLogRecPtr confirmed_flush;
+
+ elog(WARNING, "Starting logical replication from %X/%X",
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+ /* make sure that our requirements are still fulfilled */
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ LogicalDecodingReAcquireSlot(cmd->name);
+
+ if (am_cascading_walsender && !RecoveryInProgress())
+ {
+ ereport(LOG,
+ (errmsg("terminating walsender process to force cascaded standby to update timeline and reconnect")));
+ walsender_ready_to_stop = true;
+ }
+
+ WalSndSetState(WALSNDSTATE_CATCHUP);
+
+ /* Send a CopyBothResponse message, and start streaming */
+ pq_beginmessage(&buf, 'W');
+ pq_sendbyte(&buf, 0);
+ pq_sendint(&buf, 0, 2);
+ pq_endmessage(&buf);
+ pq_flush();
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ confirmed_flush = MyLogicalDecodingSlot->confirmed_flush;
+
+ Assert(confirmed_flush != InvalidXLogRecPtr);
+
+ /* continue from last position */
+ if (cmd->startpoint == InvalidXLogRecPtr)
+ cmd->startpoint = MyLogicalDecodingSlot->confirmed_flush;
+ else if (cmd->startpoint < MyLogicalDecodingSlot->confirmed_flush)
+ elog(ERROR, "cannot stream from %X/%X, minimum is %X/%X",
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint,
+ (uint32)(confirmed_flush >> 32), (uint32)confirmed_flush);
+
+ /*
+ * Initialize position to the last ack'ed one, then the xlog records begin
+ * to be shipped from that position.
+ */
+ logical_decoding_ctx = CreateLogicalDecodingContext(
+ MyLogicalDecodingSlot, false, cmd->startpoint, cmd->options,
+ replay_read_page, WalSndPrepareWrite, WalSndWriteData);
+
+ /*
+ * XXX: For feedback purposes it would be nicer to set sentPtr to
+ * cmd->startpoint, but we use it to know where to read xlog in the main
+ * loop...
+ */
+ sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ logical_startptr = sentPtr;
+
+ /* Also update the start position status in shared memory */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ SpinLockRelease(&walsnd->mutex);
+ }
+
+ elog(LOG, "starting to decode from %X/%X, replay %X/%X",
+ (uint32)(MyWalSnd->sentPtr >> 32), (uint32)MyWalSnd->sentPtr,
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+ replication_active = true;
+
+ SyncRepInitConfig();
+
+ /* Main loop of walsender */
+ WalSndLoop(XLogSendLogical);
+
+ LogicalDecodingReleaseSlot();
+
+ replication_active = false;
+ if (walsender_ready_to_stop)
+ proc_exit(0);
+ WalSndSetState(WALSNDSTATE_STARTUP);
+
+ /* Get out of COPY mode (CommandComplete). */
+ EndCommand("COPY 0", DestRemote);
+}
+
+/*
+ * Free the permanent state of a now inactive, but still defined, logical slot.
+ */
+static void
+FreeLogicalReplication(FreeLogicalReplicationCmd *cmd)
+{
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(cmd->name);
+ EndCommand("FREE_LOGICAL_REPLICATION", DestRemote);
+}
+
+/*
+ * LogicalDecodingContext 'prepare_write' callback.
+ *
+ * Prepare a write into a StringInfo.
+ *
+ * Don't do anything lasting in here, it's quite possible that nothing will be
+ * done with the data.
+ */
+static void
+WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndPrepareWrite, LogicalOutputPluginWriterPrepareWrite);
+
+ resetStringInfo(ctx->out);
+
+ pq_sendbyte(ctx->out, 'w');
+ pq_sendint64(ctx->out, lsn); /* dataStart */
+ /* XXX: overwrite when data is assembled */
+ pq_sendint64(ctx->out, lsn); /* walEnd */
+ /* XXX: gather that value later just as it's done in XLogSendPhysical */
+ pq_sendint64(ctx->out, 0 /*GetCurrentIntegerTimestamp() */);/* sendtime */
+}
+
+/*
+ * LogicalDecodingContext 'write' callback.
+ *
+ * Actually write the data previously prepared by WalSndPrepareWrite out to
+ * the network. Take as long as needed, but keep processing replies from the
+ * other side while doing so.
+ */
+static void
+WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndWriteData, LogicalOutputPluginWriterWrite);
+
+ /* output previously gathered data in a CopyData packet */
+ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
+
+ /* fast path */
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ return;
+
+ if (!pq_is_send_pending())
+ return;
+
+ for (;;)
+ {
+ int wakeEvents;
+ long sleeptime = 10000; /* 10s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+
+ /* If we finished clearing the buffered data, we're done here. */
+ if (!pq_is_send_pending())
+ break;
+
+ /*
+ * Note we don't set a timeout here. It would be pointless, because
+ * if the socket is not writable there's not much we can do elsewhere
+ * anyway.
+ */
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+}
+
+/*
+ * Wait till WAL < loc is flushed to disk so it can be safely read.
+ */
+XLogRecPtr
+WalSndWaitForWal(XLogRecPtr loc)
+{
+ int wakeEvents;
+ XLogRecPtr flushptr;
+
+ /* fast path if everything is there already */
+ /*
+ * XXX: introduce RecentFlushPtr to avoid acquiring the spinlock in the
+ * fast path case where we already know we have enough WAL available.
+ */
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ return flushptr;
+
+ for (;;)
+ {
+ long sleeptime = 10000; /* 10 s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Update our idea of flushed position. */
+ flushptr = GetFlushRecPtr();
+
+ /* If postmaster asked us to stop, don't wait here anymore */
+ if (walsender_ready_to_stop)
+ break;
+
+ /* check whether we're done */
+ if (loc <= flushptr)
+ break;
+
+ /* Determine time until replication timeout */
+ if (wal_sender_timeout > 0)
+ {
+ if (!ping_sent)
+ {
+ TimestampTz timeout;
+
+ /*
+ * If half of wal_sender_timeout has lapsed without receiving
+ * any reply from standby, send a keep-alive message to standby
+ * requesting an immediate reply.
+ */
+ timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout / 2);
+ if (GetCurrentTimestamp() >= timeout)
+ {
+ WalSndKeepalive(true);
+ ping_sent = true;
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+ }
+ }
+
+ sleeptime = 1 + (wal_sender_timeout / 10);
+ }
+
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+
+ /*
+ * The equivalent code in WalSndLoop checks here that replication
+ * timeout hasn't been exceeded. We don't do that here. XXX explain
+ * why.
+ */
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+ return flushptr;
+}
+
/*
* Execute an incoming replication command.
*/
@@ -664,6 +1185,12 @@ exec_replication_command(const char *cmd_string)
MemoryContext cmd_context;
MemoryContext old_context;
+ /*
+ * INIT_LOGICAL_REPLICATION exports a snapshot until the next command
+ * arrives. Clean up the old stuff if there's anything.
+ */
+ SnapBuildClearExportedSnapshot();
+
elog(DEBUG1, "received replication command: %s", cmd_string);
CHECK_FOR_INTERRUPTS();
@@ -695,6 +1222,18 @@ exec_replication_command(const char *cmd_string)
StartReplication((StartReplicationCmd *) cmd_node);
break;
+ case T_InitLogicalReplicationCmd:
+ InitLogicalReplication((InitLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_StartLogicalReplicationCmd:
+ StartLogicalReplication((StartLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_FreeLogicalReplicationCmd:
+ FreeLogicalReplication((FreeLogicalReplicationCmd *) cmd_node);
+ break;
+
case T_BaseBackupCmd:
SendBaseBackup((BaseBackupCmd *) cmd_node);
break;
@@ -904,6 +1443,12 @@ ProcessStandbyReplyMessage(void)
SpinLockRelease(&walsnd->mutex);
}
+ /*
+ * Advance our local xmin horizon when the client confirmed a flush.
+ */
+ if (MyLogicalDecodingSlot && flushPtr != InvalidXLogRecPtr)
+ LogicalConfirmReceivedLocation(flushPtr);
+
if (!am_cascading_walsender)
SyncRepReleaseWaiters();
}
@@ -988,10 +1533,8 @@ ProcessStandbyHSFeedbackMessage(void)
/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
-WalSndLoop(void)
+WalSndLoop(WalSndSendData send_data)
{
- bool caughtup = false;
-
/*
* Allocate buffers that will be used for each outgoing and incoming
* message. We do this just once to reduce palloc overhead.
@@ -1043,21 +1586,21 @@ WalSndLoop(void)
/*
* If we don't have any pending data in the output buffer, try to send
- * some more. If there is some, we don't bother to call XLogSend
+ * some more. If there is some, we don't bother to call send_data
* again until we've flushed it ... but we'd better assume we are not
* caught up.
*/
if (!pq_is_send_pending())
- XLogSend(&caughtup);
+ send_data();
else
- caughtup = false;
+ WalSndCaughtUp = false;
/* Try to flush pending output to the client */
if (pq_flush_if_writable() != 0)
goto send_failure;
/* If nothing remains to be sent right now ... */
- if (caughtup && !pq_is_send_pending())
+ if (WalSndCaughtUp && !pq_is_send_pending())
{
/*
* If we're in catchup state, move to streaming. This is an
@@ -1083,29 +1626,17 @@ WalSndLoop(void)
* the walsender is not sure which.
*/
if (walsender_ready_to_stop)
- {
- /* ... let's just be real sure we're caught up ... */
- XLogSend(&caughtup);
- if (caughtup && sentPtr == MyWalSnd->flush &&
- !pq_is_send_pending())
- {
- /* Inform the standby that XLOG streaming is done */
- EndCommand("COPY 0", DestRemote);
- pq_flush();
-
- proc_exit(0);
- }
- }
+ WalSndDone(send_data);
}
/*
* We don't block if not caught up, unless there is unsent data
* pending in which case we'd better block until the socket is
- * write-ready. This test is only needed for the case where XLogSend
+ * write-ready. This test is only needed for the case where send_data
* loaded a subset of the available data but then pq_flush_if_writable
* flushed it all --- we should immediately try to send more.
*/
- if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
+ if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
{
TimestampTz timeout = 0;
long sleeptime = 10000; /* 10 s */
@@ -1434,15 +1965,17 @@ retry:
}
/*
+ * Send out the WAL in its normal physical/stored form.
+ *
* Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
* but not yet sent to the client, and buffer it in the libpq output
* buffer.
*
- * If there is no unsent WAL remaining, *caughtup is set to true, otherwise
- * *caughtup is set to false.
+ * If there is no unsent WAL remaining, WalSndCaughtUp is set to true,
+ * otherwise WalSndCaughtUp is set to false.
*/
static void
-XLogSend(bool *caughtup)
+XLogSendPhysical(void)
{
XLogRecPtr SendRqstPtr;
XLogRecPtr startptr;
@@ -1451,7 +1984,7 @@ XLogSend(bool *caughtup)
if (streamingDoneSending)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1568,7 +2101,7 @@ XLogSend(bool *caughtup)
pq_putmessage_noblock('c', NULL, 0);
streamingDoneSending = true;
- *caughtup = true;
+ WalSndCaughtUp = true;
elog(DEBUG1, "walsender reached end of timeline at %X/%X (sent up to %X/%X)",
(uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
@@ -1580,7 +2113,7 @@ XLogSend(bool *caughtup)
Assert(sentPtr <= SendRqstPtr);
if (SendRqstPtr <= sentPtr)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1604,15 +2137,15 @@ XLogSend(bool *caughtup)
{
endptr = SendRqstPtr;
if (sendTimeLineIsHistoric)
- *caughtup = false;
+ WalSndCaughtUp = false;
else
- *caughtup = true;
+ WalSndCaughtUp = true;
}
else
{
/* round down to page boundary. */
endptr -= (endptr % XLOG_BLCKSZ);
- *caughtup = false;
+ WalSndCaughtUp = false;
}
nbytes = endptr - startptr;
@@ -1673,6 +2206,96 @@ XLogSend(bool *caughtup)
}
/*
+ * Send out the WAL after it has been decoded into a logical format by the
+ * output plugin specified in INIT_LOGICAL_REPLICATION.
+ */
+static void
+XLogSendLogical(void)
+{
+ XLogRecord *record;
+ char *errm;
+
+ if (decoding_ctx == NULL)
+ {
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ }
+
+ record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
+ logical_startptr = InvalidXLogRecPtr;
+
+ /* xlog record was invalid */
+ if (errm != NULL)
+ elog(ERROR, "%s", errm);
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = logical_decoding_ctx->reader->ReadRecPtr;
+ buf.endptr = logical_decoding_ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+
+ DecodeRecordIntoReorderBuffer(logical_decoding_ctx, &buf);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+
+ /*
+ * If the record we just read is at or beyond the flushed point, then
+ * we're caught up.
+ */
+ WalSndCaughtUp =
+ logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr();
+ }
+ else
+ /*
+ * xlogreader failed, and no error was reported? we must be caught up.
+ */
+ WalSndCaughtUp = true;
+
+ /* Update shared memory status */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = logical_decoding_ctx->reader->ReadRecPtr;
+ SpinLockRelease(&walsnd->mutex);
+ }
+}
+
+/*
+ * The sender is caught up, so we can go away for shutdown processing
+ * to finish normally. (This should only be called when the shutdown
+ * signal has been received from postmaster.)
+ *
+ * Note that if while doing this we determine that there's still more
+ * data to send, this function will return control to the caller.
+ */
+static void
+WalSndDone(WalSndSendData send_data)
+{
+ /* ... let's just be real sure we're caught up ... */
+ send_data();
+
+ if (WalSndCaughtUp && sentPtr == MyWalSnd->flush &&
+ !pq_is_send_pending())
+ {
+ /* Inform the standby that XLOG streaming is done */
+ EndCommand("COPY 0", DestRemote);
+ pq_flush();
+
+ proc_exit(0);
+ }
+}
+
+/*
* Returns the latest point in WAL that has been safely flushed to disk, and
* can be sent to the standby. This should only be called when in recovery,
* ie. we're streaming to a cascaded standby.
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index a0b741b..71d8f04 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
#include "postmaster/bgworker_internals.h"
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
+#include "replication/logical.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
@@ -124,6 +125,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, ProcSignalShmemSize());
size = add_size(size, CheckpointerShmemSize());
size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, LogicalDecodingShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
size = add_size(size, BTreeShmemSize());
@@ -230,6 +232,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
ProcSignalShmemInit();
CheckpointerShmemInit();
AutoVacuumShmemInit();
+ LogicalDecodingShmemInit();
WalSndShmemInit();
WalRcvShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..11aa1f5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -51,6 +51,9 @@
#include "access/xact.h"
#include "access/twophase.h"
#include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/walsender.h"
+#include "replication/walsender_private.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/spin.h"
@@ -1141,16 +1144,18 @@ TransactionIdIsActive(TransactionId xid)
* GetOldestXmin() move backwards, with no consequences for data integrity.
*/
TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked)
{
ProcArrayStruct *arrayP = procArray;
TransactionId result;
int index;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
/* Cannot look for individual databases during recovery */
Assert(allDbs || !RecoveryInProgress());
- LWLockAcquire(ProcArrayLock, LW_SHARED);
+ if (!alreadyLocked)
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
* We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1197,6 +1202,10 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
}
}
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (RecoveryInProgress())
{
/*
@@ -1205,7 +1214,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
*/
TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
if (TransactionIdIsNormal(kaxmin) &&
TransactionIdPrecedes(kaxmin, result))
@@ -1213,10 +1223,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
}
else
{
- /*
- * No other information needed, so release the lock immediately.
- */
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
/*
* Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1237,6 +1245,15 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
result = FirstNormalTransactionId;
}
+ /*
+ * after locks are released and defer_cleanup_age has been applied, check
+ * whether we need to back up further to make logical decoding possible.
+ */
+ if (systable &&
+ TransactionIdIsValid(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, result))
+ result = logical_xmin;
+
return result;
}
@@ -1290,7 +1307,9 @@ GetMaxSnapshotSubxidCount(void)
* older than this are known not running any more.
* RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
* running transactions, except those running LAZY VACUUM). This is
- * the same computation done by GetOldestXmin(true, true).
+ * the same computation done by GetOldestXmin(true, true, ...).
+ * RecentGlobalDataXmin: the global xmin for non-catalog tables
+ * >= RecentGlobalXmin
*
* Note: this function should probably not be called with an argument that's
* not statically allocated (see xip allocation below).
@@ -1306,6 +1325,7 @@ GetSnapshotData(Snapshot snapshot)
int count = 0;
int subcount = 0;
bool suboverflowed = false;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
Assert(snapshot != NULL);
@@ -1483,8 +1503,14 @@ GetSnapshotData(Snapshot snapshot)
suboverflowed = true;
}
+
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
+
LWLockRelease(ProcArrayLock);
/*
@@ -1499,6 +1525,17 @@ GetSnapshotData(Snapshot snapshot)
RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
if (!TransactionIdIsNormal(RecentGlobalXmin))
RecentGlobalXmin = FirstNormalTransactionId;
+
+ /* Non-catalog tables can be vacuumed if older than this xid */
+ RecentGlobalDataXmin = RecentGlobalXmin;
+
+ /*
+ * peg the global xmin to the one required for logical decoding if that is older
+ */
+ if (TransactionIdIsNormal(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, RecentGlobalXmin))
+ RecentGlobalXmin = logical_xmin;
+
RecentXmin = xmin;
snapshot->xmin = xmin;
@@ -1599,9 +1636,11 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
* Similar to GetSnapshotData but returns more information. We include
* all PGXACTs with an assigned TransactionId, even VACUUM processes.
*
- * We acquire XidGenLock, but the caller is responsible for releasing it.
- * This ensures that no new XIDs enter the proc array until the caller has
- * WAL-logged this snapshot, and releases the lock.
+ * We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
+ * releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
+ * array until the caller has WAL-logged this snapshot, and releases the
+ * lock. Acquiring ProcArrayLock ensures that no transactions commit until the
+ * lock is released.
*
* The returned data structure is statically allocated; caller should not
* modify it, and must not assume it is valid past the next call.
@@ -1736,6 +1775,12 @@ GetRunningTransactionData(void)
}
}
+ /*
+ * It's important *not* to track decoding tasks here because snapbuild.c
+ * uses ->oldestRunningXid to manage its xmin. If it were to be included
+ * here the initial value could never increase.
+ */
+
CurrentRunningXacts->xcnt = count - subcount;
CurrentRunningXacts->subxcnt = subcount;
CurrentRunningXacts->subxid_overflow = suboverflowed;
@@ -1743,13 +1788,12 @@ GetRunningTransactionData(void)
CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
- /* We don't release XidGenLock here, the caller is responsible for that */
- LWLockRelease(ProcArrayLock);
-
Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
+ /* We don't release the locks here, the caller is responsible for that */
+
return CurrentRunningXacts;
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 97da1a0..5f74c3e 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -879,8 +879,23 @@ LogStandbySnapshot(void)
* record we write, because standby will open up when it sees this.
*/
running = GetRunningTransactionData();
+
+ /*
+ * GetRunningTransactionData() acquired ProcArrayLock, we must release
+ * it. We can do that before inserting the WAL record because
+ * ProcArrayApplyRecoveryInfo can recheck the commit status using the
+ * clog. If we're doing logical replication we can't do that though, so
+ * hold the lock for a moment longer.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
+
recptr = LogCurrentRunningXacts(running);
+ /* Release lock if we kept it longer ... */
+ if (wal_level >= WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
+
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index bfe7d78..015970a 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -512,7 +512,7 @@ RegisterSnapshotInvalidation(Oid dbId, Oid relId)
* Only the local caches are flushed; this does not transmit the message
* to other backends.
*/
-static void
+void
LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
if (msg->id >= 0)
@@ -596,7 +596,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
* since that tells us we've lost some shared-inval messages and hence
* don't know what needs to be invalidated.
*/
-static void
+void
InvalidateSystemCaches(void)
{
int i;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 44dd0d2..5d304ce 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1601,6 +1601,10 @@ RelationIdGetRelation(Oid relationId)
return rd;
}
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelationId(relationId))
+ SuspendDecodingSnapshots();
+
/*
* no reldesc in the cache, so have RelationBuildDesc() build one and add
* it.
@@ -1608,6 +1612,10 @@ RelationIdGetRelation(Oid relationId)
rd = RelationBuildDesc(relationId, true);
if (RelationIsValid(rd))
RelationIncrementReferenceCount(rd);
+
+ if (IsSystemRelationId(relationId))
+ UnSuspendDecodingSnapshots();
+
return rd;
}
@@ -1729,6 +1737,10 @@ RelationReloadIndexInfo(Relation relation)
return;
}
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/*
* Read the pg_class row
*
@@ -1796,6 +1808,9 @@ RelationReloadIndexInfo(Relation relation)
/* Okay, now it's valid again */
relation->rd_isvalid = true;
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
/*
@@ -1977,6 +1992,10 @@ RelationClearRelation(Relation relation, bool rebuild)
bool keep_tupdesc;
bool keep_rules;
+ /* keep system relations up to date, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/* Build temporary entry, but don't link it into hashtable */
newrel = RelationBuildDesc(save_relid, false);
if (newrel == NULL)
@@ -2046,6 +2065,9 @@ RelationClearRelation(Relation relation, bool rebuild)
/* And now we can throw away the temporary entry */
RelationDestroyRelation(newrel);
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
}
@@ -3551,7 +3573,10 @@ RelationGetIndexList(Relation relation)
Form_pg_attribute attr;
/* internal column, like oid */
if (attno <= 0)
- continue;
+ {
+ found = false;
+ break;
+ }
attr = relation->rd_att->attrs[attno - 1];
if (!attr->attnotnull)
@@ -3839,17 +3864,26 @@ RelationGetIndexPredicate(Relation relation)
* be bms_free'd when not needed anymore.
*/
Bitmapset *
-RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
+RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
{
Bitmapset *indexattrs;
- Bitmapset *uindexattrs;
+ Bitmapset *uindexattrs; /* unique keys */
+ Bitmapset *cindexattrs; /* best candidate key */
List *indexoidlist;
ListCell *l;
MemoryContext oldcxt;
/* Quick exit if we already computed the result. */
if (relation->rd_indexattr != NULL)
- return bms_copy(keyAttrs ? relation->rd_keyattr : relation->rd_indexattr);
+ switch(attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return bms_copy(relation->rd_ckeyattr);
+ case INDEX_ATTR_BITMAP_KEY:
+ return bms_copy(relation->rd_keyattr);
+ case INDEX_ATTR_BITMAP_ALL:
+ return bms_copy(relation->rd_indexattr);
+ }
/* Fast path if definitely no indexes */
if (!RelationGetForm(relation)->relhasindex)
@@ -3876,13 +3910,16 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
*/
indexattrs = NULL;
uindexattrs = NULL;
+ cindexattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
Relation indexDesc;
IndexInfo *indexInfo;
int i;
- bool isKey;
+ bool isCKey; /* candidate or primary key */
+ bool isKey; /* key member */
+
indexDesc = index_open(indexOid, AccessShareLock);
@@ -3894,6 +3931,8 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
indexInfo->ii_Expressions == NIL &&
indexInfo->ii_Predicate == NIL;
+ isCKey = indexOid == relation->rd_primary;
+
/* Collect simple attribute references */
for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
{
@@ -3903,6 +3942,11 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
{
indexattrs = bms_add_member(indexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
+
+ if (isCKey)
+ cindexattrs = bms_add_member(cindexattrs,
+ attrnum - FirstLowInvalidHeapAttributeNumber);
+
if (isKey)
uindexattrs = bms_add_member(uindexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
@@ -3924,10 +3968,21 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
relation->rd_indexattr = bms_copy(indexattrs);
relation->rd_keyattr = bms_copy(uindexattrs);
+ relation->rd_ckeyattr = bms_copy(cindexattrs);
MemoryContextSwitchTo(oldcxt);
/* We return our original working copy for caller to play with */
- return keyAttrs ? uindexattrs : indexattrs;
+ switch(attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return cindexattrs;
+ case INDEX_ATTR_BITMAP_KEY:
+ return uindexattrs;
+ case INDEX_ATTR_BITMAP_ALL:
+ return indexattrs;
+ default:
+ elog(ERROR, "unknown attrKind %u", attrKind);
+ }
}
/*
@@ -4902,3 +4957,49 @@ unlink_initfile(const char *initfilename)
elog(LOG, "could not remove cache file \"%s\": %m", initfilename);
}
}
+
+bool
+RelationIsDoingTimetravelInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+
+ if (!RelationNeedsWAL(relation))
+ return false;
+
+ /*
+ * XXX: Doing this test instead of using IsSystemNamespace has the
+ * advantage of classifying a catalog relation's toast tables as a
+ * timetravel relation as well. This is safe since even an oid wraparound
+ * will preserve this property (cf. GetNewObjectId()).
+ */
+ if (IsSystemRelation(relation))
+ return true;
+
+ /*
+ * Also log relevant data if we want the table to behave as a catalog
+ * table, although it's not a system-provided one.
+ * XXX: we need to make sure both the relation and its toast relation have
+ * the flag set!
+ */
+ if (RelationIsTreatedAsCatalogTable(relation))
+ return true;
+
+ return false;
+}
+
+bool
+RelationIsLogicallyLoggedInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+ if (!RelationNeedsWAL(relation))
+ return false;
+ /*
+ * XXX: In addition to the above comment, we could decide to always log
+ * data even for real system catalogs, although the benefits of that seem
+ * unclear.
+ */
+ if (IsSystemRelation(relation))
+ return false;
+
+ return true;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..4a81018 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -57,6 +57,7 @@
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
#include "postmaster/walwriter.h"
+#include "replication/logical.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -2072,6 +2073,17 @@ static struct config_int ConfigureNamesInt[] =
},
{
+ /* see max_connections */
+ {"max_logical_slots", PGC_POSTMASTER, REPLICATION_SENDING,
+ gettext_noop("Sets the maximum number of simultaneously defined WAL decoding slots."),
+ NULL
+ },
+ &max_logical_slots,
+ 0, 0, MAX_BACKENDS /*?*/,
+ NULL, NULL, NULL
+ },
+
+ {
{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
gettext_noop("Sets the maximum time to wait for WAL replication."),
NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d69a02b..b04291c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -161,7 +161,7 @@
# - Settings -
-#wal_level = minimal # minimal, archive, or hot_standby
+#wal_level = minimal # minimal, archive, hot_standby, or logical
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level;
@@ -208,11 +208,18 @@
# Set these on the master and on any standby that will send replication data.
-#max_wal_senders = 0 # max number of walsender processes
+#max_wal_senders = 0 # max number of walsender processes, including
+ # both physical and logical replication senders.
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
+#max_logical_slots = 0 # max number of logical replication sender
+ # and receiver processes. Logical senders
+ # (but not receivers) also consume a
+ # max_wal_senders slot.
+ # (change requires restart)
+
# - Master Server -
# These settings are ignored on a standby server.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 584d70c..f63bafa 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -69,7 +69,7 @@
*/
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
-static SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
+SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
@@ -86,13 +86,14 @@ static bool CatalogSnapshotStale = true;
* for the convenience of TransactionIdIsInProgress: even in bootstrap
* mode, we don't want it to say that BootstrapTransactionId is in progress.
*
- * RecentGlobalXmin is initialized to InvalidTransactionId, to ensure that no
+ * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
* one tries to use a stale value. Readers should ensure that it has been set
* to something else before using it.
*/
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
+TransactionId RecentGlobalDataXmin = InvalidTransactionId;
/*
* Elements of the active snapshot stack.
@@ -796,7 +797,7 @@ AtEOXact_Snapshot(bool isCommit)
* Returns the token (the file name) that can be used to import this
* snapshot.
*/
-static char *
+char *
ExportSnapshot(Snapshot snapshot)
{
TransactionId topXid;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ed66c49..28ce805 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -62,6 +62,8 @@
#include "access/xact.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/combocid.h"
#include "utils/tqual.h"
@@ -70,9 +72,17 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
+static Snapshot TimetravelSnapshot;
+/* (table, ctid) => (cmin, cmax) mapping during timetravel */
+static HTAB *tuplecid_data = NULL;
+static int timetravel_suspended = 0;
+
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-
+static bool FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer);
+static bool RedirectSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer);
/*
* SetHintBits()
@@ -1490,3 +1500,261 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple)
*/
return true;
}
+
+/*
+ * Check whether the transaction id 'xid' is in the pre-sorted array 'xip'.
+ */
+static bool
+TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
+{
+ return bsearch(&xid, xip, num,
+ sizeof(TransactionId), xidComparator) != NULL;
+}
+
+/*
+ * See the comments for HeapTupleSatisfiesMVCC for the semantics this function
+ * obeys.
+ *
+ * Only usable on tuples from catalog tables!
+ *
+ * We don't need to support HEAP_MOVED_(IN|OFF) for now because we only support
+ * reading catalog pages which couldn't have been created in an older version.
+ *
+ * We don't set any hint bits in here: doing so seems unlikely to be beneficial,
+ * as those should already be set by normal access, and it seems too dangerous,
+ * as the semantics of setting them during timetravel are more complicated than
+ * when dealing "only" with the present.
+ */
+bool
+HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer)
+{
+ HeapTupleHeader tuple = htup->t_data;
+ TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
+ TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
+
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
+ /* inserting transaction aborted */
+ if (tuple->t_infomask & HEAP_XMIN_INVALID)
+ {
+ Assert(!TransactionIdDidCommit(xmin));
+ return false;
+ }
+ /* check if it's one of our txids; toplevel is also in there */
+ else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple);
+ CommandId cmax = InvalidCommandId;
+
+ /*
+ * If another transaction deleted this tuple or if cmin/cmax is stored
+ * in a combocid we need to look up the actual values externally. We
+ * need to do so in the deleted case because the deletion will have
+ * overwritten the cmin value when setting cmax (cf. combocid.c).
+ */
+ if ((!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
+ !TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt)) ||
+ tuple->t_infomask & HEAP_COMBOCID
+ )
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve cmin/cmax of catalog tuple");
+ }
+
+ Assert(cmin != InvalidCommandId);
+
+ if (cmin >= snapshot->curcid)
+ return false; /* inserted after scan started */
+ }
+ /* committed before our xmin horizon. Do a normal visibility check. */
+ else if (TransactionIdPrecedes(xmin, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMIN_COMMITTED &&
+ !TransactionIdDidCommit(xmin)));
+
+ /* check for hint bit first, consult clog afterwards */
+ if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
+ !TransactionIdDidCommit(xmin))
+ return false;
+ }
+ /* beyond our xmax horizon, i.e. invisible */
+ else if (TransactionIdFollowsOrEquals(xmin, snapshot->xmax))
+ {
+ return false;
+ }
+ /* check if it's a committed transaction in [xmin, xmax) */
+ else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+ {
+ }
+ /*
+ * none of the above, i.e. between [xmin, xmax) but hasn't
+ * committed. I.e. invisible.
+ */
+ else
+ {
+ return false;
+ }
+
+ /* at this point we know xmin is visible, go on to check xmax */
+
+ /* why should those be in catalog tables? */
+ Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
+
+ /* xid invalid or aborted */
+ if (tuple->t_infomask & HEAP_XMAX_INVALID)
+ return true;
+ /* locked tuples are always visible */
+ else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+ return true;
+ /* check if it's one of our txids; toplevel is also in there */
+ else if (TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin;
+ CommandId cmax = HeapTupleHeaderGetRawCommandId(tuple);
+
+ /* Look up actual cmin/cmax values */
+ if (tuple->t_infomask & HEAP_COMBOCID)
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve combocid to cmax");
+ }
+
+ Assert(cmax != InvalidCommandId);
+
+ if (cmax >= snapshot->curcid)
+ return true; /* deleted after scan started */
+ else
+ return false; /* deleted before scan started */
+ }
+ /* below xmin horizon, normal transaction state is valid */
+ else if (TransactionIdPrecedes(xmax, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED &&
+ !TransactionIdDidCommit(xmax)));
+
+ /* check hint bit first */
+ if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
+ return false;
+
+ /* check clog */
+ return !TransactionIdDidCommit(xmax);
+ }
+ /* above xmax horizon, we cannot possibly see the deleting transaction */
+ else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
+ return true;
+ /* xmax is between [xmin, xmax), check known committed array */
+ else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+ return false;
+ /* xmax is between [xmin, xmax), but known not to have committed yet */
+ else
+ return true;
+}
+
+/*
+ * Set up a snapshot that replaces normal catalog snapshots and allows catalog
+ * access to behave just like it did at a certain point in the past.
+ *
+ * Needed for after-the-fact WAL decoding.
+ */
+void
+SetupDecodingSnapshots(Snapshot timetravel_snapshot, HTAB *tuplecids)
+{
+ /* prevent recursively setting up decoding snapshots */
+ Assert(CatalogSnapshotData.satisfies != RedirectSatisfiesMVCC);
+
+ CatalogSnapshotData.satisfies = RedirectSatisfiesMVCC;
+ /* make sure normal snapshots aren't used */
+ SnapshotSelfData.satisfies = FailsSatisfies;
+ SnapshotAnyData.satisfies = FailsSatisfies;
+ SnapshotToastData.satisfies = FailsSatisfies;
+ /* don't overwrite SnapshotToastData, we want that to behave normally */
+
+ /* setup the timetravel snapshot */
+ TimetravelSnapshot = timetravel_snapshot;
+
+ /* setup (cmin, cmax) lookup hash */
+ tuplecid_data = tuplecids;
+
+ timetravel_suspended = 0;
+}
+
+
+/*
+ * Make catalog snapshots behave normally again.
+ */
+void
+RevertFromDecodingSnapshots(void)
+{
+ Assert(timetravel_suspended == 0);
+
+ TimetravelSnapshot = NULL;
+ tuplecid_data = NULL;
+
+ /* rally to restore sanity and/or boredom */
+ CatalogSnapshotData.satisfies = HeapTupleSatisfiesMVCC;
+ SnapshotSelfData.satisfies = HeapTupleSatisfiesSelf;
+ SnapshotAnyData.satisfies = HeapTupleSatisfiesAny;
+ SnapshotToastData.satisfies = HeapTupleSatisfiesToast;
+ timetravel_suspended = 0;
+}
+
+/*
+ * Disable catalog snapshot timetravel and perform old-fashioned access but
+ * make re-enabling cheap. This is useful for accessing catalog entries which
+ * must stay up to date, like the pg_class entries of system relations.
+ *
+ * Can be called several times in a nested fashion since several of its
+ * callers suspend timetravel access on several code levels.
+ */
+void
+SuspendDecodingSnapshots(void)
+{
+ timetravel_suspended++;
+}
+
+/*
+ * Re-enable timetravel after SuspendDecodingSnapshots() suspended it.
+ */
+void
+UnSuspendDecodingSnapshots(void)
+{
+ Assert(timetravel_suspended > 0);
+ timetravel_suspended--;
+}
+
+/*
+ * Error out if a normal snapshot is used. That is neither legal nor expected
+ * during timetravel, so this is just extra assurance.
+ */
+static bool
+FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ elog(ERROR, "Normal snapshots cannot be used during timetravel access.");
+ return false;
+}
+
+
+/*
+ * Call the replacement SatisfiesMVCC with the required Snapshot data.
+ */
+static bool
+RedirectSatisfiesMVCC(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ Assert(TimetravelSnapshot != NULL);
+ if (timetravel_suspended > 0)
+ return HeapTupleSatisfiesMVCC(htup, snapshot, buffer);
+ return HeapTupleSatisfiesMVCCDuringDecoding(htup, TimetravelSnapshot,
+ buffer);
+}
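For readers following the xmin half of HeapTupleSatisfiesMVCCDuringDecoding above, here is a minimal standalone sketch of the order in which the three horizon branches are tried. It is not the patch's code: DemoSnapshot, demo_xid_precedes, demo_xid_in_array and demo_xmin_visible are made-up names, the clog and hint-bit lookup is replaced by a plain boolean argument, and the own-transaction (subxip) branch is left out. The one thing it tries to show is that, for a decoding snapshot, xip lists xids in [xmin, xmax) that are known to have committed, so membership means "visible".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;		/* hypothetical stand-in for TransactionId */

/* Hypothetical, stripped-down snapshot; xip holds *committed* xids. */
typedef struct DemoSnapshot
{
	DemoXid		xmin;			/* everything below is decided by clog */
	DemoXid		xmax;			/* everything at or above is invisible */
	const DemoXid *xip;			/* committed xids in [xmin, xmax) */
	size_t		xcnt;
} DemoSnapshot;

/* Wraparound-aware "a precedes b", same arithmetic as the transam.h macros. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Linear search; good enough for a sketch. */
static bool
demo_xid_in_array(DemoXid xid, const DemoXid *xip, size_t xcnt)
{
	for (size_t i = 0; i < xcnt; i++)
	{
		if (xip[i] == xid)
			return true;
	}
	return false;
}

/* Is the inserting transaction visible? clog_says_committed fakes the clog. */
static bool
demo_xmin_visible(DemoXid xmin, const DemoSnapshot *snap,
				  bool clog_says_committed)
{
	assert(snap != NULL);

	/* below the xmin horizon: normal transaction state is valid */
	if (demo_xid_precedes(xmin, snap->xmin))
		return clog_says_committed;

	/* at or beyond the xmax horizon: invisible */
	if (!demo_xid_precedes(xmin, snap->xmax))
		return false;

	/* inside [xmin, xmax): visible only if known to have committed */
	return demo_xid_in_array(xmin, snap->xip, snap->xcnt);
}
```

With a snapshot of xmin = 100, xmax = 110 and xip = {105, 107}, xid 105 is visible, xid 106 is not, and xid 90 is decided purely by the faked clog answer.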
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f66f530..a887035 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -193,7 +193,9 @@ const char *subdirs[] = {
"base/1",
"pg_tblspc",
"pg_stat",
- "pg_stat_tmp"
+ "pg_stat_tmp",
+ "pg_llog",
+ "pg_llog/snapshots"
};
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index fde483a..8c6cf24 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -77,6 +77,8 @@ wal_level_str(WalLevel wal_level)
return "archive";
case WAL_LEVEL_HOT_STANDBY:
return "hot_standby";
+ case WAL_LEVEL_LOGICAL:
+ return "logical";
}
return _("unrecognized wal_level");
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 4381778..42f3e6b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -55,6 +55,18 @@
#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
+#define XLOG_HEAP2_NEW_CID 0x70
+
+/*
+ * xl_heap_* ->flags values
+ */
+/* PD_ALL_VISIBLE was cleared */
+#define XLOG_HEAP_ALL_VISIBLE_CLEARED (1<<0)
+/* PD_ALL_VISIBLE was cleared in the 2nd page */
+#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED (1<<1)
+#define XLOG_HEAP_CONTAINS_OLD_TUPLE (1<<2)
+#define XLOG_HEAP_CONTAINS_OLD_KEY (1<<3)
+#define XLOG_HEAP_CONTAINS_NEW_TUPLE (1<<4)
/*
* All what we need to find changed tuple
@@ -78,10 +90,10 @@ typedef struct xl_heap_delete
xl_heaptid target; /* deleted tuple id */
TransactionId xmax; /* xmax of the deleted tuple */
uint8 infobits_set; /* infomask bits */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
} xl_heap_delete;
-#define SizeOfHeapDelete (offsetof(xl_heap_delete, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapDelete (offsetof(xl_heap_delete, flags) + sizeof(uint8))
/*
* We don't store the whole fixed part (HeapTupleHeaderData) of an inserted
@@ -100,15 +112,23 @@ typedef struct xl_heap_header
#define SizeOfHeapHeader (offsetof(xl_heap_header, t_hoff) + sizeof(uint8))
+typedef struct xl_heap_header_len
+{
+ uint16 t_len;
+ xl_heap_header header;
+} xl_heap_header_len;
+
+#define SizeOfHeapHeaderLen (offsetof(xl_heap_header_len, header) + SizeOfHeapHeader)
+
/* This is what we need to know about insert */
typedef struct xl_heap_insert
{
xl_heaptid target; /* inserted tuple id */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
/* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_insert;
-#define SizeOfHeapInsert (offsetof(xl_heap_insert, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8))
/*
* This is what we need to know about a multi-insert. The record consists of
@@ -120,7 +140,7 @@ typedef struct xl_heap_multi_insert
{
RelFileNode node;
BlockNumber blkno;
- bool all_visible_cleared;
+ uint8 flags;
uint16 ntuples;
OffsetNumber offsets[1];
@@ -147,13 +167,12 @@ typedef struct xl_heap_update
TransactionId old_xmax; /* xmax of the old tuple */
TransactionId new_xmax; /* xmax of the new tuple */
ItemPointerData newtid; /* new inserted tuple id */
- uint8 old_infobits_set; /* infomask bits to set on old tuple */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
- bool new_all_visible_cleared; /* same for the page of newtid */
+ uint8 old_infobits_set; /* infomask bits to set on old tuple */
+ uint8 flags;
/* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_update;
-#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapUpdate (offsetof(xl_heap_update, flags) + sizeof(uint8))
/*
* This is what we need to know about vacuum page cleanup/redirect
@@ -261,6 +280,28 @@ typedef struct xl_heap_visible
#define SizeOfHeapVisible (offsetof(xl_heap_visible, cutoff_xid) + sizeof(TransactionId))
+typedef struct xl_heap_new_cid
+{
+ /*
+ * store toplevel xid so we don't have to merge cids from different
+ * transactions
+ */
+ TransactionId top_xid;
+ CommandId cmin;
+ CommandId cmax;
+ /*
+ * don't really need the combocid but the padding makes it free and it's
+ * useful for debugging.
+ */
+ CommandId combocid;
+ /*
+ * Store the relfilenode/ctid pair to facilitate lookups.
+ */
+ xl_heaptid target;
+} xl_heap_new_cid;
+
+#define SizeOfHeapNewCid (offsetof(xl_heap_new_cid, target) + SizeOfHeapTid)
+
extern void HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
TransactionId *latestRemovedXid);
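The heapam_xlog.h hunks above fold the separate all_visible_cleared booleans into a single uint8 flags field. A minimal sketch of how such a field might be written and read back follows. The XLOG_HEAP_* values are copied from the hunk; DemoUpdateRecord, SizeOfDemoUpdate and the demo_* helpers are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in the hunk above. */
#define XLOG_HEAP_ALL_VISIBLE_CLEARED		(1<<0)
#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED	(1<<1)
#define XLOG_HEAP_CONTAINS_OLD_TUPLE		(1<<2)
#define XLOG_HEAP_CONTAINS_OLD_KEY			(1<<3)
#define XLOG_HEAP_CONTAINS_NEW_TUPLE		(1<<4)

/* Hypothetical cut-down record; only the tail mirrors xl_heap_update. */
typedef struct DemoUpdateRecord
{
	uint32_t	old_xmax;
	uint8_t		old_infobits_set;
	uint8_t		flags;
} DemoUpdateRecord;

/* Same idiom as SizeOfHeapUpdate: stop right after the last real field. */
#define SizeOfDemoUpdate (offsetof(DemoUpdateRecord, flags) + sizeof(uint8_t))

/* Writer side: turn what used to be separate booleans into one bitmask. */
static uint8_t
demo_build_update_flags(bool old_page_cleared, bool new_page_cleared,
						bool has_old_key)
{
	uint8_t		flags = 0;

	if (old_page_cleared)
		flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
	if (new_page_cleared)
		flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
	if (has_old_key)
		flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
	return flags;
}

/* Reader side: what redo or decoding code would test. */
static bool
demo_old_page_cleared(const DemoUpdateRecord *rec)
{
	assert(rec != NULL);
	return (rec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED) != 0;
}

static bool
demo_new_page_cleared(const DemoUpdateRecord *rec)
{
	assert(rec != NULL);
	return (rec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED) != 0;
}
```

The offsetof-based size keeps trailing struct padding out of the WAL record, which is why the SizeOf* macros in the hunk change along with the field.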
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 23a41fd..8452ec5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -63,6 +63,11 @@
(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
(int32) ((id1) - (id2)) < 0)
+/* compare two XIDs already known to be normal; this is a macro for speed */
+#define NormalTransactionIdFollows(id1, id2) \
+ (AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
+ (int32) ((id1) - (id2)) > 0)
+
/* ----------
* Object ID (OID) zero is InvalidOid.
*
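NormalTransactionIdFollows relies on the same modulo-2^32 trick as the existing "precedes" macro: subtract as unsigned, then interpret the difference as signed. A small standalone sketch of that arithmetic, using made-up names (DemoXid, demo_xid_is_normal, demo_normal_xid_follows) and a function instead of a macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;		/* hypothetical stand-in for TransactionId */

/* xids below 3 are reserved (invalid, bootstrap, frozen) in PostgreSQL */
#define DEMO_FIRST_NORMAL_XID ((DemoXid) 3)

static bool
demo_xid_is_normal(DemoXid xid)
{
	return xid >= DEMO_FIRST_NORMAL_XID;
}

/* "id1 is logically later than id2", valid across xid wraparound. */
static bool
demo_normal_xid_follows(DemoXid id1, DemoXid id2)
{
	assert(demo_xid_is_normal(id1) && demo_xid_is_normal(id2));

	/* unsigned subtraction wraps; the sign of the result gives the order */
	return (int32_t) (id1 - id2) > 0;
}
```

For example, xid 10 follows xid 4294967290 because the unsigned difference is 16, even though 10 is numerically smaller.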
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 835f6ac..96502ce 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -215,6 +215,7 @@ extern TransactionId GetCurrentTransactionId(void);
extern TransactionId GetCurrentTransactionIdIfAny(void);
extern TransactionId GetStableLatestTransactionId(void);
extern SubTransactionId GetCurrentSubTransactionId(void);
+extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 002862c..7415a26 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -197,7 +197,8 @@ typedef enum WalLevel
{
WAL_LEVEL_MINIMAL = 0,
WAL_LEVEL_ARCHIVE,
- WAL_LEVEL_HOT_STANDBY
+ WAL_LEVEL_HOT_STANDBY,
+ WAL_LEVEL_LOGICAL
} WalLevel;
extern int wal_level;
@@ -210,9 +211,12 @@ extern int wal_level;
*/
#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
-/* Do we need to WAL-log information required only for Hot Standby? */
+/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
+/* Do we need to WAL-log information required only for logical replication? */
+#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
+
#ifdef WAL_DEBUG
extern bool XLOG_DEBUG;
#endif
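Because WAL_LEVEL_LOGICAL is appended at the end of the enum and all the predicates use >=, each level implies everything the lower levels log. A tiny sketch of that ordering, with hypothetical DEMO_* names and functions standing in for the macros and the wal_level GUC:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of WalLevel after the patch; order is significant. */
typedef enum DemoWalLevel
{
	DEMO_WAL_LEVEL_MINIMAL = 0,
	DEMO_WAL_LEVEL_ARCHIVE,
	DEMO_WAL_LEVEL_HOT_STANDBY,
	DEMO_WAL_LEVEL_LOGICAL
} DemoWalLevel;

/* Counterpart of XLogStandbyInfoActive(), taking the level as an argument. */
static bool
demo_standby_info_active(DemoWalLevel level)
{
	assert(level >= DEMO_WAL_LEVEL_MINIMAL && level <= DEMO_WAL_LEVEL_LOGICAL);
	return level >= DEMO_WAL_LEVEL_HOT_STANDBY;
}

/* Counterpart of XLogLogicalInfoActive(). */
static bool
demo_logical_info_active(DemoWalLevel level)
{
	assert(level >= DEMO_WAL_LEVEL_MINIMAL && level <= DEMO_WAL_LEVEL_LOGICAL);
	return level >= DEMO_WAL_LEVEL_LOGICAL;
}
```

So at the logical level both predicates are true, while at hot_standby only the first one is.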
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3829ce2..fdc8cc2 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -19,6 +19,7 @@
#ifndef XLOGREADER_H
#define XLOGREADER_H
+#include "access/xlog.h"
#include "access/xlog_internal.h"
typedef struct XLogReaderState XLogReaderState;
@@ -108,10 +109,20 @@ struct XLogReaderState
char *errormsg_buf;
};
-/* Get a new XLogReader */
+
extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
void *private_data);
+
+typedef struct XLogRecordBuffer
+{
+ XLogRecPtr origptr;
+ XLogRecPtr endptr;
+ XLogRecord record;
+ char *record_data;
+} XLogRecordBuffer;
+
+
/* Free an XLogReader */
extern void XLogReaderFree(XLogReaderState *state);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 44b6f38..a96ed69 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -23,6 +23,7 @@ extern ForkNumber forkname_to_number(char *forkName);
extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
+extern bool IsSystemRelationId(Oid relid);
extern bool IsSystemRelation(Relation relation);
extern bool IsToastRelation(Relation relation);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f03dd0b..cf9c143 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2621,6 +2621,8 @@ DATA(insert OID = 2022 ( pg_stat_get_activity PGNSP PGUID 12 1 100 0 0 f f f
DESCR("statistics: information about currently active backends");
DATA(insert OID = 3099 ( pg_stat_get_wal_senders PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{23,25,25,25,25,25,23,25}" "{o,o,o,o,o,o,o,o}" "{pid,state,sent_location,write_location,flush_location,replay_location,sync_priority,sync_state}" _null_ pg_stat_get_wal_senders _null_ _null_ _null_ ));
DESCR("statistics: information about currently active replication");
+DATA(insert OID = 3457 ( pg_stat_get_logical_decoding_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{25,25,26,16,28,25}" "{o,o,o,o,o,o}" "{slot_name,plugin,database,active,xmin,restart_decoding_lsn}" _null_ pg_stat_get_logical_decoding_slots _null_ _null_ _null_ ));
+DESCR("statistics: information about logical replication slots currently in use");
DATA(insert OID = 2026 ( pg_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 23 "" _null_ _null_ _null_ _null_ pg_backend_pid _null_ _null_ _null_ ));
DESCR("statistics: current backend PID");
DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 23 "23" _null_ _null_ _null_ _null_ pg_stat_get_backend_pid _null_ _null_ _null_ ));
@@ -4725,6 +4727,10 @@ DESCR("SP-GiST support for quad tree over range");
DATA(insert OID = 3473 ( spg_range_quad_leaf_consistent PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "2281 2281" _null_ _null_ _null_ _null_ spg_range_quad_leaf_consistent _null_ _null_ _null_ ));
DESCR("SP-GiST support for quad tree over range");
+DATA(insert OID = 3779 ( init_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,25}" "{i,i,o,o}" "{slotname,plugin,slotname,xlog_position}" _null_ init_logical_replication _null_ _null_ _null_ ));
+DESCR("set up a logical replication slot");
+DATA(insert OID = 3780 ( stop_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 23 "19" _null_ _null_ _null_ _null_ stop_logical_replication _null_ _null_ _null_ ));
+DESCR("stop logical replication");
/* event triggers */
DATA(insert OID = 3566 ( pg_event_trigger_dropped_objects PGNSP PGUID 12 10 100 0 0 f f f f t t s 0 0 2249 "" "{26,26,23,25,25,25,25}" "{o,o,o,o,o,o,o}" "{classid, objid, objsubid, object_type, schema_name, object_name, object_identity}" _null_ pg_event_trigger_dropped_objects _null_ _null_ _null_ ));
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 08bec25..66b8263 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -156,7 +156,7 @@ extern void vac_update_relstats(Relation relation,
TransactionId frozenxid,
MultiXactId minmulti);
extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age,
- bool sharedRel,
+ bool sharedRel, bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 78368c6..360f98c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -409,6 +409,9 @@ typedef enum NodeTag
T_IdentifySystemCmd,
T_BaseBackupCmd,
T_StartReplicationCmd,
+ T_InitLogicalReplicationCmd,
+ T_StartLogicalReplicationCmd,
+ T_FreeLogicalReplicationCmd,
T_TimeLineHistoryCmd,
/*
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 85b4544..3da8d40 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -52,6 +52,41 @@ typedef struct StartReplicationCmd
/* ----------------------
+ * INIT_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct InitLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ char *plugin;
+} InitLogicalReplicationCmd;
+
+
+/* ----------------------
+ * START_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct StartLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ XLogRecPtr startpoint;
+ List *options;
+} StartLogicalReplicationCmd;
+
+/* ----------------------
+ * FREE_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct FreeLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+} FreeLogicalReplicationCmd;
+
+
+/* ----------------------
* TIMELINE_HISTORY command
* ----------------------
*/
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
new file mode 100644
index 0000000..dd3f2ca
--- /dev/null
+++ b/src/include/replication/decode.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ * decode.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DECODE_H
+#define DECODE_H
+
+#include "access/xlogreader.h"
+#include "replication/reorderbuffer.h"
+#include "replication/logical.h"
+
+void DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf);
+
+#endif
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
new file mode 100644
index 0000000..971180b
--- /dev/null
+++ b/src/include/replication/logical.h
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ * logical.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICAL_H
+#define LOGICAL_H
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "replication/output_plugin.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+
+/*
+ * Shared memory state of a single logical decoding slot
+ */
+typedef struct LogicalDecodingSlot
+{
+ /* lock, on same cacheline as effective_xmin */
+ slock_t mutex;
+
+ /* on-disk xmin, updated first */
+ TransactionId xmin;
+
+ /* in-memory xmin, updated after syncing to disk */
+ TransactionId effective_xmin;
+
+ /* is this slot defined */
+ bool in_use;
+
+ /* is somebody streaming out changes for this slot */
+ bool active;
+
+ /* have we been aborted while ->active */
+ bool aborted;
+
+ /* ----
+ * If we shut down or crash, where do we have to restart decoding from to
+ * a) find a valid & ready snapshot
+ * b) read the complete content of all in-progress xacts
+ * ----
+ */
+ XLogRecPtr restart_decoding;
+
+ /*
+ * Last location up to which the client has confirmed safely receiving
+ * data. No earlier data can be decoded after a restart/crash.
+ */
+ XLogRecPtr confirmed_flush;
+
+ /* ----
+ * When the client has confirmed flushes >= candidate_lsn we can
+ * a) advance the pegged xmin
+ * b) advance restart_decoding so we have to read/keep less WAL
+ * ----
+ */
+ XLogRecPtr candidate_lsn;
+ TransactionId candidate_xmin;
+ XLogRecPtr candidate_restart_decoding;
+
+ /* database the slot is active on */
+ Oid database;
+
+ /* slot identifier */
+ NameData name;
+
+ /* plugin name */
+ NameData plugin;
+} LogicalDecodingSlot;
+
+/*
+ * Shared memory control area for all of logical decoding
+ */
+typedef struct LogicalDecodingCtlData
+{
+ /*
+ * Xmin across all logical slots.
+ *
+ * Protected by ProcArrayLock.
+ */
+ TransactionId xmin;
+
+ LogicalDecodingSlot logical_slots[FLEXIBLE_ARRAY_MEMBER];
+} LogicalDecodingCtlData;
+
+/*
+ * Pointers to shared memory
+ */
+extern LogicalDecodingCtlData *LogicalDecodingCtl;
+extern LogicalDecodingSlot *MyLogicalDecodingSlot;
+
+struct LogicalDecodingContext;
+
+typedef void (*LogicalOutputPluginWriterWrite) (
+ struct LogicalDecodingContext *lr,
+ XLogRecPtr Ptr,
+ TransactionId xid
+);
+
+typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
+
+/*
+ * Output plugin callbacks
+ */
+typedef struct OutputPluginCallbacks
+{
+ LogicalDecodeInitCB init_cb;
+ LogicalDecodeBeginCB begin_cb;
+ LogicalDecodeChangeCB change_cb;
+ LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeCleanupCB cleanup_cb;
+} OutputPluginCallbacks;
+
+typedef struct LogicalDecodingContext
+{
+ struct XLogReaderState *reader;
+ struct LogicalDecodingSlot *slot;
+ struct ReorderBuffer *reorder;
+ struct SnapBuild *snapshot_builder;
+
+ struct OutputPluginCallbacks callbacks;
+
+ bool stop_after_consistent;
+
+ /*
+ * User specified options
+ */
+ List *output_plugin_options;
+
+ /*
+ * User-Provided callback for writing/streaming out data.
+ */
+ LogicalOutputPluginWriterPrepareWrite prepare_write;
+ LogicalOutputPluginWriterWrite write;
+
+ /*
+ * Output buffer.
+ */
+ StringInfo out;
+
+ /*
+ * Private data pointer for the creator of the logical decoding context.
+ */
+ void *owner_private;
+
+ /*
+ * Private data pointer of the output plugin.
+ */
+ void *output_plugin_private;
+
+ /*
+ * Private data pointer for the data writer.
+ */
+ void *output_writer_private;
+} LogicalDecodingContext;
+
+/* GUCs */
+extern PGDLLIMPORT int max_logical_slots;
+
+extern Size LogicalDecodingShmemSize(void);
+extern void LogicalDecodingShmemInit(void);
+
+extern void LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin);
+extern void LogicalDecodingReleaseSlot(void);
+extern void LogicalDecodingReAcquireSlot(const char *name);
+extern void LogicalDecodingFreeSlot(const char *name);
+
+extern void ComputeLogicalXmin(void);
+
+/* change logical xmin */
+extern void IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin);
+
+/* change recovery restart location */
+extern void IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn);
+
+extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
+
+extern void CheckLogicalReplicationRequirements(void);
+
+extern void StartupLogicalReplication(XLogRecPtr checkPointRedo);
+
+extern LogicalDecodingContext *CreateLogicalDecodingContext(
+ LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write);
+extern bool LogicalDecodingContextReady(LogicalDecodingContext *ctx);
+extern void FreeLogicalDecodingContext(LogicalDecodingContext *ctx);
+
+#endif
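The candidate_* fields in LogicalDecodingSlot describe a two-step handshake: decoding proposes a new xmin and restart point, and they only take effect once the client has confirmed a flush at or past the associated LSN. The sketch below is one plausible reading of those struct comments, not the patch's LogicalConfirmReceivedLocation. DemoSlot, demo_set_candidate and demo_confirm_received are hypothetical names, and locking and the actual sync to disk are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t DemoLsn;		/* hypothetical stand-in for XLogRecPtr */
typedef uint32_t DemoXid;		/* hypothetical stand-in for TransactionId */

#define DEMO_INVALID_LSN ((DemoLsn) 0)

/* Only the fields that take part in the handshake. */
typedef struct DemoSlot
{
	DemoXid		xmin;			/* "on-disk" value, updated first */
	DemoXid		effective_xmin; /* in-memory value, updated after the sync */
	DemoLsn		restart_decoding;
	DemoLsn		confirmed_flush;
	DemoLsn		candidate_lsn;
	DemoXid		candidate_xmin;
	DemoLsn		candidate_restart_decoding;
} DemoSlot;

/* Decoding side: remember values that become safe once lsn is confirmed. */
static void
demo_set_candidate(DemoSlot *slot, DemoLsn lsn, DemoXid xmin, DemoLsn restart)
{
	assert(lsn != DEMO_INVALID_LSN);
	slot->candidate_lsn = lsn;
	slot->candidate_xmin = xmin;
	slot->candidate_restart_decoding = restart;
}

/* Client feedback side: apply the candidate if the flush has reached it. */
static void
demo_confirm_received(DemoSlot *slot, DemoLsn lsn)
{
	slot->confirmed_flush = lsn;

	if (slot->candidate_lsn == DEMO_INVALID_LSN || lsn < slot->candidate_lsn)
		return;

	/* update the to-be-persisted xmin first ... */
	slot->xmin = slot->candidate_xmin;
	/* ... a real implementation would write the slot to disk here ... */
	slot->effective_xmin = slot->xmin;

	slot->restart_decoding = slot->candidate_restart_decoding;
	slot->candidate_lsn = DEMO_INVALID_LSN;
}
```

Keeping xmin and effective_xmin separate means the value other backends rely on never moves ahead of what would survive a crash, which is presumably why the struct carries both.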
diff --git a/src/include/replication/logicalfuncs.h b/src/include/replication/logicalfuncs.h
new file mode 100644
index 0000000..d6fd19c
--- /dev/null
+++ b/src/include/replication/logicalfuncs.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ * logicalfuncs.h
+ * PostgreSQL WAL to logical transformation support functions
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICALFUNCS_H
+#define LOGICALFUNCS_H
+
+#include "replication/logical.h"
+
+extern int logical_read_local_xlog_page(XLogReaderState *state,
+ XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr,
+ char *cur_page, TimeLineID *pageTLI);
+
+extern Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
new file mode 100644
index 0000000..a9fcc2d
--- /dev/null
+++ b/src/include/replication/output_plugin.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ * output_plugin.h
+ * PostgreSQL Logical Decode Plugin Interface
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OUTPUT_PLUGIN_H
+#define OUTPUT_PLUGIN_H
+
+#include "replication/reorderbuffer.h"
+
+struct LogicalDecodingContext;
+
+/*
+ * Callback that gets called in a user-defined plugin.
+ * ctx->output_plugin_private can be set to some private data.
+ *
+ * "is_init" will be set to "true" if the decoding slot just got defined. When
+ * the same slot is used from then on, it will be "false".
+ *
+ * Gets looked up via the library symbol pg_decode_init.
+ */
+typedef void (*LogicalDecodeInitCB) (
+ struct LogicalDecodingContext *ctx,
+ bool is_init
+);
+
+/*
+ * Callback called for every BEGIN of a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_begin_txn.
+ */
+typedef void (*LogicalDecodeBeginCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn);
+
+/*
+ * Callback for every individual change in a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_change.
+ */
+typedef void (*LogicalDecodeChangeCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change
+);
+
+/*
+ * Called for every COMMIT of a successful transaction.
+ *
+ * Gets looked up via the library symbol pg_decode_commit_txn.
+ */
+typedef void (*LogicalDecodeCommitCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to cleanup the state of an output plugin.
+ *
+ * Gets looked up via the library symbol pg_decode_cleanup.
+ */
+typedef void (*LogicalDecodeCleanupCB) (
+ struct LogicalDecodingContext *
+);
+
+#endif /* OUTPUT_PLUGIN_H */
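The callback typedefs in output_plugin.h imply a fixed calling order per decoded transaction: begin once, change for every row change, then commit. The mock below illustrates only that order. It does not use the real PostgreSQL headers, and every type and function in it (DemoContext, DemoTxn, DemoCallbacks, demo_replay_txn and so on) is hypothetical; a real plugin would write into ctx->out via the StringInfo API instead of a fixed char buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much simplified stand-ins for the real structs. */
typedef struct DemoContext
{
	char		out[256];		/* plays the role of ctx->out */
} DemoContext;

typedef struct DemoTxn
{
	unsigned	xid;
} DemoTxn;

typedef enum DemoChangeType
{
	DEMO_INSERT,
	DEMO_UPDATE,
	DEMO_DELETE
} DemoChangeType;

typedef void (*DemoBeginCB) (DemoContext *, DemoTxn *);
typedef void (*DemoChangeCB) (DemoContext *, DemoTxn *, DemoChangeType);
typedef void (*DemoCommitCB) (DemoContext *, DemoTxn *);

typedef struct DemoCallbacks
{
	DemoBeginCB begin_cb;
	DemoChangeCB change_cb;
	DemoCommitCB commit_cb;
} DemoCallbacks;

/* Append text to the output buffer, truncating if it does not fit. */
static void
demo_append(DemoContext *ctx, const char *s)
{
	size_t		used = strlen(ctx->out);

	snprintf(ctx->out + used, sizeof(ctx->out) - used, "%s", s);
}

static void
demo_begin(DemoContext *ctx, DemoTxn *txn)
{
	char		buf[32];

	snprintf(buf, sizeof(buf), "BEGIN %u;", txn->xid);
	demo_append(ctx, buf);
}

static void
demo_change(DemoContext *ctx, DemoTxn *txn, DemoChangeType type)
{
	static const char *const names[] = {"INSERT;", "UPDATE;", "DELETE;"};

	(void) txn;
	demo_append(ctx, names[type]);
}

static void
demo_commit(DemoContext *ctx, DemoTxn *txn)
{
	(void) txn;
	demo_append(ctx, "COMMIT;");
}

/* Driver: what the decoding machinery would do for one committed xact. */
static void
demo_replay_txn(DemoContext *ctx, const DemoCallbacks *cb, DemoTxn *txn,
				const DemoChangeType *changes, size_t nchanges)
{
	assert(cb->begin_cb && cb->change_cb && cb->commit_cb);

	cb->begin_cb(ctx, txn);
	for (size_t i = 0; i < nchanges; i++)
		cb->change_cb(ctx, txn, changes[i]);
	cb->commit_cb(ctx, txn);
}
```

Since the callbacks are documented as firing only for successful transactions, a driver like this would be invoked at commit time, never for aborted transactions.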
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
new file mode 100644
index 0000000..7a4e046
--- /dev/null
+++ b/src/include/replication/reorderbuffer.h
@@ -0,0 +1,342 @@
+/*
+ * reorderbuffer.h
+ *
+ * PostgreSQL logical replay buffer management
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/reorderbuffer.h
+ */
+#ifndef REORDERBUFFER_H
+#define REORDERBUFFER_H
+
+#include "access/htup_details.h"
+#include "utils/hsearch.h"
+#include "utils/rel.h"
+
+#include "lib/ilist.h"
+
+#include "storage/sinval.h"
+
+#include "utils/snapshot.h"
+
+/* an individual tuple, stored in one chunk of memory */
+typedef struct ReorderBufferTupleBuf
+{
+ /* position in preallocated list */
+ slist_node node;
+
+ /* tuple, stored sequentially */
+ HeapTupleData tuple;
+ HeapTupleHeaderData header;
+ char data[MaxHeapTupleSize];
+} ReorderBufferTupleBuf;
+
+/* types of the change passed to a 'change' callback */
+enum ReorderBufferChangeType
+{
+ REORDER_BUFFER_CHANGE_INSERT,
+ REORDER_BUFFER_CHANGE_UPDATE,
+ REORDER_BUFFER_CHANGE_DELETE
+};
+
+/*
+ * a single 'change', can be an insert (with one tuple), an update (old, new),
+ * or a delete (old).
+ *
+ * The same struct is also used internally for other purposes but that should
+ * never be visible outside reorderbuffer.c.
+ */
+typedef struct ReorderBufferChange
+{
+ XLogRecPtr lsn;
+
+ /* type of change */
+ union
+ {
+ enum ReorderBufferChangeType action;
+ /* do not leak internal enum values to the outside */
+ int action_internal;
+ };
+
+ /*
+ * Context data for the change, which part of the union is valid depends
+ * on action/action_internal.
+ */
+ union
+ {
+ /* old, new tuples when action == *_INSERT|UPDATE|DELETE */
+ struct
+ {
+ /* relation that has been changed */
+ RelFileNode relnode;
+ /* valid for DELETE || UPDATE */
+ ReorderBufferTupleBuf *oldtuple;
+ /* valid for INSERT || UPDATE */
+ ReorderBufferTupleBuf *newtuple;
+ };
+
+ /* new snapshot */
+ Snapshot snapshot;
+
+ /* new command id for existing snapshot in a catalog changing tx */
+ CommandId command_id;
+
+ /* new cid mapping for catalog changing transaction */
+ struct
+ {
+ RelFileNode node;
+ ItemPointerData tid;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid;
+ } tuplecid;
+ };
+
+ /*
+ * While in use this is how a change is linked into a transaction,
+ * otherwise it's the preallocated list.
+ */
+ dlist_node node;
+} ReorderBufferChange;
+
+typedef struct ReorderBufferTXN
+{
+ /*
+ * The transaction's transaction id, can be a toplevel or sub xid.
+ */
+ TransactionId xid;
+
+ /*
+ * LSN of the first data-carrying WAL record with knowledge about this
+ * xid. This is allowed to *not* be the first record adorned with this xid, if
+ * the previous records aren't relevant for logical decoding.
+ */
+ XLogRecPtr first_lsn;
+
+ /* ----
+ * LSN of the record that led to this xact being committed or
+ * aborted. This can be a
+ * * plain commit record
+ * * plain commit record, of a parent transaction
+ * * prepared transaction commit
+ * * plain abort record
+ * * prepared transaction abort
+ * * error during decoding
+ * ----
+ */
+ XLogRecPtr final_lsn;
+
+ /*
+ * LSN pointing to the end of the commit record + 1.
+ */
+ XLogRecPtr end_lsn;
+
+ /*
+ * LSN of the last position at which snapshot information resides, so we can
+ * restart decoding from there and fully recover this transaction from
+ * WAL.
+ */
+ XLogRecPtr restart_decoding_lsn;
+
+ /*
+ * Base snapshot or NULL.
+ */
+ Snapshot base_snapshot;
+
+ /* did the TX have catalog changes */
+ bool does_timetravel;
+
+ /*
+ * Do we know this is a subxact?
+ */
+ bool is_known_as_subxact;
+
+ /*
+ * How many ReorderBufferChanges do we have in this txn.
+ *
+ * Changes in subtransactions are *not* included but tracked separately.
+ */
+ Size nentries;
+
+ /*
+ * How many of the above entries are stored in memory in contrast to being
+ * spilled to disk.
+ */
+ Size nentries_mem;
+
+ /*
+ * List of ReorderBufferChange structs, including new Snapshots and new
+ * CommandIds
+ */
+ dlist_head changes;
+
+ /*
+ * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
+ * Those are always assigned to the toplevel transaction. (Keep track of
+ * #entries to create a hash of the right size)
+ */
+ dlist_head tuplecids;
+ size_t ntuplecids;
+
+ /*
+ * On-demand built hash for looking up the above values.
+ */
+ HTAB *tuplecid_hash;
+
+ /*
+ * Hash containing (potentially partial) toast entries. NULL if no toast
+ * tuples have been found for the current change.
+ */
+ HTAB *toast_hash;
+
+ /*
+ * non-hierarchical list of subtransactions that are *not* aborted. Only
+ * used in toplevel transactions.
+ */
+ dlist_head subtxns;
+ size_t nsubtxns;
+
+ /* ---
+ * Position in one of three lists:
+ * * list of subtransactions if we are *known* to be subxact
+ * * list of toplevel xacts (can be an as-yet unknown subxact)
+ * * list of preallocated ReorderBufferTXNs
+ * ---
+ */
+ dlist_node node;
+
+ /*
+ * Stored cache invalidations. This is not a linked list because we get
+ * all the invalidations at once.
+ */
+ SharedInvalidationMessage *invalidations;
+ size_t ninvalidations;
+
+} ReorderBufferTXN;
+
+/* so we can define the callbacks used inside struct ReorderBuffer itself */
+typedef struct ReorderBuffer ReorderBuffer;
+
+/* change callback signature */
+typedef void (*ReorderBufferApplyChangeCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change);
+
+/* begin callback signature */
+typedef void (*ReorderBufferBeginCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn);
+
+/* commit callback signature */
+typedef void (*ReorderBufferCommitCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+struct ReorderBuffer
+{
+ /*
+ * xid => ReorderBufferTXN lookup table
+ */
+ HTAB *by_txn;
+
+ /*
+ * Transactions that could be a toplevel xact, ordered by LSN of the first
+ * record bearing that xid.
+ */
+ dlist_head toplevel_by_lsn;
+
+ /*
+ * one-entry sized cache for by_txn. Very frequently the same txn gets
+ * looked up over and over again.
+ */
+ TransactionId by_txn_last_xid;
+ ReorderBufferTXN *by_txn_last_txn;
+
+ /*
+ * Callbacks to be called when a transaction commits.
+ */
+ ReorderBufferBeginCB begin;
+ ReorderBufferApplyChangeCB apply_change;
+ ReorderBufferCommitCB commit;
+
+ /*
+ * Pointer that will be passed untouched to the callbacks.
+ */
+ void *private_data;
+
+ /*
+ * Private memory context.
+ */
+ MemoryContext context;
+
+ /*
+ * Data structure slab cache.
+ *
+ * We allocate/deallocate some structures very frequently, to avoid bigger
+ * overhead we cache some unused ones here.
+ *
+ * The maximum number of cached entries is controlled by const variables
+ * at the top of reorderbuffer.c
+ */
+
+ /* cached ReorderBufferTXNs */
+ dlist_head cached_transactions;
+ Size nr_cached_transactions;
+
+ /* cached ReorderBufferChanges */
+ dlist_head cached_changes;
+ Size nr_cached_changes;
+
+ /* cached ReorderBufferTupleBufs */
+ slist_head cached_tuplebufs;
+ Size nr_cached_tuplebufs;
+
+ XLogRecPtr current_restart_decoding_lsn;
+
+ /* buffer for disk<->memory conversions */
+ char *outbuf;
+ Size outbufsize;
+};
+
+
+ReorderBuffer *ReorderBufferAllocate(void);
+void ReorderBufferFree(ReorderBuffer *);
+
+ReorderBufferTupleBuf *ReorderBufferGetTupleBuf(ReorderBuffer *);
+void ReorderBufferReturnTupleBuf(ReorderBuffer *, ReorderBufferTupleBuf *tuple);
+ReorderBufferChange *ReorderBufferGetChange(ReorderBuffer *);
+void ReorderBufferReturnChange(ReorderBuffer *, ReorderBufferChange *);
+
+void ReorderBufferQueueChange(ReorderBuffer *, TransactionId, XLogRecPtr lsn, ReorderBufferChange *);
+void ReorderBufferCommit(ReorderBuffer *, TransactionId,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
+void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
+void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
+void ReorderBufferAbort(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+
+void ReorderBufferSetBaseSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddNewCommandId(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ CommandId cid);
+void ReorderBufferAddNewTupleCids(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ RelFileNode node, ItemPointerData pt,
+ CommandId cmin, CommandId cmax, CommandId combocid);
+void ReorderBufferAddInvalidations(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ Size nmsgs, SharedInvalidationMessage *msgs);
+bool ReorderBufferIsXidKnown(ReorderBuffer *, TransactionId xid);
+void ReorderBufferXidSetTimetravel(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
+bool ReorderBufferXidDoesTimetravel(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+
+ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
+
+void ReorderBufferSetRestartPoint(ReorderBuffer *, XLogRecPtr ptr);
+
+void ReorderBufferStartup(void);
+
+#endif
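
The one-entry cache in front of by_txn (by_txn_last_xid / by_txn_last_txn in the struct above) can be illustrated with a small standalone sketch. This is not code from the patch: every name below is a hypothetical stand-in, a fixed array replaces the real hash table, and xid 0 is assumed to be the invalid value. It only shows the idea that repeated lookups of the same xid skip the table search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical stand-ins, not identifiers from the patch. */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID ((SketchXid) 0)
#define SKETCH_MAX_TXNS 64

typedef struct SketchTxn
{
    SketchXid   xid;
    int         nchanges;
} SketchTxn;

typedef struct SketchTxnTable
{
    SketchTxn   entries[SKETCH_MAX_TXNS];   /* stand-in for the hash table */
    size_t      nentries;

    /* one-entry cache: the transaction returned by the last lookup */
    SketchXid   last_xid;
    SketchTxn  *last_txn;

    size_t      slow_lookups;   /* lookups that had to search the table */
} SketchTxnTable;

static void
sketch_table_init(SketchTxnTable *tab)
{
    assert(tab != NULL);
    tab->nentries = 0;
    tab->last_xid = SKETCH_INVALID_XID;
    tab->last_txn = NULL;
    tab->slow_lookups = 0;
}

/*
 * Return the entry for xid, creating it if it does not exist yet.
 * Returns NULL for the invalid xid or when the table is full.
 */
static SketchTxn *
sketch_table_lookup(SketchTxnTable *tab, SketchXid xid)
{
    SketchTxn  *txn = NULL;
    size_t      i;

    assert(tab != NULL);
    if (xid == SKETCH_INVALID_XID)
        return NULL;

    /* fast path: same xid as the previous lookup */
    if (tab->last_txn != NULL && tab->last_xid == xid)
        return tab->last_txn;

    /* slow path: search the table, then create the entry if missing */
    tab->slow_lookups++;
    for (i = 0; i < tab->nentries; i++)
    {
        if (tab->entries[i].xid == xid)
        {
            txn = &tab->entries[i];
            break;
        }
    }
    if (txn == NULL)
    {
        if (tab->nentries == SKETCH_MAX_TXNS)
            return NULL;
        txn = &tab->entries[tab->nentries++];
        txn->xid = xid;
        txn->nchanges = 0;
    }

    /* remember the result for the next lookup */
    tab->last_xid = xid;
    tab->last_txn = txn;
    return txn;
}
```

Looking up the same xid twice in a row only touches the table once; alternating between two xids defeats the cache, which is why a single entry pays off only when consecutive records mostly belong to the same transaction.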
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
new file mode 100644
index 0000000..7a4a217
--- /dev/null
+++ b/src/include/replication/snapbuild.h
@@ -0,0 +1,81 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.h
+ * Exports from replication/logical/snapbuild.c.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/snapbuild.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPBUILD_H
+#define SNAPBUILD_H
+
+#include "access/xlogdefs.h"
+#include "utils/snapmgr.h"
+
+typedef enum
+{
+ /*
+ * Initial state, we can't do much yet.
+ */
+ SNAPBUILD_START,
+
+ /*
+ * We have collected enough information to decode tuples in transactions
+ * that started after this.
+ *
+ * Once we have reached this state we start to collect changes. We cannot
+ * apply them yet because they might be based on transactions that were
+ * still running when we reached this state.
+ */
+ SNAPBUILD_FULL_SNAPSHOT,
+
+ /*
+ * Found a point after reaching SNAPBUILD_FULL_SNAPSHOT where all
+ * transactions that were running at that point have finished. Until we
+ * reach it we hold off calling any commit callbacks.
+ */
+ SNAPBUILD_CONSISTENT
+} SnapBuildState;
+
+/* forward declare so we don't have to expose the struct to the public */
+struct SnapBuild;
+typedef struct SnapBuild SnapBuild;
+
+/* forward declare so we don't have to include xlogreader.h */
+struct XLogRecordBuffer;
+struct ReorderBuffer;
+
+extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
+ TransactionId xmin_horizon, XLogRecPtr start_lsn);
+extern void FreeSnapshotBuilder(SnapBuild *cache);
+
+extern void SnapBuildSnapDecRefcount(Snapshot snap);
+
+extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
+extern void SnapBuildClearExportedSnapshot(void);
+
+extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
+
+extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+
+/* don't want to include heapam_xlog.h */
+struct xl_heap_new_cid;
+struct xl_running_xacts;
+
+extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
+ TransactionId xid, int nsubxacts,
+ TransactionId *subxacts);
+extern void SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts);
+extern bool SnapBuildProcessChange(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn);
+extern void SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
+ XLogRecPtr lsn, struct xl_heap_new_cid *cid);
+extern void SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn,
+ struct xl_running_xacts *running);
+extern void SnapBuildSerializationPoint(SnapBuild *builder, XLogRecPtr lsn);
+
+#endif /* SNAPBUILD_H */
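
The three SnapBuildState values above describe a one-way progression. A minimal sketch of that progression follows, assuming that the builder simply remembers which transactions were running when the full snapshot was reached and waits for each of them to finish. All names are hypothetical, the real logic in snapbuild.c is considerably more involved, and jumping straight to the consistent state when nothing was running is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three states; not the patch's own enum. */
typedef enum
{
    SKETCH_START,
    SKETCH_FULL_SNAPSHOT,
    SKETCH_CONSISTENT
} SketchState;

#define SKETCH_MAX_RUNNING 16

typedef struct SketchBuilder
{
    SketchState state;
    uint32_t    running[SKETCH_MAX_RUNNING];    /* xids we still wait for */
    size_t      nrunning;
} SketchBuilder;

static void
sketch_init(SketchBuilder *b)
{
    assert(b != NULL);
    b->state = SKETCH_START;
    b->nrunning = 0;
}

/* Enough information collected: remember who was running at this point. */
static void
sketch_reach_full_snapshot(SketchBuilder *b, const uint32_t *running, size_t n)
{
    size_t      i;

    assert(b->state == SKETCH_START);
    assert(n <= SKETCH_MAX_RUNNING);
    for (i = 0; i < n; i++)
        b->running[i] = running[i];
    b->nrunning = n;
    /* assumption: with nothing running we are consistent right away */
    b->state = (n == 0) ? SKETCH_CONSISTENT : SKETCH_FULL_SNAPSHOT;
}

/* A transaction committed or aborted; stop waiting for it. */
static void
sketch_xact_finished(SketchBuilder *b, uint32_t xid)
{
    size_t      i;

    if (b->state != SKETCH_FULL_SNAPSHOT)
        return;
    for (i = 0; i < b->nrunning; i++)
    {
        if (b->running[i] == xid)
        {
            /* overwrite with the last element and shrink the array */
            b->running[i] = b->running[--b->nrunning];
            break;
        }
    }
    if (b->nrunning == 0)
        b->state = SKETCH_CONSISTENT;
}

/* Changes are collected from the full snapshot onwards ... */
static bool
sketch_collects_changes(const SketchBuilder *b)
{
    return b->state >= SKETCH_FULL_SNAPSHOT;
}

/* ... but commit callbacks are held off until we are consistent. */
static bool
sketch_calls_commit_callbacks(const SketchBuilder *b)
{
    return b->state == SKETCH_CONSISTENT;
}
```

The two predicates correspond to the two guarantees in the comments: collecting starts at the full snapshot, while commit callbacks wait for the consistent point.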
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7eaa21b..daae320 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -66,6 +66,7 @@ typedef struct WalSnd
extern WalSnd *MyWalSnd;
+
/* There is one WalSndCtl struct for the whole database cluster */
typedef struct
{
@@ -93,7 +94,6 @@ typedef struct
extern WalSndCtlData *WalSndCtl;
-
extern void WalSndSetState(WalSndState state);
/*
@@ -108,4 +108,8 @@ extern void replication_scanner_finish(void);
extern Node *replication_parse_result;
+/* logical wal sender data gathering functions */
+extern XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
+
+
#endif /* _WALSENDER_PRIVATE_H */
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index e0eb184..75c56a9 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -116,6 +116,9 @@ typedef ItemPointerData *ItemPointer;
/*
* ItemPointerCopy
* Copies the contents of one disk item pointer to another.
+ *
+ * Should there ever be padding in an ItemPointer this would need to be handled
+ * differently as it's used as hash key.
*/
#define ItemPointerCopy(fromPointer, toPointer) \
( \
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 39415a3..a33d6cf 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -80,6 +80,7 @@ typedef enum LWLockId
OldSerXidLock,
SyncRepLock,
BackgroundWorkerLock,
+ LogicalReplicationCtlLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..744317e 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -50,7 +50,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked);
extern TransactionId GetOldestActiveTransactionId(void);
extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 7e70e57..5448818 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -147,4 +147,6 @@ extern void ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs
int nmsgs, bool RelcacheInitFileInval,
Oid dbid, Oid tsid);
+extern void LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg);
+
#endif /* SINVAL_H */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 6fd6e1e..5424912 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -64,4 +64,5 @@ extern void CacheRegisterRelcacheCallback(RelcacheCallbackFunction func,
extern void CallSyscacheCallbacks(int cacheid, uint32 hashvalue);
+extern void InvalidateSystemCaches(void);
#endif /* INVAL_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0281b4b..6a4d2d5 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -104,6 +104,7 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
+ Bitmapset *rd_ckeyattr; /* cols that are included in the pkey */
Oid rd_oidindex; /* OID of unique index on OID, if any */
LockInfoData rd_lockInfo; /* lock mgr's info for locking relation */
RuleLock *rd_rules; /* rewrite rules */
@@ -221,6 +222,7 @@ typedef struct StdRdOptions
AutoVacOpts autovacuum; /* autovacuum-related options */
bool security_barrier; /* for views */
int check_option_offset; /* for views */
+ bool treat_as_catalog_table; /* treat as timetravelable table */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
@@ -290,6 +292,15 @@ typedef struct StdRdOptions
"cascaded") == 0 : false)
/*
+ * RelationIsTreatedAsCatalogTable
+ * Returns whether the relation should be treated as a catalog table
+ * from the point of view of logical decoding.
+ */
+#define RelationIsTreatedAsCatalogTable(relation) \
+ ((relation)->rd_options ? \
+ ((StdRdOptions *) (relation)->rd_options)->treat_as_catalog_table : false)
+
+/*
* RelationIsValid
* True iff relation descriptor is valid.
*/
@@ -441,7 +452,6 @@ typedef struct StdRdOptions
((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \
!(relation)->rd_islocaltemp)
-
/*
* RelationIsScannable
* Currently can only be false for a materialized view which has not been
@@ -458,6 +468,24 @@ typedef struct StdRdOptions
*/
#define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
+/*
+ * RelationIsDoingTimetravel
+ * True if we need to log enough information to provide timetravel access
+ */
+#define RelationIsDoingTimetravel(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsDoingTimetravelInternal(relation))
+
+/*
+ * RelationIsLogicallyLogged
+ * True if we need to log enough information to extract the data from the WAL stream
+ */
+#define RelationIsLogicallyLogged(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsLogicallyLoggedInternal(relation))
+
+extern bool RelationIsDoingTimetravelInternal(Relation relation);
+extern bool RelationIsLogicallyLoggedInternal(Relation relation);
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..cfeded8 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -41,7 +41,16 @@ extern List *RelationGetIndexList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
extern List *RelationGetIndexPredicate(Relation relation);
-extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs);
+
+typedef enum IndexAttrBitmapKind {
+ INDEX_ATTR_BITMAP_ALL,
+ INDEX_ATTR_BITMAP_KEY,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY
+} IndexAttrBitmapKind;
+
+extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
+ IndexAttrBitmapKind keyAttrs);
+
extern void RelationGetExclusionInfo(Relation indexRelation,
Oid **operators,
Oid **procs,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 81a286c..2187f58 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -23,6 +23,7 @@ extern bool FirstSnapshotSet;
extern TransactionId TransactionXmin;
extern TransactionId RecentXmin;
extern TransactionId RecentGlobalXmin;
+extern TransactionId RecentGlobalDataXmin;
extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
@@ -53,4 +54,6 @@ extern bool XactHasExportedSnapshots(void);
extern void DeleteAllExportedSnapshotFiles(void);
extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern char *ExportSnapshot(Snapshot snapshot);
+
#endif /* SNAPMGR_H */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 19f56e4..cd3f880 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -22,6 +22,7 @@
extern PGDLLIMPORT SnapshotData SnapshotSelfData;
extern PGDLLIMPORT SnapshotData SnapshotAnyData;
extern PGDLLIMPORT SnapshotData SnapshotToastData;
+extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
#define SnapshotSelf (&SnapshotSelfData)
#define SnapshotAny (&SnapshotAnyData)
@@ -37,7 +38,8 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
/* This macro encodes the knowledge of which snapshots are MVCC-safe */
#define IsMVCCSnapshot(snapshot) \
- ((snapshot)->satisfies == HeapTupleSatisfiesMVCC)
+ ((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
+ (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecoding)
/*
* HeapTupleSatisfiesVisibility
@@ -86,4 +88,21 @@ extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid);
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
+/* Support for catalog timetravel */
+struct HTAB;
+extern bool HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup,
+ Snapshot snapshot, Buffer buffer);
+extern void SetupDecodingSnapshots(Snapshot snapshot_now, struct HTAB *tuplecids);
+extern void RevertFromDecodingSnapshots(void);
+extern void SuspendDecodingSnapshots(void);
+extern void UnSuspendDecodingSnapshots(void);
+
+/*
+ * To avoid leaking too much knowledge about reorderbuffer implementation
+ * details this is implemented in reorderbuffer.c not tqual.c.
+ */
+extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data, HeapTuple htup,
+ Buffer buffer,
+ CommandId *cmin, CommandId *cmax);
+
#endif /* TQUAL_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 8f24c51..d49e499 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1679,6 +1679,13 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, +
| pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock +
| FROM pg_database d;
+ pg_stat_logical_decoding | SELECT l.slot_name, +
+ | l.plugin, +
+ | l.database, +
+ | l.active, +
+ | l.xmin, +
+ | l.restart_decoding_lsn +
+ | FROM pg_stat_get_logical_decoding_slots() l(slot_name, plugin, database, active, xmin, restart_decoding_lsn);
pg_stat_replication | SELECT s.pid, +
| s.usesysid, +
| u.rolname AS usename, +
@@ -2142,7 +2149,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| FROM tv;
tvvmv | SELECT tvvm.grandtot +
| FROM tvvm;
-(64 rows)
+(65 rows)
SELECT tablename, rulename, definition FROM pg_rules
ORDER BY tablename, rulename;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b20eb0d..648caa0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -621,6 +621,7 @@ Form_pg_ts_template
Form_pg_type
Form_pg_user_mapping
FormatNode
+FreeLogicalReplicationCmd
FromCharDateMode
FromExpr
FuncCall
@@ -791,6 +792,7 @@ IdentifySystemCmd
IncrementVarSublevelsUp_context
Index
IndexArrayKeyInfo
+IndexAttrBitmapKind
IndexBuildCallback
IndexBuildResult
IndexBulkDeleteCallback
@@ -818,6 +820,7 @@ IndxInfo
InfoItem
InhInfo
InhOption
+InitLogicalReplicationCmd
InheritableSocket
InlineCodeBlock
InsertStmt
@@ -937,6 +940,17 @@ LockTupleMode
LockingClause
LogOpts
LogStmtLevel
+LogicalDecodeBeginCB
+LogicalDecodeChangeCB
+LogicalDecodeCleanupCB
+LogicalDecodeCommitCB
+LogicalDecodeInitCB
+LogicalDecodingCheckpointData
+LogicalDecodingContext
+LogicalDecodingCtlData
+LogicalDecodingSlot
+LogicalOutputPluginWriterPrepareWrite
+LogicalOutputPluginWriterWrite
LogicalTape
LogicalTapeSet
MAGIC
@@ -1050,6 +1064,7 @@ OprInfo
OprProofCacheEntry
OprProofCacheKey
OutputContext
+OutputPluginCallbacks
OverrideSearchPath
OverrideStackEntry
PACE_HEADER
@@ -1464,6 +1479,21 @@ Relids
RelocationBufferInfo
RenameStmt
ReopenPtr
+ReorderBuffer
+ReorderBufferApplyChangeCB
+ReorderBufferBeginCB
+ReorderBufferChange
+ReorderBufferChangeTypeInternal
+ReorderBufferCommitCB
+ReorderBufferDiskChange
+ReorderBufferIterTXNEntry
+ReorderBufferIterTXNState
+ReorderBufferToastEnt
+ReorderBufferTupleBuf
+ReorderBufferTupleCidEnt
+ReorderBufferTupleCidKey
+ReorderBufferTXN
+ReorderBufferTXNByIdEnt
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
ResTarget
@@ -1518,6 +1548,8 @@ SID_NAME_USE
SISeg
SMgrRelation
SMgrRelationData
+SnapBuildAction
+SnapBuildState
SOCKADDR
SOCKET
SPELL
@@ -1609,6 +1641,8 @@ SlruSharedData
Snapshot
SnapshotData
SnapshotSatisfiesFunc
+Snapstate
+SnapstateOnDisk
SockAddr
Sort
SortBy
@@ -1651,6 +1685,7 @@ StandardChunkHeader
StartBlobPtr
StartBlobsPtr
StartDataPtr
+StartLogicalReplicationCmd
StartReplicationCmd
StartupPacket
StatEntry
@@ -1874,6 +1909,7 @@ WalRcvData
WalRcvState
WalSnd
WalSndCtlData
+WalSndSendData
WalSndState
WholeRowVarExprState
WindowAgg
@@ -1925,6 +1961,7 @@ XLogReaderState
XLogRecData
XLogRecPtr
XLogRecord
+XLogRecordBuffer
XLogSegNo
XLogSource
XLogwrtResult
@@ -2347,6 +2384,7 @@ symbol
tablespaceinfo
teReqs
teSection
+TestDecodingData
temp_tablespaces_extra
text
timeKEY
@@ -2419,11 +2457,13 @@ xl_heap_cleanup_info
xl_heap_delete
xl_heap_freeze
xl_heap_header
+xl_heap_header_len
xl_heap_inplace
xl_heap_insert
xl_heap_lock
xl_heap_lock_updated
xl_heap_multi_insert
+xl_heap_new_cid
xl_heap_newpage
xl_heap_update
xl_heap_visible
--
1.8.4.21.g992c386.dirty
0005-wal_decoding-test_decoding-Add-a-simple-decoding-mod.patch (text/x-patch; charset=us-ascii)
From 9b7532b418d87087175f75bcbb5b7fb36ace4509 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 5/8] wal_decoding: test_decoding: Add a simple decoding module
in contrib
This is mostly useful for testing, demonstration and documentation purposes.
---
contrib/Makefile | 1 +
contrib/test_decoding/Makefile | 16 ++
contrib/test_decoding/test_decoding.c | 322 ++++++++++++++++++++++++++++++++++
3 files changed, 339 insertions(+)
create mode 100644 contrib/test_decoding/Makefile
create mode 100644 contrib/test_decoding/test_decoding.c
diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..6d2fe32 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,6 +50,7 @@ SUBDIRS = \
tablefunc \
tcn \
test_parser \
+ test_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
new file mode 100644
index 0000000..2ac9653
--- /dev/null
+++ b/contrib/test_decoding/Makefile
@@ -0,0 +1,16 @@
+# contrib/test_decoding/Makefile
+
+MODULE_big = test_decoding
+OBJS = test_decoding.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
new file mode 100644
index 0000000..fb9a240
--- /dev/null
+++ b/contrib/test_decoding/test_decoding.c
@@ -0,0 +1,322 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_decoding.c
+ * example output plugin for the logical replication functionality
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/test_decoding/test_decoding.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/index.h"
+
+#include "nodes/parsenodes.h"
+
+#include "replication/output_plugin.h"
+#include "replication/logical.h"
+
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+
+typedef struct
+{
+ MemoryContext context;
+ bool include_xids;
+} TestDecodingData;
+
+/* These must be available to pg_dlsym() */
+extern void pg_decode_init(LogicalDecodingContext *ctx, bool is_init);
+extern void pg_decode_begin_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn);
+extern void pg_decode_commit_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+extern void pg_decode_change(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, Relation rel,
+ ReorderBufferChange *change);
+
+void
+_PG_init(void)
+{
+}
+
+/* initialize this plugin */
+void
+pg_decode_init(LogicalDecodingContext *ctx, bool is_init)
+{
+ ListCell *option;
+ TestDecodingData *data;
+
+ AssertVariableIsOfType(&pg_decode_init, LogicalDecodeInitCB);
+
+ data = palloc(sizeof(TestDecodingData));
+ data->context = AllocSetContextCreate(TopMemoryContext,
+ "text conversion context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ data->include_xids = true;
+
+ ctx->output_plugin_private = data;
+
+ foreach(option, ctx->output_plugin_options)
+ {
+ DefElem *elem = lfirst(option);
+
+ Assert(elem->arg == NULL || IsA(elem->arg, String));
+
+ if (strcmp(elem->defname, "hide-xids") == 0)
+ {
+ /* FIXME: parse argument */
+ data->include_xids = false;
+ }
+ else
+ {
+ elog(WARNING, "option %s = %s is unknown",
+ elem->defname, elem->arg ? strVal(elem->arg) : "(null)");
+ }
+ }
+}
+
+/* BEGIN callback */
+void
+pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_begin_txn, LogicalDecodeBeginCB);
+
+ ctx->prepare_write(ctx, txn->end_lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "BEGIN");
+ ctx->write(ctx, txn->end_lsn, txn->xid);
+}
+
+/* COMMIT callback */
+void
+pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_commit_txn, LogicalDecodeCommitCB);
+
+ ctx->prepare_write(ctx, txn->end_lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "COMMIT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "COMMIT");
+ ctx->write(ctx, txn->end_lsn, txn->xid);
+}
+
+/* print the tuple 'tuple' into the StringInfo s */
+static void
+tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple)
+{
+ int natt;
+ Oid oid;
+
+ /* print oid of tuple, it's not included in the TupleDesc */
+ if ((oid = HeapTupleHeaderGetOid(tuple->t_data)) != InvalidOid)
+ {
+ appendStringInfo(s, " oid[oid]:%u", oid);
+ }
+
+ /* print all columns individually */
+ for (natt = 0; natt < tupdesc->natts; natt++)
+ {
+ Form_pg_attribute attr; /* the attribute itself */
+ Oid typid; /* type of current attribute */
+ HeapTuple type_tuple; /* information about a type */
+ Form_pg_type type_form;
+ Oid typoutput; /* output function */
+ bool typisvarlena;
+ Datum origval; /* possibly toasted Datum */
+ Datum val; /* definitely detoasted Datum */
+ char *outputstr = NULL;
+ bool isnull; /* column is null? */
+
+ attr = tupdesc->attrs[natt];
+
+ /*
+ * don't print dropped columns, we can't be sure everything is
+ * available for them
+ */
+ if (attr->attisdropped)
+ continue;
+
+ /*
+ * Don't print system columns, oid will already have been printed if
+ * present.
+ */
+ if (attr->attnum < 0)
+ continue;
+
+ typid = attr->atttypid;
+
+ /* gather type name */
+ type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+ if (!HeapTupleIsValid(type_tuple))
+ elog(ERROR, "cache lookup failed for type %u", typid);
+ type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+ /* print attribute name */
+ appendStringInfoChar(s, ' ');
+ appendStringInfoString(s, NameStr(attr->attname));
+
+ /* print attribute type */
+ appendStringInfoChar(s, '[');
+ appendStringInfoString(s, NameStr(type_form->typname));
+ appendStringInfoChar(s, ']');
+
+ /* query output function */
+ getTypeOutputInfo(typid,
+ &typoutput, &typisvarlena);
+
+ ReleaseSysCache(type_tuple);
+
+ /* get Datum from tuple */
+ origval = fastgetattr(tuple, natt + 1, tupdesc, &isnull);
+
+ if (isnull)
+ outputstr = "(null)";
+ else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
+ outputstr = "(unchanged-toast-datum)";
+ else if (typisvarlena)
+ val = PointerGetDatum(PG_DETOAST_DATUM(origval));
+ else
+ val = origval;
+
+ /* call output function if necessary */
+ if (outputstr == NULL)
+ outputstr = OidOutputFunctionCall(typoutput, val);
+
+ /* print data */
+ appendStringInfoChar(s, ':');
+ appendStringInfoString(s, outputstr);
+ }
+}
+
+/*
+ * callback for individual changed tuples
+ */
+void
+pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TestDecodingData *data;
+ Form_pg_class class_form;
+ TupleDesc tupdesc;
+ MemoryContext old;
+
+ AssertVariableIsOfType(&pg_decode_change, LogicalDecodeChangeCB);
+
+ data = ctx->output_plugin_private;
+ class_form = RelationGetForm(relation);
+ tupdesc = RelationGetDescr(relation);
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ ctx->prepare_write(ctx, change->lsn, txn->xid);
+
+ appendStringInfoString(ctx->out, "table \"");
+ appendStringInfoString(ctx->out, NameStr(class_form->relname));
+ appendStringInfoString(ctx->out, "\":");
+
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ appendStringInfoString(ctx->out, " INSERT:");
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+ break;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ appendStringInfoString(ctx->out, " UPDATE:");
+ if (change->oldtuple != NULL)
+ {
+ Relation indexrel;
+ TupleDesc indexdesc;
+
+ appendStringInfoString(ctx->out, " old-pkey:");
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ indexdesc = RelationGetDescr(indexrel);
+
+ tuple_to_stringinfo(ctx->out, indexdesc, &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ appendStringInfoString(ctx->out, " new-tuple:");
+ }
+
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+
+ break;
+ case REORDER_BUFFER_CHANGE_DELETE:
+ appendStringInfoString(ctx->out, " DELETE:");
+
+ /* if there was no PK, we only know that a delete happened */
+ if (change->oldtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ /* In DELETE, only the PK is present; display that */
+ else
+ {
+ Relation indexrel;
+
+ /* make sure rd_primary is set */
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ tuple_to_stringinfo(ctx->out, RelationGetDescr(indexrel),
+ &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ }
+ break;
+ }
+
+ MemoryContextSwitchTo(old);
+ MemoryContextReset(data->context);
+
+ ctx->write(ctx, change->lsn, txn->xid);
+}
--
1.8.4.21.g992c386.dirty
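
The text format that test_decoding.c emits for a single change can be read off pg_decode_change() and tuple_to_stringinfo(): the table name in double quotes, the action, then one " name[type]:value" group per column, with "(null)" standing in for NULL values. The sketch below reproduces only that layout in plain C. It uses no PostgreSQL headers, all names are hypothetical, and a fixed buffer with truncation replaces StringInfo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical column description; value == NULL means SQL NULL. */
typedef struct SketchColumn
{
    const char *name;
    const char *type;
    const char *value;
} SketchColumn;

/*
 * Append text to buf (which currently holds len characters), truncating
 * if it does not fit.  Returns the new length; buf stays NUL-terminated.
 */
static size_t
sketch_append(char *buf, size_t bufsize, size_t len, const char *text)
{
    size_t      room;
    size_t      n;

    assert(bufsize > 0 && len < bufsize);
    room = bufsize - len - 1;
    n = strlen(text);
    if (n > room)
        n = room;
    memcpy(buf + len, text, n);
    buf[len + n] = '\0';
    return len + n;
}

/* Format one change in the same layout test_decoding uses. */
static size_t
sketch_format_change(char *buf, size_t bufsize,
                     const char *relname, const char *action,
                     const SketchColumn *cols, size_t ncols)
{
    size_t      len = 0;
    size_t      i;

    assert(bufsize > 0);
    buf[0] = '\0';

    /* table "relname": ACTION: */
    len = sketch_append(buf, bufsize, len, "table \"");
    len = sketch_append(buf, bufsize, len, relname);
    len = sketch_append(buf, bufsize, len, "\":");
    len = sketch_append(buf, bufsize, len, " ");
    len = sketch_append(buf, bufsize, len, action);
    len = sketch_append(buf, bufsize, len, ":");

    /* one " name[type]:value" group per column */
    for (i = 0; i < ncols; i++)
    {
        len = sketch_append(buf, bufsize, len, " ");
        len = sketch_append(buf, bufsize, len, cols[i].name);
        len = sketch_append(buf, bufsize, len, "[");
        len = sketch_append(buf, bufsize, len, cols[i].type);
        len = sketch_append(buf, bufsize, len, "]:");
        len = sketch_append(buf, bufsize, len,
                            cols[i].value ? cols[i].value : "(null)");
    }
    return len;
}
```

For a table t with an int4 column id set to 1 and a NULL text column note, an INSERT is rendered as: table "t": INSERT: id[int4]:1 note[text]:(null)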
0006-wal_decoding-pg_receivellog-Introduce-pg_receivexlog.patch (text/x-patch; charset=us-ascii)
From e4e5016a34411c2a18cccd6ab4b3f749fe283ce1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 6/8] wal_decoding: pg_receivellog: Introduce pg_receivexlog
equivalent for logical changes
---
src/backend/utils/cache/relcache.c | 3 +
src/bin/pg_basebackup/.gitignore | 1 +
src/bin/pg_basebackup/Makefile | 11 +-
src/bin/pg_basebackup/pg_receivellog.c | 860 +++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/receivelog.c | 137 +-----
src/bin/pg_basebackup/receivelog.h | 2 +
src/bin/pg_basebackup/streamutil.c | 123 ++++-
src/bin/pg_basebackup/streamutil.h | 10 +
8 files changed, 1023 insertions(+), 124 deletions(-)
create mode 100644 src/bin/pg_basebackup/pg_receivellog.c
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 5d304ce..1b66e64 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1577,6 +1577,9 @@ RelationIdGetRelation(Oid relationId)
{
Relation rd;
+ /* Make sure we're in a xact, even if this ends up being a cache hit */
+ Assert(IsTransactionState());
+
/*
* first try to find reldesc in the cache
*/
diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 1334a1f..eb2978c 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,2 +1,3 @@
/pg_basebackup
/pg_receivexlog
+/pg_receivellog
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a707c93..c251249 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -20,7 +20,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
OBJS=receivelog.o streamutil.o $(WIN32RES)
-all: pg_basebackup pg_receivexlog
+all: pg_basebackup pg_receivexlog pg_receivellog
pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -28,9 +28,13 @@ pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
pg_receivexlog: pg_receivexlog.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_receivexlog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_receivellog: pg_receivellog.o $(OBJS) | submake-libpq submake-libpgport
+ $(CC) $(CFLAGS) pg_receivellog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
install: all installdirs
$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+ $(INSTALL_PROGRAM) pg_receivellog$(X) '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
installdirs:
$(MKDIR_P) '$(DESTDIR)$(bindir)'
@@ -38,6 +42,9 @@ installdirs:
uninstall:
rm -f '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+ rm -f '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
clean distclean maintainer-clean:
- rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o
+ rm -f pg_basebackup$(X) pg_receivexlog$(X) pg_receivellog$(X) \
+ pg_basebackup.o pg_receivexlog.o pg_receivellog.o \
+ $(OBJS)
diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
new file mode 100644
index 0000000..fc81608
--- /dev/null
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -0,0 +1,860 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_receivellog.c - receive streaming logical log data and write it
+ * to a local file.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/pg_receivellog.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "streamutil.h"
+
+#include "getopt_long.h"
+
+#include "libpq-fe.h"
+#include "libpq/pqsignal.h"
+
+#include "access/xlog_internal.h"
+#include "common/fe_memutils.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
+/* Global Options */
+static char *outfile = NULL;
+static int verbose = 0;
+static int noloop = 0;
+static int standby_message_timeout = 10 * 1000; /* 10 sec = default */
+static const char *slot = NULL;
+static XLogRecPtr startpos = InvalidXLogRecPtr;
+static bool do_init_slot = false;
+static bool do_start_slot = false;
+static bool do_stop_slot = false;
+
+/* filled pairwise with option, value. value may be NULL */
+static char **options;
+static size_t noptions = 0;
+static const char *plugin = "test_decoding";
+
+/* Global State */
+static int outfd = -1;
+static volatile bool time_to_abort = false;
+
+static void usage(void);
+static void StreamLog(void);
+
+static void
+usage(void)
+{
+ printf(_("%s receives PostgreSQL logical change stream.\n\n"),
+ progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]...\n"), progname);
+ printf(_("\nOptions:\n"));
+ printf(_(" -f, --file=FILE receive log into this file. - for stdout\n"));
+ printf(_(" -n, --no-loop do not loop on connection lost\n"));
+ printf(_(" -v, --verbose output verbose messages\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -d, --database=DBNAME database to connect to\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port number\n"));
+ printf(_(" -U, --username=NAME connect as specified database user\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt (should happen automatically)\n"));
+ printf(_("\nReplication options:\n"));
+ printf(_(" -o, --option=NAME[=VALUE]\n"
+ " Specify option NAME with optional value VALUE, to be passed\n"
+ " to the output plugin\n"));
+ printf(_(" -P, --plugin=PLUGIN use output plugin PLUGIN (defaults to test_decoding)\n"));
+ printf(_(" -s, --status-interval=INTERVAL\n"
+ " time between status packets sent to server (in seconds)\n"));
+ printf(_(" -S, --slot=SLOT use existing replication slot SLOT instead of starting a new one\n"));
+ printf(_(" -I, --startpos=PTR Where in an existing slot should the streaming start\n"));
+ printf(_("\nAction to be performed:\n"));
+ printf(_(" --init initiate a new replication slot (for the slotname see --slot)\n"));
+ printf(_(" --start start streaming in a replication slot (for the slotname see --slot)\n"));
+ printf(_(" --stop stop the replication slot (for the slotname see --slot)\n"));
+ printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ */
+static bool
+sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool force, bool replyRequested)
+{
+ char replybuf[1 + 8 + 8 + 8 + 8 + 1];
+ int len = 0;
+
+ /*
+ * we normally don't want to send superfluous feedbacks, but if
+ * it's because of a timeout we need to, otherwise
+ * replication_timeout will kill us.
+ */
+ if (blockpos == startpos && !force)
+ return true;
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: confirming flush up to %X/%X (slot %s)\n"),
+ progname, (uint32) (blockpos >> 32), (uint32) blockpos,
+ slot);
+
+ replybuf[len] = 'r';
+ len += 1;
+ fe_sendint64(blockpos, &replybuf[len]); /* write */
+ len += 8;
+ fe_sendint64(blockpos, &replybuf[len]); /* flush */
+ len += 8;
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
+ len += 8;
+ fe_sendint64(now, &replybuf[len]); /* sendTime */
+ len += 8;
+ replybuf[len] = replyRequested ? 1 : 0; /* replyRequested */
+ len += 1;
+
+ startpos = blockpos;
+
+ if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
+ {
+ fprintf(stderr, _("%s: could not send feedback packet: %s"),
+ progname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Start the log streaming
+ */
+static void
+StreamLog(void)
+{
+ PGresult *res;
+ char query[512];
+ char *copybuf = NULL;
+ int64 last_status = -1;
+ XLogRecPtr logoff = InvalidXLogRecPtr;
+ int written;
+ int i;
+
+ /*
+ * Connect in replication mode to the server
+ */
+ if (!conn)
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ return;
+
+ /*
+ * Start the replication
+ */
+ if (verbose)
+ fprintf(stderr,
+ _("%s: starting log streaming at %X/%X (slot %s)\n"),
+ progname, (uint32) (startpos >> 32), (uint32) startpos,
+ slot);
+
+ /* Initiate the replication stream at specified location */
+ written = snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION \"%s\" %X/%X",
+ slot, (uint32) (startpos >> 32), (uint32) startpos);
+
+ /*
+ * add options to string, if present
+ * Oh, if we just had stringinfo in src/common...
+ */
+ if (noptions)
+ written += snprintf(query + written, sizeof(query) - written, " (");
+
+ for (i = 0; i < noptions; i++)
+ {
+ /* separator */
+ if (i > 0)
+ written += snprintf(query + written, sizeof(query) - written, ", ");
+
+ /* write option name */
+ written += snprintf(query + written, sizeof(query) - written, "\"%s\"",
+ options[(i * 2)]);
+
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+
+ /* write option value if specified */
+ if (options[(i * 2) + 1] != NULL)
+ {
+ written += snprintf(query + written, sizeof(query) - written, " '%s'",
+ options[(i * 2) + 1]);
+
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+ }
+ }
+
+ if (noptions)
+ {
+ written += snprintf(query + written, sizeof(query) - written, ")");
+ if (written >= sizeof(query) - 1)
+ {
+ fprintf(stderr, _("%s: option string too long\n"), progname);
+ exit(1); /* no point in retrying, fatal error */
+ }
+ }
+
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COPY_BOTH)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
+ progname, query, PQresultErrorMessage(res));
+ PQclear(res);
+ goto error;
+ }
+ PQclear(res);
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: initiated streaming\n"),
+ progname);
+
+ while (!time_to_abort)
+ {
+ int r;
+ int bytes_left;
+ int bytes_written;
+ int64 now;
+ int hdr_len;
+
+ if (copybuf != NULL)
+ {
+ PQfreemem(copybuf);
+ copybuf = NULL;
+ }
+
+ /*
+ * Potentially send a status message to the master
+ */
+ now = feGetCurrentTimestamp();
+ if (standby_message_timeout > 0 &&
+ feTimestampDifferenceExceeds(last_status, now,
+ standby_message_timeout))
+ {
+ /* Time to send feedback! */
+ if (!sendFeedback(conn, logoff, now, true, false))
+ goto error;
+
+ last_status = now;
+ }
+
+ r = PQgetCopyData(conn, &copybuf, 1);
+ if (r == 0)
+ {
+ /*
+ * In async mode, and no data available. We block on reading but
+ * not more than the specified timeout, so that we can send a
+ * response back to the server.
+ */
+ fd_set input_mask;
+ struct timeval timeout;
+ struct timeval *timeoutptr;
+
+ FD_ZERO(&input_mask);
+ FD_SET(PQsocket(conn), &input_mask);
+ if (standby_message_timeout)
+ {
+ int64 targettime;
+ long secs;
+ int usecs;
+
+ targettime = last_status + (standby_message_timeout - 1) *
+ ((int64) 1000);
+ feTimestampDifference(now,
+ targettime,
+ &secs,
+ &usecs);
+ if (secs <= 0)
+ timeout.tv_sec = 1; /* Always sleep at least 1 sec */
+ else
+ timeout.tv_sec = secs;
+ timeout.tv_usec = usecs;
+ timeoutptr = &timeout;
+ }
+ else
+ timeoutptr = NULL;
+
+ r = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
+ if (r == 0 || (r < 0 && errno == EINTR))
+ {
+ /*
+ * Got a timeout or signal. Continue the loop and either
+ * deliver a status packet to the server or just go back into
+ * blocking.
+ */
+ continue;
+ }
+ else if (r < 0)
+ {
+ fprintf(stderr, _("%s: select() failed: %s\n"),
+ progname, strerror(errno));
+ goto error;
+ }
+ /* Else there is actually data on the socket */
+ if (PQconsumeInput(conn) == 0)
+ {
+ fprintf(stderr,
+ _("%s: could not receive data from WAL stream: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+ continue;
+ }
+ if (r == -1)
+ /* End of copy stream */
+ break;
+ if (r == -2)
+ {
+ fprintf(stderr, _("%s: could not read COPY data: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+
+ /* Check the message type. */
+ if (copybuf[0] == 'k')
+ {
+ int pos;
+ bool replyRequested;
+
+ /*
+ * Parse the keepalive message, enclosed in the CopyData message.
+ * We just check if the server requested a reply, and ignore the
+ * rest.
+ */
+ pos = 1; /* skip msgtype 'k' */
+ pos += 8; /* skip walEnd */
+ pos += 8; /* skip sendTime */
+
+ if (r < pos + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+ replyRequested = copybuf[pos];
+
+ /* If the server requested an immediate reply, send one. */
+ if (replyRequested)
+ {
+ now = feGetCurrentTimestamp();
+ if (!sendFeedback(conn, logoff, now, false, false))
+ goto error;
+ last_status = now;
+ }
+ continue;
+ }
+ else if (copybuf[0] != 'w')
+ {
+ fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
+ progname, copybuf[0]);
+ goto error;
+ }
+
+
+ /*
+ * Read the header of the XLogData message, enclosed in the CopyData
+ * message. We only need the WAL location field (dataStart), the rest
+ * of the header is ignored.
+ */
+ hdr_len = 1; /* msgtype 'w' */
+ hdr_len += 8; /* dataStart */
+ hdr_len += 8; /* walEnd */
+ hdr_len += 8; /* sendTime */
+ if (r < hdr_len + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+
+ /* Extract WAL location for this block */
+ {
+ XLogRecPtr temp = fe_recvint64(&copybuf[1]);
+
+ logoff = Max(temp, logoff);
+ }
+
+ if (outfd == -1 && strcmp(outfile, "-") == 0)
+ {
+ outfd = fileno(stdout);
+ }
+ else if (outfd == -1)
+ {
+ outfd = open(outfile, O_CREAT | O_APPEND | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (outfd == -1)
+ {
+ fprintf(stderr,
+ _("%s: could not open log file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ goto error;
+ }
+ }
+
+ bytes_left = r - hdr_len;
+ bytes_written = 0;
+
+
+ while (bytes_left)
+ {
+ int ret;
+
+ ret = write(outfd,
+ copybuf + hdr_len + bytes_written,
+ bytes_left);
+
+ if (ret < 0)
+ {
+ fprintf(stderr,
+ _("%s: could not write %u bytes to log file \"%s\": %s\n"),
+ progname, bytes_left, outfile,
+ strerror(errno));
+ goto error;
+ }
+
+ /* Write was successful, advance our position */
+ bytes_written += ret;
+ bytes_left -= ret;
+ }
+
+ if (write(outfd, "\n", 1) != 1)
+ {
+ fprintf(stderr,
+ _("%s: could not write %u bytes to log file \"%s\": %s\n"),
+ progname, 1, outfile,
+ strerror(errno));
+ goto error;
+ }
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr,
+ _("%s: unexpected termination of replication stream: %s"),
+ progname, PQresultErrorMessage(res));
+ goto error;
+ }
+ PQclear(res);
+
+ if (copybuf != NULL)
+ PQfreemem(copybuf);
+
+ if (outfd != -1 && close(outfd) != 0)
+ fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ outfd = -1;
+error:
+ PQfinish(conn);
+ conn = NULL;
+}
+
+/*
+ * When SIGINT is received, just tell the system to exit at the next possible
+ * moment.
+ */
+#ifndef WIN32
+
+static void
+sigint_handler(int signum)
+{
+ time_to_abort = true;
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+ PGresult *res;
+ static struct option long_options[] = {
+/* general options */
+ {"file", required_argument, NULL, 'f'},
+ {"no-loop", no_argument, NULL, 'n'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {"help", no_argument, NULL, '?'},
+/* connection options */
+ {"database", required_argument, NULL, 'd'},
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+/* replication options */
+ {"option", required_argument, NULL, 'o'},
+ {"plugin", required_argument, NULL, 'P'},
+ {"status-interval", required_argument, NULL, 's'},
+ {"slot", required_argument, NULL, 'S'},
+ {"startpos", required_argument, NULL, 'I'},
+/* action */
+ {"init", no_argument, NULL, 1},
+ {"start", no_argument, NULL, 2},
+ {"stop", no_argument, NULL, 3},
+ {NULL, 0, NULL, 0}
+ };
+ int c;
+ int option_index;
+ uint32 hi,
+ lo;
+
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivellog"));
+
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ usage();
+ exit(0);
+ }
+ else if (strcmp(argv[1], "-V") == 0 ||
+ strcmp(argv[1], "--version") == 0)
+ {
+ puts("pg_receivellog (PostgreSQL) " PG_VERSION);
+ exit(0);
+ }
+ }
+
+ while ((c = getopt_long(argc, argv, "f:nvd:h:o:p:U:wWP:s:S:I:",
+ long_options, &option_index)) != -1)
+ {
+ switch (c)
+ {
+/* general options */
+ case 'f':
+ outfile = pg_strdup(optarg);
+ break;
+ case 'n':
+ noloop = 1;
+ break;
+ case 'v':
+ verbose++;
+ break;
+/* connection options */
+ case 'd':
+ dbname = pg_strdup(optarg);
+ break;
+ case 'h':
+ dbhost = pg_strdup(optarg);
+ break;
+ case 'p':
+ if (atoi(optarg) <= 0)
+ {
+ fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ dbport = pg_strdup(optarg);
+ break;
+ case 'U':
+ dbuser = pg_strdup(optarg);
+ break;
+ case 'w':
+ dbgetpassword = -1;
+ break;
+ case 'W':
+ dbgetpassword = 1;
+ break;
+/* replication options */
+ case 'o':
+ {
+ char *data = pg_strdup(optarg);
+ char *val = strchr(data, '=');
+
+ if (val != NULL)
+ {
+ /* remove =; separate data from val */
+ *val = '\0';
+ val++;
+ }
+
+ noptions += 1;
+ options = pg_realloc(options, sizeof(char*) * noptions * 2);
+
+ options[(noptions - 1) * 2] = data;
+ options[(noptions - 1) * 2 + 1] = val;
+ }
+
+ break;
+ case 'P':
+ plugin = pg_strdup(optarg);
+ break;
+ case 's':
+ standby_message_timeout = atoi(optarg) * 1000;
+ if (standby_message_timeout < 0)
+ {
+ fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ break;
+ case 'S':
+ slot = pg_strdup(optarg);
+ break;
+ case 'I':
+ if (sscanf(optarg, "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse start position \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+ break;
+/* action */
+ case 1:
+ do_init_slot = true;
+ break;
+ case 2:
+ do_start_slot = true;
+ break;
+ case 3:
+ do_stop_slot = true;
+ break;
+
+ default:
+
+ /*
+ * getopt_long already emitted a complaint
+ */
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ /*
+ * Any non-option arguments?
+ */
+ if (optind < argc)
+ {
+ fprintf(stderr,
+ _("%s: too many command-line arguments (first is \"%s\")\n"),
+ progname, argv[optind]);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Required arguments
+ */
+ if (slot == NULL)
+ {
+ fprintf(stderr, _("%s: no slot specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && outfile == NULL)
+ {
+ fprintf(stderr, _("%s: no target file specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && dbname == NULL)
+ {
+ fprintf(stderr, _("%s: no database specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && !do_init_slot && !do_start_slot)
+ {
+ fprintf(stderr, _("%s: at least one action needs to be specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (do_stop_slot && (do_init_slot || do_start_slot))
+ {
+ fprintf(stderr, _("%s: --stop cannot be combined with --init or --start\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (startpos && (do_init_slot || do_stop_slot))
+ {
+ fprintf(stderr, _("%s: --startpos cannot be combined with --init or --stop\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+#ifndef WIN32
+ pqsignal(SIGINT, sigint_handler);
+#endif
+
+ /*
+ * don't really need this but it actually helps to get more precise error
+ * messages about authentication, required GUCs and such without starting
+ * to loop around connection attempts later on.
+ */
+ {
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ exit(1);
+
+ /*
+ * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
+ * position.
+ */
+ res = PQexec(conn, "IDENTIFY_SYSTEM");
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+ PQclear(res);
+ }
+
+
+ /*
+ * stop a replication slot
+ */
+ if (do_stop_slot)
+ {
+ char query[256];
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: stop replication slot \"%s\"\n"),
+ progname, slot);
+
+ snprintf(query, sizeof(query), "FREE_LOGICAL_REPLICATION \"%s\"",
+ slot);
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 0 || PQnfields(res) != 0)
+ {
+ fprintf(stderr,
+ _("%s: could not stop logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 0, 0);
+ disconnect_and_exit(1);
+ }
+
+ PQclear(res);
+ disconnect_and_exit(0);
+ }
+
+ /*
+ * init a replication slot
+ */
+ if (do_init_slot)
+ {
+ char query[256];
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: init replication slot \"%s\"\n"),
+ progname, slot);
+
+ snprintf(query, sizeof(query), "INIT_LOGICAL_REPLICATION \"%s\" \"%s\"",
+ slot, plugin);
+
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not init logical rep: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+
+ if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse log location \"%s\"\n"),
+ progname, PQgetvalue(res, 0, 1));
+ disconnect_and_exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+
+ slot = pg_strdup(PQgetvalue(res, 0, 0));
+ PQclear(res);
+ }
+
+
+ if (!do_start_slot)
+ disconnect_and_exit(0);
+
+ while (true)
+ {
+ StreamLog();
+ if (time_to_abort)
+ {
+ /*
+ * We've been Ctrl-C'ed. That's not an error, so exit without an
+ * errorcode.
+ */
+ disconnect_and_exit(0);
+ }
+ else if (noloop)
+ {
+ fprintf(stderr, _("%s: disconnected.\n"), progname);
+ exit(1);
+ }
+ else
+ {
+ fprintf(stderr,
+ /* translator: check source for value for %d */
+ _("%s: disconnected. Waiting %d seconds to try again.\n"),
+ progname, RECONNECT_SLEEP_TIME);
+ pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+ }
+ }
+}
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 22a5340..f027e1e 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -11,21 +11,18 @@
* src/bin/pg_basebackup/receivelog.c
*-------------------------------------------------------------------------
*/
+
#include "postgres_fe.h"
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/types.h>
-#include <unistd.h>
-/* for ntohl/htonl */
-#include <netinet/in.h>
-#include <arpa/inet.h>
+/* local includes */
+#include "receivelog.h"
+#include "streamutil.h"
#include "libpq-fe.h"
#include "access/xlog_internal.h"
-#include "receivelog.h"
-#include "streamutil.h"
+#include <sys/stat.h>
+#include <unistd.h>
/* fd and filename for currently open WAL file */
@@ -193,63 +190,6 @@ close_walfile(char *basedir, char *partial_suffix)
/*
- * Local version of GetCurrentTimestamp(), since we are not linked with
- * backend code. The protocol always uses integer timestamps, regardless of
- * server setting.
- */
-static int64
-localGetCurrentTimestamp(void)
-{
- int64 result;
- struct timeval tp;
-
- gettimeofday(&tp, NULL);
-
- result = (int64) tp.tv_sec -
- ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
-
- result = (result * USECS_PER_SEC) + tp.tv_usec;
-
- return result;
-}
-
-/*
- * Local version of TimestampDifference(), since we are not linked with
- * backend code.
- */
-static void
-localTimestampDifference(int64 start_time, int64 stop_time,
- long *secs, int *microsecs)
-{
- int64 diff = stop_time - start_time;
-
- if (diff <= 0)
- {
- *secs = 0;
- *microsecs = 0;
- }
- else
- {
- *secs = (long) (diff / USECS_PER_SEC);
- *microsecs = (int) (diff % USECS_PER_SEC);
- }
-}
-
-/*
- * Local version of TimestampDifferenceExceeds(), since we are not
- * linked with backend code.
- */
-static bool
-localTimestampDifferenceExceeds(int64 start_time,
- int64 stop_time,
- int msec)
-{
- int64 diff = stop_time - start_time;
-
- return (diff >= msec * INT64CONST(1000));
-}
-
-/*
* Check if a timeline history file exists.
*/
static bool
@@ -369,47 +309,6 @@ writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *co
}
/*
- * Converts an int64 to network byte order.
- */
-static void
-sendint64(int64 i, char *buf)
-{
- uint32 n32;
-
- /* High order half first, since we're doing MSB-first */
- n32 = (uint32) (i >> 32);
- n32 = htonl(n32);
- memcpy(&buf[0], &n32, 4);
-
- /* Now the low order half */
- n32 = (uint32) i;
- n32 = htonl(n32);
- memcpy(&buf[4], &n32, 4);
-}
-
-/*
- * Converts an int64 from network byte order to native format.
- */
-static int64
-recvint64(char *buf)
-{
- int64 result;
- uint32 h32;
- uint32 l32;
-
- memcpy(&h32, buf, 4);
- memcpy(&l32, buf + 4, 4);
- h32 = ntohl(h32);
- l32 = ntohl(l32);
-
- result = h32;
- result <<= 32;
- result |= l32;
-
- return result;
-}
-
-/*
* Send a Standby Status Update message to server.
*/
static bool
@@ -420,13 +319,13 @@ sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool replyRequested)
replybuf[len] = 'r';
len += 1;
- sendint64(blockpos, &replybuf[len]); /* write */
+ fe_sendint64(blockpos, &replybuf[len]); /* write */
len += 8;
- sendint64(InvalidXLogRecPtr, &replybuf[len]); /* flush */
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* flush */
len += 8;
- sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
+ fe_sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
len += 8;
- sendint64(now, &replybuf[len]); /* sendTime */
+ fe_sendint64(now, &replybuf[len]); /* sendTime */
len += 8;
replybuf[len] = replyRequested ? 1 : 0; /* replyRequested */
len += 1;
@@ -828,9 +727,9 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/*
* Potentially send a status message to the master
*/
- now = localGetCurrentTimestamp();
+ now = feGetCurrentTimestamp();
if (still_sending && standby_message_timeout > 0 &&
- localTimestampDifferenceExceeds(last_status, now,
+ feTimestampDifferenceExceeds(last_status, now,
standby_message_timeout))
{
/* Time to send feedback! */
@@ -859,10 +758,10 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
int usecs;
targettime = last_status + (standby_message_timeout - 1) * ((int64) 1000);
- localTimestampDifference(now,
- targettime,
- &secs,
- &usecs);
+ feTimestampDifference(now,
+ targettime,
+ &secs,
+ &usecs);
if (secs <= 0)
timeout.tv_sec = 1; /* Always sleep at least 1 sec */
else
@@ -966,7 +865,7 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* If the server requested an immediate reply, send one. */
if (replyRequested && still_sending)
{
- now = localGetCurrentTimestamp();
+ now = feGetCurrentTimestamp();
if (!sendFeedback(conn, blockpos, now, false))
goto error;
last_status = now;
@@ -996,7 +895,7 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
progname, r);
goto error;
}
- blockpos = recvint64(&copybuf[1]);
+ blockpos = fe_recvint64(&copybuf[1]);
/* Extract WAL location for this block */
xlogoff = blockpos % XLOG_SEG_SIZE;
diff --git a/src/bin/pg_basebackup/receivelog.h b/src/bin/pg_basebackup/receivelog.h
index 7c983cd..f4789a5 100644
--- a/src/bin/pg_basebackup/receivelog.h
+++ b/src/bin/pg_basebackup/receivelog.h
@@ -1,3 +1,5 @@
+#include "libpq-fe.h"
+
#include "access/xlogdefs.h"
/*
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 1dfb80f..c8d436d 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -11,17 +11,35 @@
*-------------------------------------------------------------------------
*/
-#include "postgres_fe.h"
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's
+ * backend-only stuff in the datetime include files we need. But we need a
+ * frontend-ish environment otherwise. Hence this ugly hack.
+ */
+#define FRONTEND 1
+#include "postgres.h"
+
#include "streamutil.h"
+#include "common/fe_memutils.h"
+#include "utils/datetime.h"
+
#include <stdio.h>
#include <string.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+/* for ntohl/htonl */
+#include <netinet/in.h>
+#include <arpa/inet.h>
const char *progname;
char *connection_string = NULL;
char *dbhost = NULL;
char *dbuser = NULL;
char *dbport = NULL;
+char *dbname = NULL;
int dbgetpassword = 0; /* 0=auto, -1=never, 1=always */
static char *dbpassword = NULL;
PGconn *conn = NULL;
@@ -86,10 +104,10 @@ GetConnection(void)
}
keywords[i] = "dbname";
- values[i] = "replication";
+ values[i] = dbname == NULL ? "replication" : dbname;
i++;
keywords[i] = "replication";
- values[i] = "true";
+ values[i] = dbname == NULL ? "true" : "database";
i++;
keywords[i] = "fallback_application_name";
values[i] = progname;
@@ -210,3 +228,102 @@ GetConnection(void)
return tmpconn;
}
}
+
+
+/*
+ * Frontend version of GetCurrentTimestamp(), since we are not linked with
+ * backend code. The protocol always uses integer timestamps, regardless of
+ * server setting.
+ */
+int64
+feGetCurrentTimestamp(void)
+{
+ int64 result;
+ struct timeval tp;
+
+ gettimeofday(&tp, NULL);
+
+ result = (int64) tp.tv_sec -
+ ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
+
+ result = (result * USECS_PER_SEC) + tp.tv_usec;
+
+ return result;
+}
+
+/*
+ * Frontend version of TimestampDifference(), since we are not linked with
+ * backend code.
+ */
+void
+feTimestampDifference(int64 start_time, int64 stop_time,
+ long *secs, int *microsecs)
+{
+ int64 diff = stop_time - start_time;
+
+ if (diff <= 0)
+ {
+ *secs = 0;
+ *microsecs = 0;
+ }
+ else
+ {
+ *secs = (long) (diff / USECS_PER_SEC);
+ *microsecs = (int) (diff % USECS_PER_SEC);
+ }
+}
+
+/*
+ * Frontend version of TimestampDifferenceExceeds(), since we are not
+ * linked with backend code.
+ */
+bool
+feTimestampDifferenceExceeds(int64 start_time,
+ int64 stop_time,
+ int msec)
+{
+ int64 diff = stop_time - start_time;
+
+ return (diff >= msec * INT64CONST(1000));
+}
+
+/*
+ * Converts an int64 to network byte order.
+ */
+void
+fe_sendint64(int64 i, char *buf)
+{
+ uint32 n32;
+
+ /* High order half first, since we're doing MSB-first */
+ n32 = (uint32) (i >> 32);
+ n32 = htonl(n32);
+ memcpy(&buf[0], &n32, 4);
+
+ /* Now the low order half */
+ n32 = (uint32) i;
+ n32 = htonl(n32);
+ memcpy(&buf[4], &n32, 4);
+}
+
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+int64
+fe_recvint64(char *buf)
+{
+ int64 result;
+ uint32 h32;
+ uint32 l32;
+
+ memcpy(&h32, buf, 4);
+ memcpy(&l32, buf + 4, 4);
+ h32 = ntohl(h32);
+ l32 = ntohl(l32);
+
+ result = h32;
+ result <<= 32;
+ result |= l32;
+
+ return result;
+}
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 77d6b86..4286df8 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -5,6 +5,7 @@ extern char *connection_string;
extern char *dbhost;
extern char *dbuser;
extern char *dbport;
+extern char *dbname;
extern int dbgetpassword;
/* Connection kept global so we can disconnect easily */
@@ -17,3 +18,12 @@ extern PGconn *conn;
}
extern PGconn *GetConnection(void);
+
+extern int64 feGetCurrentTimestamp(void);
+extern void feTimestampDifference(int64 start_time, int64 stop_time,
+ long *secs, int *microsecs);
+
+extern bool feTimestampDifferenceExceeds(int64 start_time, int64 stop_time,
+ int msec);
+extern void fe_sendint64(int64 i, char *buf);
+extern int64 fe_recvint64(char *buf);
--
1.8.4.21.g992c386.dirty
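
As a reading aid for the sendFeedback() code in the patch above, here is a minimal sketch (not part of the patch) of how the 34-byte Standby Status Update message is laid out: a type byte 'r', four big-endian 64-bit fields (write, flush, apply, sendTime) and a trailing replyRequested byte. The names sketch_sendint64, sketch_recvint64 and sketch_build_feedback are hypothetical stand-ins for illustration; the patch itself uses fe_sendint64()/fe_recvint64() built on htonl()/ntohl(), whereas this sketch uses plain shifts so that it compiles without any PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for fe_sendint64(): store v most significant byte first. */
static void
sketch_sendint64(int64_t v, char *buf)
{
	uint64_t	u = (uint64_t) v;
	int			i;

	for (i = 0; i < 8; i++)
		buf[i] = (char) ((u >> (56 - 8 * i)) & 0xFF);
}

/* Hypothetical stand-in for fe_recvint64(): read 8 bytes, MSB first. */
static int64_t
sketch_recvint64(const char *buf)
{
	uint64_t	u = 0;
	int			i;

	for (i = 0; i < 8; i++)
		u = (u << 8) | (unsigned char) buf[i];
	return (int64_t) u;
}

/*
 * Fill buf (at least 1 + 8 + 8 + 8 + 8 + 1 = 34 bytes) with a status update
 * in the same field order as sendFeedback(), and return the message length.
 */
static size_t
sketch_build_feedback(char *buf, uint64_t write_lsn, uint64_t flush_lsn,
					  uint64_t apply_lsn, int64_t send_time,
					  int reply_requested)
{
	size_t		len = 0;

	assert(buf != NULL);

	buf[len] = 'r';				/* message type */
	len += 1;
	sketch_sendint64((int64_t) write_lsn, &buf[len]);	/* write */
	len += 8;
	sketch_sendint64((int64_t) flush_lsn, &buf[len]);	/* flush */
	len += 8;
	sketch_sendint64((int64_t) apply_lsn, &buf[len]);	/* apply */
	len += 8;
	sketch_sendint64(send_time, &buf[len]);				/* sendTime */
	len += 8;
	buf[len] = reply_requested ? 1 : 0;					/* replyRequested */
	len += 1;

	return len;
}
```

In pg_receivellog the write and flush fields both carry the last received position and apply is InvalidXLogRecPtr, while pg_receivexlog only fills in the write field; a caller of the sketch would pass the same values and hand the resulting buffer to PQputCopyData().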
0007-wal_decoding-test_logical_decoding-Add-extension-for.patch (text/x-patch; charset=us-ascii)
From 17bf92534dbf2710bd8424d7b1755d35e32a38d3 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Mon, 19 Aug 2013 13:24:31 +0200
Subject: [PATCH 7/8] wal_decoding: test_logical_decoding: Add extension for
easier testing of logical decoding
This extension provides three functions for manipulating replication slots:
* init_logical_replication - initiate a replication slot and wait for consistent state
* start_logical_replication - return all changes since the last call up to now, without blocking
* free_logical_replication - free the logical slot again
Those are pretty direct synonyms for the replication connection commands.
Due to questions about how to integrate logical replication tests this module
also contains the current tests of logical replication itself.
Author: Abhijit Menon-Sen
---
contrib/Makefile | 1 +
contrib/test_logical_decoding/Makefile | 33 ++
contrib/test_logical_decoding/expected/ddl.out | 625 +++++++++++++++++++++
contrib/test_logical_decoding/expected/rewrite.out | 70 +++
contrib/test_logical_decoding/logical.conf | 2 +
contrib/test_logical_decoding/sql/ddl.sql | 316 +++++++++++
contrib/test_logical_decoding/sql/rewrite.sql | 29 +
.../test_logical_decoding--1.0.sql | 6 +
.../test_logical_decoding/test_logical_decoding.c | 238 ++++++++
.../test_logical_decoding.control | 5 +
10 files changed, 1325 insertions(+)
create mode 100644 contrib/test_logical_decoding/Makefile
create mode 100644 contrib/test_logical_decoding/expected/ddl.out
create mode 100644 contrib/test_logical_decoding/expected/rewrite.out
create mode 100644 contrib/test_logical_decoding/logical.conf
create mode 100644 contrib/test_logical_decoding/sql/ddl.sql
create mode 100644 contrib/test_logical_decoding/sql/rewrite.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding--1.0.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.c
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.control
diff --git a/contrib/Makefile b/contrib/Makefile
index 6d2fe32..41cb892 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -51,6 +51,7 @@ SUBDIRS = \
tcn \
test_parser \
test_decoding \
+ test_logical_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_logical_decoding/Makefile b/contrib/test_logical_decoding/Makefile
new file mode 100644
index 0000000..f1990d3
--- /dev/null
+++ b/contrib/test_logical_decoding/Makefile
@@ -0,0 +1,33 @@
+MODULE_big = test_logical_decoding
+OBJS = test_logical_decoding.o
+
+EXTENSION = test_logical_decoding
+DATA = test_logical_decoding--1.0.sql
+
+# Note: because we don't tell the Makefile there are any regression tests,
+# we have to clean those result files explicitly
+EXTRA_CLEAN = -r $(pg_regress_clean_files)
+
+subdir = contrib/test_logical_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+
+# Disabled because these tests require "wal_level=logical", which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+submake-regress:
+ $(MAKE) -C $(top_builddir)/src/test/regress
+
+submake-test_decoding:
+ $(MAKE) -C $(top_builddir)/contrib/test_decoding
+
+check: all | submake-regress submake-test_decoding
+ $(pg_regress_check) --temp-config $(top_srcdir)/contrib/test_logical_decoding/logical.conf \
+ --temp-install=./tmp_check \
+ --extra-install=contrib/test_decoding \
+ --extra-install=contrib/test_logical_decoding \
+ ddl rewrite
+
+.PHONY: submake-test_decoding submake-regress
diff --git a/contrib/test_logical_decoding/expected/ddl.out b/contrib/test_logical_decoding/expected/ddl.out
new file mode 100644
index 0000000..c161a43
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/ddl.out
@@ -0,0 +1,625 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ERROR: There already is a logical slot named "regression_slot"
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+-- fail
+SELECT stop_logical_replication('regression_slot');
+ERROR: couldn't find logical slot "regression_slot"
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+/* check whether status function reports us, only reproduceable columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+ slot_name | plugin | active | ?column? | ?column?
+-----------------+---------------+--------+----------+----------
+ regression_slot | test_decoding | f | t | t
+(1 row)
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+ALTER TABLE replication_example ADD COLUMN bar int;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:1 text[varchar]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:7 somedata[int4]:3 text[varchar]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:8 somedata[int4]:3 text[varchar]:2
+ table "replication_example": INSERT: id[int4]:9 somedata[int4]:3 text[varchar]:3
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+ COMMIT
+(30 rows)
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 12
+(1 row)
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:12 somedata[int4]:6 somenum[int4]:1
+ table "replication_example": INSERT: id[int4]:13 somedata[int4]:6 somenum[int4]:2 zaphod1[int4]:1
+ table "replication_example": INSERT: id[int4]:14 somedata[int4]:6 somenum[int4]:3 zaphod1[int4]:(null) zaphod2[int4]:1
+ table "replication_example": INSERT: id[int4]:15 somedata[int4]:6 somenum[int4]:4 zaphod1[int4]:2 zaphod2[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_unique": INSERT: id2[int4]:1 data[int4]:10
+ COMMIT
+ BEGIN
+ table "tr_unique": DELETE: id2[int4]:1
+ COMMIT
+ BEGIN
+ COMMIT
+(19 rows)
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 2
+(1 row)
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_pkey": INSERT: id2[int4]:2 data[int4]:1 id[int4]:1
+ COMMIT
+ BEGIN
+ table "tr_pkey": DELETE: id[int4]:1
+ COMMIT
+(6 rows)
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+ count | min | max
+-------+---------------------------------------------------------------+-------------------------------------------------------------
+ 1 | COMMIT | COMMIT
+ 1 | BEGIN | BEGIN
+ 4999 | table "tr_etoomuch": DELETE: id[int4]:1 | table "tr_etoomuch": DELETE: id[int4]:999
+ 5234 | table "tr_etoomuch": UPDATE: id[int4]:10000 data[int4]:-10000 | table "tr_etoomuch": UPDATE: id[int4]:9999 data[int4]:-9999
+ 10234 | table "tr_etoomuch": INSERT: id[int4]:10000 data[int4]:10000 | table "tr_etoomuch": INSERT: id[int4]:9 data[int4]:9
+(5 rows)
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:1 path[text]:1-top-#1
+ table "tr_sub": INSERT: id[int4]:2 path[text]:1-top-1-#1
+ table "tr_sub": INSERT: id[int4]:3 path[text]:1-top-1-#2
+ table "tr_sub": INSERT: id[int4]:4 path[text]:1-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:5 path[text]:1-top-2-1-#2
+ table "tr_sub": INSERT: id[int4]:6 path[text]:1-top-2-#1
+ COMMIT
+(10 rows)
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:7 path[text]:2-top-1...--#1
+ table "tr_sub": INSERT: id[int4]:8 path[text]:2-top-1...--#2
+ table "tr_sub": INSERT: id[int4]:9 path[text]:2-top-1...--#3
+ table "tr_sub": INSERT: id[int4]:10 path[text]:2-top-#1
+ COMMIT
+(6 rows)
+
+-- make sure rollbacked subtransactions aren't decoded
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-1-#1');
+SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-2-#1');
+ROLLBACK TO SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#2');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:11 path[text]:3-top-2-#1
+ table "tr_sub": INSERT: id[int4]:12 path[text]:3-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:14 path[text]:3-top-2-#2
+ COMMIT
+(5 rows)
+
+-- test whether a known, but not yet logged toplevel xact, followed by a
+-- subxact commit is handled correctly
+BEGIN;
+SELECT txid_current() != 0; -- so no fixed xid apears in the outfile
+ ?column?
+----------
+ t
+(1 row)
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
+RELEASE SAVEPOINT a;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------
+(0 rows)
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=false
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+----------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:1 relation[name]:foo options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:2 relation[name]:bar options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:3 relation[name]:blub options[_text]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:4 relation[name]:zaphod options[_text]:(null)
+ COMMIT
+(20 rows)
+
+/*
+ * check whether we handle updates/deletes correct with & without a pkey
+ */
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_without_key": INSERT: id[int4]:1 data[int4]:1
+ table "table_without_key": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_pkey": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_pkey": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique_not_null": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_oid": INSERT: oid[oid]:16484 id[int4]:1 data[int4]:1
+ table "table_with_oid": INSERT: oid[oid]:16485 id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16484
+ COMMIT
+ BEGIN
+ table "table_with_oid": UPDATE: oid[oid]:16485 id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16485
+ COMMIT
+(105 rows)
+
+-- check toast support
+SELECT setseed(0);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:2 toasted_col1[text]:(null) rand1[float8]:0.783099223393947 toasted_col2[text]:0001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
7104720473047404750476047704780479048004810482048304840485048604870488048904900491049204930494049504960497049804990500 rand2[float8]:0.798440033104271
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+(11 rows)
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "toasttable": INSERT: id[int4]:3 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.911647357512265 toasted_col2[text]:(null) rand2[float8]:0.197551369201392
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:(unchanged-toast-datum) rand1[float8]:0.840187716763467 toasted_col2[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765
86596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243
12441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743
17441745174617471748174917501751175217531754175517561757175817591760176117621763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ COMMIT
+(8 rows)
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------
+(0 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
+ slot_name | plugin | database | active | xmin | restart_decoding_lsn
+-----------+--------+----------+--------+------+----------------------
+(0 rows)
+
diff --git a/contrib/test_logical_decoding/expected/rewrite.out b/contrib/test_logical_decoding/expected/rewrite.out
new file mode 100644
index 0000000..392e465
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/rewrite.out
@@ -0,0 +1,70 @@
+CREATE EXTENSION test_logical_decoding;
+ERROR: extension "test_logical_decoding" already exists
+-- predictability
+SET synchronous_commit = on;
+DROP TABLE IF EXISTS replication_example;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+INSERT INTO replication_example(somedata) VALUES (1);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:(null)
+ COMMIT
+(5 rows)
+
+INSERT INTO replication_example(somedata) VALUES (2);
+VACUUM FULL pg_am;
+VACUUM FULL pg_amop;
+VACUUM FULL pg_proc;
+VACUUM FULL pg_opclass;
+VACUUM FULL pg_class;
+VACUUM FULL pg_type;
+VACUUM FULL pg_index;
+VACUUM FULL pg_database;
+INSERT INTO replication_example(somedata) VALUES (3);
+-- make old files go away
+CHECKPOINT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:2 text[varchar]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:3 text[varchar]:(null)
+ COMMIT
+(22 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
diff --git a/contrib/test_logical_decoding/logical.conf b/contrib/test_logical_decoding/logical.conf
new file mode 100644
index 0000000..a7c6c86
--- /dev/null
+++ b/contrib/test_logical_decoding/logical.conf
@@ -0,0 +1,2 @@
+wal_level = logical
+max_logical_slots = 4
diff --git a/contrib/test_logical_decoding/sql/ddl.sql b/contrib/test_logical_decoding/sql/ddl.sql
new file mode 100644
index 0000000..b1eee39
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/ddl.sql
@@ -0,0 +1,316 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+-- faster startup
+CHECKPOINT;
+
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+-- fail
+SELECT stop_logical_replication('regression_slot');
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+
+/* check whether status function reports us, only reproduceable columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- hide changes bc of oid visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+
+/*
+ * check whether we subtransactions correctly in relation with each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- make sure rollbacked subtransactions aren't decoded
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-1-#1');
+SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-2-#1');
+ROLLBACK TO SAVEPOINT b;
+INSERT INTO tr_sub(path) VALUES ('3-top-2-#2');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- test whether a known, but not yet logged toplevel xact, followed by a
+-- subxact commit is handled correctly
+BEGIN;
+SELECT txid_current() != 0; -- so no fixed xid apears in the outfile
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
+RELEASE SAVEPOINT a;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check whether we handle updates/deletes correct with & without a pkey
+ */
+
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check toast support
+SELECT setseed(0);
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
diff --git a/contrib/test_logical_decoding/sql/rewrite.sql b/contrib/test_logical_decoding/sql/rewrite.sql
new file mode 100644
index 0000000..2400fe3
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/rewrite.sql
@@ -0,0 +1,29 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+DROP TABLE IF EXISTS replication_example;
+
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+INSERT INTO replication_example(somedata) VALUES (1);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata) VALUES (2);
+VACUUM FULL pg_am;
+VACUUM FULL pg_amop;
+VACUUM FULL pg_proc;
+VACUUM FULL pg_opclass;
+VACUUM FULL pg_class;
+VACUUM FULL pg_type;
+VACUUM FULL pg_index;
+VACUUM FULL pg_database;
+INSERT INTO replication_example(somedata) VALUES (3);
+
+-- make old files go away
+CHECKPOINT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
diff --git a/contrib/test_logical_decoding/test_logical_decoding--1.0.sql b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
new file mode 100644
index 0000000..b6e048c
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
@@ -0,0 +1,6 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_logical_decoding" to load this file. \quit
+
+CREATE FUNCTION start_logical_replication (slotname name, pos text, VARIADIC options text[] DEFAULT '{}', OUT location text, OUT xid bigint, OUT data text) RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'start_logical_replication'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/test_logical_decoding/test_logical_decoding.c b/contrib/test_logical_decoding/test_logical_decoding.c
new file mode 100644
index 0000000..26ecdfa
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.c
@@ -0,0 +1,238 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "storage/fd.h"
+#include "miscadmin.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+Datum start_logical_replication(PG_FUNCTION_ARGS);
+
+static Tuplestorestate *tupstore = NULL;
+static TupleDesc tupdesc;
+
+static void
+LogicalOutputPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ resetStringInfo(ctx->out);
+}
+
+static void
+LogicalOutputWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ Datum values[3];
+ bool nulls[3];
+ char buf[60];
+
+ sprintf(buf, "%X/%X", (uint32) (lsn >> 32), (uint32) lsn);
+
+ memset(nulls, 0, sizeof(nulls));
+ values[0] = CStringGetTextDatum(buf);
+ values[1] = Int64GetDatum(xid);
+ values[2] = CStringGetTextDatum(ctx->out->data);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+}
+
+PG_FUNCTION_INFO_V1(start_logical_replication);
+
+Datum
+start_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+
+ XLogRecPtr now;
+ XLogRecPtr startptr;
+ XLogRecPtr rp;
+
+ LogicalDecodingContext *ctx;
+
+ ResourceOwner old_resowner = CurrentResourceOwner;
+ ArrayType *arr;
+ Size ndim;
+ List *options = NIL;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ arr = PG_GETARG_ARRAYTYPE_P(2);
+ ndim = ARR_NDIM(arr);
+
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ if (ndim > 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication only accepts one dimension of arguments")));
+ }
+ else if (array_contains_nulls(arr))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication expects NOT NULL options")));
+ }
+ else if (ndim == 1)
+ {
+ int nelems;
+ Datum *datum_opts;
+ int i;
+
+ Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+ deconstruct_array(arr, TEXTOID, -1, false, 'i',
+ &datum_opts, NULL, &nelems);
+
+ if (nelems % 2 != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("options need to be specified pairwise")));
+ }
+
+ for (i = 0; i < nelems; i += 2)
+ {
+ char *optname = TextDatumGetCString(datum_opts[i]);
+ char *optval = TextDatumGetCString(datum_opts[i + 1]);
+
+ options = lappend(options, makeDefElem(optname, (Node *) makeString(optval)));
+ }
+ }
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * XXX: It's impolite to ignore our argument and keep decoding until the
+ * current position.
+ */
+ now = GetFlushRecPtr();
+
+ /*
+ * We need to create a normal_snapshot_reader, but adjust it to use our
+ * page_read callback, and also make its reorder buffer use our callback
+ * wrappers that don't depend on walsender.
+ */
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingReAcquireSlot(NameStr(*name));
+
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false,
+ MyLogicalDecodingSlot->confirmed_flush,
+ options,
+ logical_read_local_xlog_page,
+ LogicalOutputPrepareWrite,
+ LogicalOutputWrite);
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(DEBUG1, "Starting logical replication from %X/%X to %X/%X",
+ (uint32) (MyLogicalDecodingSlot->restart_decoding >> 32),
+ (uint32) MyLogicalDecodingSlot->restart_decoding,
+ (uint32) (now >> 32), (uint32) now);
+
+ CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
+
+ /* invalidate non-timetravel entries */
+ InvalidateSystemCaches();
+
+ PG_TRY();
+ {
+
+ while ((startptr != InvalidXLogRecPtr && startptr < now) ||
+ (ctx->reader->EndRecPtr && ctx->reader->EndRecPtr < now))
+ {
+ XLogRecord *record;
+ char *errm = NULL;
+
+ record = XLogReadRecord(ctx->reader, startptr, &errm);
+ if (errm)
+ elog(ERROR, "%s", errm);
+
+ startptr = InvalidXLogRecPtr;
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.endptr = ctx->reader->EndRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ /*
+ * The {begin_txn,change,commit_txn}_wrapper callbacks above
+ * will store the description into our tuplestore.
+ */
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+
+ /*
+ * clear timetravel entries: XXX allowed in aborted TXN?
+ */
+ InvalidateSystemCaches();
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ rp = ctx->reader->EndRecPtr;
+ if (rp >= now)
+ {
+ elog(DEBUG1, "Reached endpoint (wanted: %X/%X, got: %X/%X)",
+ (uint32) (now >> 32), (uint32) now,
+ (uint32) (rp >> 32), (uint32) rp);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ CurrentResourceOwner = old_resowner;
+
+ /*
+ * Next time, start where we left off. (Hunting things, the family
+ * business..)
+ */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+
+ LogicalDecodingReleaseSlot();
+
+ return (Datum) 0;
+}
diff --git a/contrib/test_logical_decoding/test_logical_decoding.control b/contrib/test_logical_decoding/test_logical_decoding.control
new file mode 100644
index 0000000..0dce19f
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.control
@@ -0,0 +1,5 @@
+# test_logical_decoding extension
+comment = 'test logical decoding'
+default_version = '1.0'
+module_pathname = '$libdir/test_logical_decoding'
+relocatable = true
--
1.8.4.21.g992c386.dirty
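
A note on the `location` column returned by start_logical_replication() in the patch above: LogicalOutputWrite() renders the 64-bit WAL position as two 32-bit halves in uppercase hex, separated by a slash. The following is only a standalone sketch of that convention, not code from the patchset; `lsn_t`, `format_lsn` and `parse_lsn` are hypothetical names introduced here for illustration, with `lsn_t` standing in for PostgreSQL's XLogRecPtr.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stand-in for PostgreSQL's 64-bit XLogRecPtr. */
typedef uint64_t lsn_t;

/*
 * Format an LSN as "HI/LO" in uppercase hex, mirroring the "%X/%X"
 * call in LogicalOutputWrite. Returns what snprintf returns.
 */
static int
format_lsn(char *buf, size_t buflen, lsn_t lsn)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/*
 * Parse "HI/LO" back into a 64-bit position.
 * Returns 1 on success, 0 if the input is malformed.
 */
static int
parse_lsn(const char *str, lsn_t *lsn)
{
    uint32_t hi;
    uint32_t lo;

    if (sscanf(str, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
        return 0;
    *lsn = ((lsn_t) hi << 32) | lo;
    return 1;
}
```

Using snprintf with an explicit buffer size rather than sprintf is a choice made for this sketch; the 60-byte buffer in the patch is large enough for any value either way.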
Attachment: 0008-wal_decoding-design-document-v2.4-and-snapshot-build.patch (text/x-patch; charset=us-ascii)
From 0c4c3957bd497a8384c69201d2ea1be5c7c67a8a Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 19 Aug 2013 13:24:31 +0200
Subject: [PATCH 8/8] wal_decoding: design document v2.4 and snapshot building
design doc v0.5
---
src/backend/replication/logical/DESIGN.txt | 593 +++++++++++++++++++++
src/backend/replication/logical/Makefile | 6 +
.../replication/logical/README.SNAPBUILD.txt | 241 +++++++++
3 files changed, 840 insertions(+)
create mode 100644 src/backend/replication/logical/DESIGN.txt
create mode 100644 src/backend/replication/logical/README.SNAPBUILD.txt
diff --git a/src/backend/replication/logical/DESIGN.txt b/src/backend/replication/logical/DESIGN.txt
new file mode 100644
index 0000000..d76fdb4
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN.txt
@@ -0,0 +1,593 @@
+//-*- mode: adoc -*-
+= High Level Design for Logical Replication in Postgres =
+:copyright: PostgreSQL Global Development Group 2012
+:author: Andres Freund, 2ndQuadrant Ltd.
+:email: andres@2ndQuadrant.com
+
+== Introduction ==
+
+This document aims to first explain why we think postgres needs another
+replication solution and what that solution needs to offer in our opinion. Then
+it sketches out our proposed implementation.
+
+In contrast to an earlier version of the design document which talked about the
+implementation of four parts of replication solutions:
+
+1. Source data generation
+1. Transportation of that data
+1. Applying the changes
+1. Conflict resolution
+
+this version only plans to talk about the first part in detail as it is an
+independent and complex part usable for a wide range of use cases which we want
+to get included into postgres in a first step.
+
+=== Previous discussions ===
+
+There are two rather large threads discussing several parts of the initial
+prototype and proposed architecture:
+
+- http://archives.postgresql.org/message-id/201206131327.24092.andres@2ndquadrant.com[Logical Replication/BDR prototype and architecture]
+- http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata consistency during changeset extraction from WAL]
+
+Those discussions led to some fundamental design changes, which are presented in this document.
+
+=== Changes from v1 ===
+* At least a partial decoding step required/possible on the source system
+* No intermediate ("schema only") instances required
+* DDL handling, without event triggers
+* A very simple text conversion is provided for debugging/demo purposes
+* Smaller scope
+
+== Existing approaches to replication in Postgres ==
+
+If any approach to replication that is already in use can be made to support
+every use case/feature we need, it is likely not a good idea to implement
+something different. Three basic approaches are currently in use in/around
+postgres:
+
+. Trigger based
+. Recovery based/Physical footnote:[Often referred to by terms like Hot Standby, Streaming Replication, Point In Time Recovery]
+. Statement based
+
+Statement based replication has obvious and known problems with consistency and
+correctness making it hard to use in the general case so we will not further
+discuss it here.
+
+Let's have a look at the advantages/disadvantages of the other approaches:
+
+=== Trigger based Replication ===
+
+This variant has a multitude of significant advantages:
+
+* implementable in userspace
+* easy to customize
+* just about everything can be made configurable
+* cross version support
+* cross architecture support
+* can feed into systems other than postgres
+* no overhead from writes to non-replicated tables
+* writable standbys
+* mature solutions
+* multimaster implementations possible & existing
+
+But also a number of disadvantages, some of them very hard to solve:
+
+* essentially doubles the amount of writes (or even more!)
+* synchronous replication hard or impossible to implement
+* noticeable CPU overhead
+** trigger functions
+** text conversion of data
+* complex parts implemented in several solutions
+* not in core
+
+Especially the higher amount of writes might seem easy to solve at first
+glance, but a solution not using a normal transactional table for its log/queue
+has to solve a lot of problems. The major ones are:
+
+* crash safety, restartability & spilling to disk
+* consistency with the commit status of transactions
+* only a minimal amount of synchronous work should be done inside individual
+transactions
+
+In our opinion those problems are restricting progress/wider distribution of
+this class of solutions. It is our aim though that existing solutions in this
+space - most prominently slony and londiste - can benefit from the work we are
+doing & planning to do by incorporating at least parts of the changeset
+generation infrastructure.
+
+=== Recovery based Replication ===
+
+This type of solution, being built into postgres and of increasing popularity,
+has and will have its use cases and we do not aim to replace but to complement
+it. We plan to reuse some of the infrastructure and to make it possible to mix
+both modes of replication.
+
+Advantages:
+
+* builtin
+* built on existing infrastructure from crash recovery
+* efficient
+** minimal CPU, memory overhead on primary
+** low amount of additional writes
+* synchronous operation mode
+* low maintenance once set up
+* handles DDL
+
+Disadvantages:
+
+* standbys are read only
+* no cross version support
+* no cross architecture support
+* no replication into foreign systems
+* hard to customize
+* not configurable on the level of database, tables, ...
+
+== Goals ==
+
+As seen in the previous short survey of the two major interesting classes of
+replication solution there is a significant gap between those. Our aim is to
+make it smaller.
+
+We aim for:
+
+* in core
+* low CPU overhead
+* low storage overhead
+* asynchronous, optionally synchronous operation modes
+* robust
+* modular
+* basis for other technologies (sharding, replication into other DBMSs, ...)
+* basis for at least one multi-master solution
+* make the implementation as unintrusive as possible, but not more
+
+== New Architecture ==
+
+=== Overview ===
+
+Our proposal is to reuse the basic principle of WAL based replication, namely
+reusing data that already needs to be written for another purpose, and extend
+it to allow most, but not all, the flexibility of trigger based solutions.
+We want to do that by decoding the WAL back into a non-physical form.
+
+To get the flexibility we and others want we propose that the last step of
+changeset generation, transforming it into a format that can be used by the
+replication consumer, is done in an extensible manner. In the schema the part
+that does that is described as 'Output Plugin'. To keep the amount of
+duplication between different plugins as low as possible the plugin should only
+do a very limited amount of work.
+
+The following paragraphs contain reasoning for the individual design decisions
+made and their high-level design.
+
+=== Schematics ===
+
+The basic proposed architecture for changeset extraction is presented in the
+following diagram. The first part should look familiar to anyone knowing
+postgres' architecture. The second is where most of the new magic happens.
+
+[[basic-schema]]
+.Architecture Schema
+["ditaa"]
+------------------------------------------------------------------------------
+ Traditional Stuff
+
+ +---------+---------+---------+---------+----+
+ | Backend | Backend | Backend | Autovac | ...|
+ +----+----+---+-----+----+----+----+----+-+--+
+ | | | | |
+ +------+ | +--------+ | |
+ +-+ | | | +----------------+ |
+ | | | | | |
+ | v v v v |
+ | +------------+ |
+ | | WAL writer |<------------------+
+ | +------------+
+ | | | | | |
+ v v v v v v +-------------------+
++--------+ +---------+ +->| Startup/Recovery |
+|{s} | |{s} | | +-------------------+
+|Catalog | | WAL |---+->| SR/Hot Standby |
+| | | | | +-------------------+
++--------+ +---------+ +->| Point in Time |
+ ^ | +-------------------+
+ ---|----------|--------------------------------
+ | New Stuff
++---+ |
+| v Running separately
+| +----------------+ +=-------------------------+
+| | Walsender | | | |
+| | v | | +-------------------+ |
+| +-------------+ | | +->| Logical Rep. | |
+| | WAL | | | | +-------------------+ |
++-| decoding | | | +->| Multimaster | |
+| +------+------/ | | | +-------------------+ |
+| | | | | +->| Slony | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Auditing | |
+| | TX | | | | +-------------------+ |
++-| reassembly | | | +->| Mysql/... | |
+| +-------------/ | | | +-------------------+ |
+| | | | | +->| Custom Solutions | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Debugging | |
+| | Output | | | | +-------------------+ |
++-| Plugin |--|--|-+->| Data Recovery | |
+ +-------------/ | | +-------------------+ |
+ | | | |
+ +----------------+ +--------------------------|
+------------------------------------------------------------------------------
+
+=== WAL enrichment ===
+
+To be able to decode individual WAL records, at the very minimum they need to
+contain enough information to reconstruct what has happened to which row. The
+action is already encoded in the WAL record's header in most cases.
+
+As an example of missing data, the WAL record emitted when a row gets deleted
+only contains its physical location. At the very least we need a way to
+identify the deleted row: in a relational database the minimal amount of data
+that does that should be the primary key footnote:[Yes, there are use cases
+where the whole row is needed, or where no primary key can be found].
+
+We propose that for now it is enough to extend the relevant WAL record with
+additional data when the newly introduced 'WAL_level = logical' is set.
+
+Previously it has been argued on the hackers mailing list that a generic 'WAL
+record annotation' mechanism might be a good thing. That mechanism would allow
+to attach arbitrary data to individual wal records making it easier to extend
+postgres to support something like what we propose.. While we don't oppose that
+idea we think it is largely orthogonal issue to this proposal as a whole
+because the format of a WAL records is version dependent by nature and the
+necessary changes for our easy way are small, so not much effort is lost.
+
+A full annotation capability is a complex endeavour on its own as the parts of
+the code generating the relevant WAL records has somewhat complex requirements
+and cannot easily be configured from the outside.
+
+Currently this is contained in the http://archives.postgresql.org/message-id/1347669575-14371-6-git-send-email-andres@2ndquadrant.com[Log enough data into the wal to reconstruct logical changes from it] patch.
+
+=== WAL parsing & decoding ===
+
+The main complexity when reading the WAL as stored on disk is that the format
+is somewhat complex and the existing parser is too deeply integrated in the
+recovery system to be directly reusable. Once a reusable parser exists decoding
+the binary data into individual WAL records is a small problem.
+
+Currently two competing proposals for this module exist, each having its own
+merits. In the grand scheme of this proposal it is irrelevant which one gets
+picked as long as the functionality gets integrated.
+
+The mailing list post
+http://archives.postgresql.org/message-id/1347669575-14371-3-git-send-email-andres@2ndquadrant.com[Add
+support for a generic wal reading facility dubbed XLogReader] contains both
+competing patches and discussion around which one is preferable.
+
+Once the WAL has been decoded into individual records two major issues exist:
+
+1. records from different transactions and even individual user level actions
+are intermingled
+1. the data attached to records cannot be interpreted on its own, it is only
+meaningful with a lot of required information (including table, columns, types
+and more)
+
+The solution to the first issue is described in the next section: <<tx-reassembly>>
+
+The second problem is probably the reason why no mature solution to reuse the
+WAL for logical changeset generation exists today. See the <<snapbuilder>>
+paragraph for some details.
+
+As decoding, Transaction reassembly and Snapshot building are interdependent
+they currently are implemented in the same patch:
+http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-andres@2ndquadrant.com[Introduce
+wal decoding via catalog timetravel]
+
+That patch also includes a small demonstration that the approach works in the
+presence of DDL:
+
+[[example-of-decoding]]
+.Decoding example
+[NOTE]
+---------------------------
+/* just so we keep a sensible xmin horizon */
+ROLLBACK PREPARED 'f';
+BEGIN;
+CREATE TABLE keepalive();
+PREPARE TRANSACTION 'f';
+
+DROP TABLE IF EXISTS replication_example;
+
+SELECT pg_current_xlog_insert_location();
+CHECKPOINT;
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text
+varchar(120));
+begin;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+commit;
+
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+/* slightly more complex schema change, still no table rewrite */
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+commit;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+/* complex schema change, changing types of existing column, rewriting the table */
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING
+(somenum::int4);
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+SELECT pg_current_xlog_insert_location();
+
+/* now decode what has been written to the WAL during that time */
+
+SELECT decode_xlog('0/1893D78', '0/18BE398');
+
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:1 somedata[int4]:1 text[varchar]:1
+WARNING: tuple is: id[int4]:2 somedata[int4]:1 text[varchar]:2
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+WARNING: tuple is: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+WARNING: tuple is: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:
+(null)
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:7 somedata[int4]:3 text[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:8 somedata[int4]:3 text[varchar]:2
+WARNING: tuple is: id[int4]:9 somedata[int4]:3 text[varchar]:3
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+WARNING: COMMIT
+
+---------------------------
+
+[[tx-reassembly]]
+=== TX reassembly ===
+
+In order to make usage of the decoded stream easy we want to present the user
+level code with a correctly ordered image of individual transactions at once
+because otherwise every user will have to reassemble transactions themselves.
+
+Transaction reassembly needs to solve several problems:
+
+1. changes inside a transaction can be interspersed with other transactions
+1. a top level transaction only knows which subtransactions belong to it when
+it reads the commit record
+1. individual user level actions can be smeared over multiple records (TOAST)
+
+Our proposed module solves 1) and 2) by building individual streams of records
+split by xid. While not fully implemented yet we plan to spill those individual
+xid streams to disk after a certain amount of memory is used. This can be
+implemented without any change in the external interface.
+
+As all the individual streams are already sorted by LSN by definition - we
+build them from the wal in a FIFO manner, and the position in the WAL is the
+definition of the LSN footnote:[the LSN is just the byte position in the WAL
+stream] - the individual changes can be merged efficiently by a k-way merge
+(without sorting!) by keeping the individual streams in a binary heap.
+
+To manipulate the binary heap a generic implementation is proposed. Several
+independent implementations of binary heaps already exist in the postgres code,
+but none of them is generic. The patch is available at
+http://archives.postgresql.org/message-id/1347669575-14371-2-git-send-email-andres@2ndquadrant.com[Add
+minimal binary heap implementation].
+
+[NOTE]
+============
+The reassembly component was previously named ApplyCache because it was
+proposed to run on replication consumers just before applying changes. This is
+not the case anymore.
+
+It is still called that way in the source of the patch recently submitted.
+============
+
+[[snapbuilder]]
+=== Snapshot building ===
+
+To decode the contents of wal records describing data changes we need to decode
+and transform their contents. A single tuple is stored in a data structure
+called HeapTuple. As stored on disk that structure doesn't contain any
+information about the format of its contents.
+
+The basic problem is twofold:
+
+1. The wal records only contain the relfilenode, not the relation oid of a table
+11. The relfilenode changes whenever an action rewrites the whole table
+1. To interpret a HeapTuple correctly the exact schema definition from back
+when the wal record was inserted into the wal stream needs to be available
+
+We chose to implement timetraveling access to the system catalog using
+postgres' MVCC nature & implementation because of the following advantages:
+
+* low amount of additional data in wal
+* genericity
+* similarity of implementation to Hot Standby, quite a bit of the infrastructure is reusable
+* all kinds of DDL can be handled in a reliable manner
+* extensibility to user-defined catalog-like tables
+
+Timetravel access to the catalog means that we are able to look at the catalog
+just as it looked when changes were generated. That allows us to get the
+correct information about the contents of the aforementioned HeapTuples so we
+can decode them reliably.
+
+Other solutions we thought about that fell through:
+* catalog only proxy instances that apply schema changes exactly up to the point
+ we're decoding, using ``old fashioned'' wal replay
+* do the decoding on a 2nd machine, replicating all DDL exactly, rely on the catalog there
+* do not allow DDL at all
+* always add enough data into the WAL to allow decoding
+* build a fully versioned catalog
+
+The email thread available under
+http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata
+consistency during changeset extraction from WAL] contains some details,
+advantages and disadvantages about the different possible implementations.
+
+How we build snapshots is somewhat intricate and complicated and seems to be
+out of scope for this document. We will provide a second document discussing
+the implementation in detail. Let's just assume it is possible from here on.
+
+[NOTE]
+Some details are already available in comments inside 'src/backend/replication/logical/snapbuild.{c,h}'.
+
+=== Output Plugin ===
+
+As already mentioned previously our aim is to make the implementation of output
+plugins as simple and non-redundant as possible as we expect several different
+ones with different use cases to emerge quickly. See <<basic-schema>> for a
+list of possible output plugins that we think might emerge.
+
+Although we for now only plan to tackle logical replication and based on that a
+multi-master implementation in the near future we definitely aim to provide all
+use-cases with something easily usable!
+
+To decode and translate local transactions an output plugin needs to be able
+to transform transactions as a whole so it can apply them as a meaningful
+transaction at the other side.
+
+What we do to provide that is that every time we find a transaction commit, and
+thus have completed reassembling the transaction, we start to provide the
+individual changes to the output plugin. It currently only has to fill out 3
+callbacks:
+[options="header"]
+|=====================================================================================================================================
+|Callback |Passed Parameters |Called per TX | Use
+|begin |xid |once |Begin of a reassembled transaction
+|change |xid, subxid, change, mvcc snapshot |every change |Gets passed every change so it can transform it to the target format
+|commit |xid |once |End of a reassembled transaction
+|=====================================================================================================================================
+
+During each of those callbacks an appropriate timetraveling SnapshotNow snapshot
+is set up so the callbacks can perform all read-only catalog accesses they need,
+including using the sys/rel/catcache. For obvious reasons only read access is
+allowed.
+
+The snapshot guarantees that the results of lookups are the same as they
+were/would have been when the change was originally created.
+
+Additionally they get passed an MVCC snapshot, to e.g. run sql queries on
+catalogs or similar.
+
+[IMPORTANT]
+============
+At the moment none of these snapshots can be used to access normal user
+tables. Adding additional tables to the allowed set is easy implementation
+wise, but every transaction changing such tables incurs a noticeably higher
+overhead.
+============
+
+For now transactions won't be decoded/output in parallel. There are ideas to
+improve on this, but we don't think the complexity is appropriate for the first
+release of this feature.
+
+This is an adoption barrier for databases where large amounts of data get
+loaded/written in one transaction.
+
+=== Setup of replication nodes ===
+
+When setting up a new standby/consumer of a primary some problems exist
+independent of the implementation of the consumer. The gist of the problem is
+that when making a base backup and starting to stream all changes since that
+point, transactions that were running during all this cannot be included:
+
+* Transactions that have not committed before starting to dump a database are
+ invisible to the dumping process
+
+* Transactions that began before the point from which on the WAL is being
+ decoded are incomplete and cannot be replayed
+
+Our proposal for a solution to this is to detect points in the WAL stream where we can provide:
+
+. A snapshot exported similarly to pg_export_snapshot() footnote:[http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION] that can be imported with +SET TRANSACTION SNAPSHOT+ footnote:[http://www.postgresql.org/docs/devel/static/sql-set-transaction.html]
+. A stream of changes that will include the complete data of all transactions seen as running by the snapshot generated in 1)
+
+See the diagram.
+
+[[setup-schema]]
+.Control flow during setup of a new node
+["ditaa",scaling="0.7"]
+------------------------------------------------------------------------------
++----------------+
+| Walsender | | +------------+
+| v | | Consumer |
++-------------+ |<--IDENTIFY_SYSTEM-------------| |
+| WAL | | | |
+| decoding | |----....---------------------->| |
++------+------/ | | |
+| | | | |
+| v | | |
++-------------+ |<--INIT_LOGICAL $PLUGIN--------| |
+| TX | | | |
+| reassembly | |---FOUND_STARTING %X/%X------->| |
++-------------/ | | |
+| | |---FOUND_CONSISTENT %X/%X----->| |
+| v |---pg_dump snapshot----------->| |
++-------------+ |---replication slot %P-------->| |
+| Output | | | |
+| Plugin | | ^ | |
++-------------/ | | | |
+| | +-run pg_dump separately --| |
+| | | |
+| |<--STREAM_DATA-----------------| |
+| | | |
+| |---data ---------------------->| |
+| | | |
+| | | |
+| | ---- SHUTDOWN ------------- | |
+| | | |
+| | | |
+| |<--RESTART_LOGICAL $PLUGIN %P--| |
+| | | |
+| |---data----------------------->| |
+| | | |
+| | | |
++----------------+ +------------+
+
+------------------------------------------------------------------------------
+
+=== Disadvantages of the approach ===
+
+* somewhat intricate code for snapshot timetravel
+* output plugins/walsenders need to work per database as they access the catalog
+* when sending to multiple standbys some work is done multiple times
+* decoding/applying multiple transactions in parallel is somewhat hard
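A minimal sketch of the scheme described under "TX reassembly" and "Output Plugin" in DESIGN.txt: changes are collected into one stream per xid, and at commit the streams of the toplevel transaction and its subtransactions are merged by LSN and handed to begin/change/commit callbacks. The record layout `(lsn, kind, xid, payload)`, the kind names and the function name `reassemble` are assumptions made for illustration, not the patch's data structures. Python's `heapq.merge` stands in for the generic binary heap the patch proposes, and spilling to disk is left out.

```python
import heapq
from collections import defaultdict


def reassemble(records, begin, change, commit):
    """Replay interleaved WAL-like records as whole transactions.

    `records` is an iterable of (lsn, kind, xid, payload) tuples in LSN order.
    This layout is hypothetical and only mimics what the text describes:
      kind == "change": payload is the decoded change
      kind == "commit": payload lists the subtransaction xids of `xid`
      kind == "abort":  payload lists the subtransaction xids of `xid`
    """
    # One stream per xid. Appending in WAL order keeps each stream LSN-sorted.
    streams = defaultdict(list)

    for lsn, kind, xid, payload in records:
        if kind == "change":
            streams[xid].append((lsn, xid, payload))
        elif kind in ("commit", "abort"):
            # Only this record tells us which subtransactions belong to `xid`.
            members = [xid] + list(payload)
            parts = [streams.pop(member, []) for member in members]
            if kind == "commit":
                begin(xid)
                # k-way merge of already sorted streams: heapq.merge keeps the
                # stream heads in a binary heap, so nothing gets re-sorted.
                merged = heapq.merge(*parts, key=lambda entry: entry[0])
                for _lsn, subxid, data in merged:
                    change(xid, subxid, data)
                commit(xid)
            # For an abort the collected streams are simply dropped.
```

With callbacks that append to a list, a transaction that commits first is replayed first, and the changes of a subtransaction appear inside its toplevel transaction at the position given by their LSN.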
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 310a45c..6fae278 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -17,3 +17,9 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
include $(top_srcdir)/src/backend/common.mk
+
+DESIGN.pdf: DESIGN.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
+
+README.SNAPBUILD.pdf: README.SNAPBUILD.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
diff --git a/src/backend/replication/logical/README.SNAPBUILD.txt b/src/backend/replication/logical/README.SNAPBUILD.txt
new file mode 100644
index 0000000..b6c7470
--- /dev/null
+++ b/src/backend/replication/logical/README.SNAPBUILD.txt
@@ -0,0 +1,241 @@
+= Snapshot Building =
+:author: Andres Freund, 2ndQuadrant Ltd
+
+== Why do we need timetravel catalog access ==
+
+When doing WAL decoding (see DESIGN.txt for reasons to do so), we need to know
+how the catalog looked at the point a record was inserted into the WAL, because
+without that information we don't know much more about the record other than
+its length. It's just an arbitrary bunch of bytes without further information.
+Unfortunately, due to the possibility that the table definition might change,
+we cannot just access a newer version of the catalog and assume the table
+definition continues to be the same.
+
+If only the type information were required, it might be enough to annotate the
+wal records with a bit more information (table oid, table name, column name,
+column type) --- but as we want to be able to convert the output to more useful
+formats such as text, we additionally need to be able to call output functions.
+Those need a normal environment including the usual caches and normal catalog
+access to look up operators, functions and other types.
+
+Our solution to this is to add the capability to access the catalog as it
+was at the time the record was inserted into the WAL. The locking used during
+WAL generation guarantees the catalog is/was in a consistent state at that
+point. We call this 'time-travel catalog access'.
+
+Interesting cases include:
+
+- enums
+- composite types
+- extension types
+- non-C functions
+- relfilenode to table OID mapping
+
+Due to postgres' non-overwriting storage manager, regular modifications of a
+table's content are theoretically non-destructive. The problem is that there is
+no way to access an arbitrary point in time even if the data for it is there.
+
+This module adds the capability to do so in the very limited set of
+circumstances we need it in for WAL decoding. It does *not* provide a general
+time-travelling facility.
+
+A 'Snapshot' is the data structure used in postgres to describe which tuples
+are visible and which are not. We need to build a Snapshot which can be used to
+access the catalog the way it looked when the wal record was inserted.
+
+Restrictions:
+
+- Only works for catalog tables or tables explicitly marked as such.
+- Snapshot modifications are somewhat expensive
+- It cannot build initial visibility information for every point in time, it
+ needs specific circumstances to start.
+
+== How are time-travel snapshots built ==
+
+'Hot Standby' added infrastructure to build snapshots from WAL during recovery in
+the 9.0 release. Most of that can be reused for our purposes.
+
+We cannot reuse all of the hot standby infrastructure because:
+
+- we are not in recovery
+- we need to look at interim states *inside* a transaction
+- we need the capability to have multiple different snapshots around at the same time
+
+Normally the catalog is accessed using SnapshotNow which can legally be
+replaced by SnapshotMVCC that has been taken at the start of a scan. So catalog
+timetravel contains infrastructure to make SnapshotNow catalog access use
+appropriate MVCC snapshots. They aren't generated with GetSnapshotData()
+though, but reassembled from WAL contents.
+
+We collect our data in a normal struct SnapshotData, repurposing some fields
+creatively:
+
+- +Snapshot->xip+ contains all transactions we consider committed
+- +Snapshot->subxip+ contains all transactions belonging to our transaction,
+ including the toplevel one
+- +Snapshot->active_count+ is used as a refcount
+
+The meaning of +xip+ is inverted in comparison with non-timetravel snapshots in
+the sense that members of the array are the committed transactions, not the in
+progress ones. Because usually only a tiny percentage of committed transactions
+will have modified the catalog between xmin and xmax this allows us to keep the
+array small in the usual cases. It also makes subtransaction handling easier
+since we neither need to query pg_subtrans (which we couldn't anyway since it's
+truncated at restart) nor have problems with suboverflowed snapshots.
+
+== Building of initial snapshot ==
+
+We can start building an initial snapshot as soon as we find either an
++XLOG_RUNNING_XACTS+ or an +XLOG_CHECKPOINT_SHUTDOWN+ record because they allow us
+to know how many transactions are running.
+
+We need to know which transactions were running when we start to build a
+snapshot/start decoding as we don't have enough information about them (they
+could have done catalog modifications before we started watching). Also, we
+wouldn't have the complete contents of those transactions, because we started
+reading after they began. (The latter is also important when building
+snapshots that can be used to build a consistent initial clone.)
+
+There also is the problem that +XLOG_RUNNING_XACTS+ records can be
+'suboverflowed' which means there were more running subtransactions than
+fit into shared memory. In that case we use the same incremental building
+trick hot standby uses which is either
+
+1. wait till further +XLOG_RUNNING_XACTS+ records have a running->oldestRunningXid
+after the initial xl_running_xacts->nextXid
+2. wait for a further +XLOG_RUNNING_XACTS+ that is not overflowed or
+a +XLOG_CHECKPOINT_SHUTDOWN+
+
+When we start building a snapshot we are in the +SNAPBUILD_START+ state. As
+soon as we find any visibility information, even if incomplete, we change to
++SNAPBUILD_INITIAL_POINT+.
+
+When we have collected enough information to decode any transaction starting
+after that point in time we fall over to +SNAPBUILD_FULL_SNAPSHOT+. If those
+transactions commit before the next state is reached, we throw their complete
+contents away.
+
+As soon as all transactions that were running when we switched over to
++SNAPBUILD_FULL_SNAPSHOT+ commit, we change state to +SNAPBUILD_CONSISTENT+.
+Every transaction that commits from now on gets handed to the output plugin.
+When doing the switch to +SNAPBUILD_CONSISTENT+ we optionally export a snapshot
+which makes all transactions that committed up to this point visible. This
+exported snapshot can be used to run pg_dump; replaying all changes emitted
+by the output plugin on a database restored from such a dump will result in
+a consistent clone.
+
+["ditaa",scaling="0.8"]
+---------------
+
+ +-------------------------+
+ +----|SNAPBUILD_START |-------------+
+ | +-------------------------+ |
+ | | |
+ | | |
+ | running_xacts with running xacts |
+ | | |
+ | | |
+ | v |
+ | +-------------------------+ v
+ | |SNAPBUILD_FULL_SNAPSHOT |------------>|
+ | +-------------------------+ |
+XLOG_RUNNING_XACTS | saved snapshot
+ with zero xacts | at running_xacts's lsn
+ | | |
+ | all running toplevel TXNs finished |
+ | | |
+ | v |
+ | +-------------------------+ |
+ +--->|SNAPBUILD_CONSISTENT |<------------+
+ +-------------------------+
+
+---------------
+
+== Snapshot Management ==
+
+Whenever a transaction is detected as having started during decoding in
++SNAPBUILD_FULL_SNAPSHOT+ state, we distribute the currently maintained
+snapshot to it (i.e. call ReorderBufferSetBaseSnapshot). This serves as its
+initial snapshot. Unless there are concurrent catalog changes that snapshot
+will be used for decoding the entire transaction's changes.
+
+Whenever a transaction-with-catalog-changes commits, we iterate over all
+concurrently active transactions and add a new SnapshotNow to each of them
+(ReorderBufferAddSnapshot(current_lsn)). This is required because any row
+written from that point on will have used the changed catalog contents.
+
+When decoding a transaction that made catalog changes itself we tell that
+transaction that (ReorderBufferAddNewCommandId(current_lsn)) which will cause
+the decoding to use the appropriate command id from that point on.
+
+SnapshotNows need to be set up globally so the syscache and other pieces access
+them transparently. This is done using two new tqual.h functions:
+SetupDecodingSnapshots() and RevertFromDecodingSnapshots().
+
+== Catalog/User Table Detection ==
+
+Since we only want to store committed transactions that actually modified the
+catalog we need a way to detect that from WAL:
+
+Right now, we assume that every transaction that commits before we reach
++SNAPBUILD_CONSISTENT+ state has made catalog modifications since we can't rely
+on having seen the entire transaction before that. That's not harmful besides
+incurring some price in memory usage and runtime.
+
+After having reached consistency we recognize catalog modifying transactions
+via HEAP2_NEW_CID and HEAP_INPLACE that are logged by catalog modifying
+actions.
+
+== Mixed DDL/DML transaction handling ==
+
+When a transaction uses DDL and DML in the same transaction things get a bit
+more complicated because we need to handle CommandIds and ComboCids as we need
+to use the correct version of the catalog when decoding the individual tuples.
+
+For that we emit the new HEAP2_NEW_CID records which contain the physical tuple
+location, cmin and cmax when the catalog is modified. If we need to detect
+visibility of a catalog tuple that has been modified in our own transaction -
+which we can detect via xmin/xmax - we look in a hash table using the location
+as key to get correct cmin/cmax values.
+From those values we can also extract the commandid that generated the record.
+
+All this only needs to happen in the transaction performing the DDL.
+
+== Cache Handling ==
+
+As we allow usage of the normal {sys,cat,rel,..}cache we also need to integrate
+cache invalidation. For transactions that only do DDL that's easy as everything
+is already provided by HS. Every time we read a commit record we apply the
+sinval messages contained therein.
+
+For transactions that contain DDL and DML cache invalidation needs to happen
+more frequently because we need to tear down all caches that just got
+modified. To do that we simply take all invalidation messages that got
+collected at the end of transaction and apply them every time we've decoded a
+single change. At some point this can get optimized by generating new local
+invalidation messages, but that seems too complicated for now.
+
+XXX: talk about syscache handling of relmapped relation.
+
+== xmin Horizon Handling ==
+
+Reusing MVCC for timetravel access has one obvious major problem: VACUUM. Rows
+we still need for decoding cannot be removed but at the same time we cannot
+keep data in the catalog indefinitely.
+
+For that we peg the xmin horizon that's used to decide which rows can be
+removed. We only need to prevent removal of those rows for catalog like
+relations, not for all user tables. For that reason a separate xmin horizon
+RecentGlobalDataXmin got introduced.
+
+Since we need to persist that knowledge across restarts we keep the xmin
+in the logical slots, which are saved in a crash-safe manner. They are restored
+from disk into memory at server startup.
+
+== Restartable Decoding ==
+
+As we want to generate a consistent stream of changes we need to have the
+ability to start from a previously decoded location without waiting possibly
+very long to reach consistency. For that reason we dump the current visibility
+information to disk every time we read an xl_running_xacts record.
+
--
1.8.4.21.g992c386.dirty
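A minimal sketch of the state transitions described under "Building of initial snapshot" in README.SNAPBUILD.txt, under simplifying assumptions: the class and method names are hypothetical, no snapshot contents are tracked, and for suboverflowed records only the "wait for a further record that is not overflowed" option is modelled. A shutdown checkpoint would behave like a running-xacts record with no running transactions.

```python
from enum import Enum


class State(Enum):
    START = "SNAPBUILD_START"
    INITIAL_POINT = "SNAPBUILD_INITIAL_POINT"
    FULL_SNAPSHOT = "SNAPBUILD_FULL_SNAPSHOT"
    CONSISTENT = "SNAPBUILD_CONSISTENT"


class SnapBuildSketch:
    """Tracks only the builder state, not the snapshot itself."""

    def __init__(self):
        self.state = State.START
        # xids that were running when FULL_SNAPSHOT was reached
        self.waiting_for = set()

    def on_running_xacts(self, running_xids, suboverflowed=False):
        """Process a running-xacts record and return the resulting state."""
        if self.state in (State.FULL_SNAPSHOT, State.CONSISTENT):
            return self.state
        if suboverflowed:
            # Incomplete visibility information: wait for a better record.
            self.state = State.INITIAL_POINT
        elif not running_xids:
            # Nothing was running, so every later transaction is seen whole.
            self.state = State.CONSISTENT
        else:
            self.waiting_for = set(running_xids)
            self.state = State.FULL_SNAPSHOT
        return self.state

    def on_xact_end(self, xid, committed=True):
        """Return True if the transaction is handed to the output plugin."""
        # Only commits seen while already consistent are passed on; anything
        # finishing earlier is thrown away, as the text describes.
        deliver = committed and self.state is State.CONSISTENT
        if self.state is State.FULL_SNAPSHOT:
            self.waiting_for.discard(xid)
            if not self.waiting_for:
                self.state = State.CONSISTENT
        return deliver
```

The transaction whose end makes the builder consistent is itself not delivered, since it was already running when the full snapshot was reached and its earlier changes were never seen.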
On Tue, Sep 17, 2013 at 11:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-17 09:45:28 -0400, Peter Eisentraut wrote:
>> On 9/15/13 11:30 AM, Andres Freund wrote:
>>> On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
>>>> On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
>>>>> Attached you can find the newest version of the logical changeset
>>>>> generation patchset.
>>>>
>>>> You probably have bigger things to worry about, but please check the
>>>> results of cpluspluscheck, because some of the header files don't
>>>> include header files they depend on.
>>>
>>> Hm. I tried to get that right, but it's been a while since I last
>>> checked. I don't regularly use cpluspluscheck because it doesn't work in
>>> VPATH builds... We really need to fix that.
>>>
>>> I'll push a fix for that to the git tree, don't think that's worth a
>>> resend in itself.
>>
>> This patch set now fails to apply because of the commit "Rename various
>> "freeze multixact" variables".
>
> And I am even partially guilty for that patch...
>
> Rebased patches attached.
When I applied all the patches and do the compile, I got the following error:
gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I. -I../../../../src/include -D_GNU_SOURCE -c -o
snapbuild.o snapbuild.c
snapbuild.c:187: error: redefinition of typedef 'SnapBuild'
../../../../src/include/replication/snapbuild.h:45: note: previous
declaration of 'SnapBuild' was here
make[4]: *** [snapbuild.o] Error 1
When I applied only
0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch,
compiled the source, and set up the asynchronous replication, I got
the segmentation
fault.
LOG: server process (PID 12777) was terminated by signal 11:
Segmentation fault
Regards,
--
Fujii Masao
On 2013-09-19 14:08:36 +0900, Fujii Masao wrote:
> When I applied all the patches and do the compile, I got the following error:
> gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -g -I. -I../../../../src/include -D_GNU_SOURCE -c -o
> snapbuild.o snapbuild.c
> snapbuild.c:187: error: redefinition of typedef 'SnapBuild'
> ../../../../src/include/replication/snapbuild.h:45: note: previous
> declaration of 'SnapBuild' was here
> make[4]: *** [snapbuild.o] Error 1
Hm. Somebody had reported that previously and I tried to fix it but
obviously I failed. Unfortunately I don't see that warning in any of the
gcc versions I have tried locally.
Hopefully fixed.
> When I applied only
> 0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch,
> compiled the source, and set up the asynchronous replication, I got
> the segmentation
> fault.
Fixed, I mismerged something, sorry for that.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-wal_decoding-Add-information-about-a-tables-primary-.patch.gz (application/x-patch-gzip)
0004-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch.gz (application/x-patch-gzip)
�w��D��x�,to��J���*�R9�����%R���U�� ��e�Q6r�� %V�<u5�~�r)�������B��^I��A%���D�-,i6�\XjG��:�$
.�Rt8�t�,�J��A���T\C�y3�B�����9�n�#�����������k�Ms
L����RsH�h����:���~�5$����[wk� nB��N��Vb� i����A�L��5d+�U�HM+����
���*��7�f��I�)�|at���n����T�����]�8�V$ab^N�����x�>F�Oh��E���\�r6]'��F����
B��.4�
q�j
�e�{��v�!����t�j��W���I��J`��~��1j�&��q
���F���b ��V_I�.�Q���J��"]��_E����6��U����������a�k�a�*�3�holw[f�)�r�T�QMm�1��X�����bg��@��k�'�[�����f�8��H�j��~����W/�^�+���JG�-�9r���z:�������lF7Q��*��8 ��T*ON)2��������������������GcJ����X.���bl�Q<�&����3����L���}!��3��P��_
^M��X����>���>�5I���f�F�B��30��EC��*t�\6��a4A/u��C���md��W i(�8p�'c�������Q����ug�!^4��v���s�vx��|�X�5Kl�>u��?W�on�7WW�/V�P���Y�������x-�fJeu����Z#M5�K/����W�zj��\�+�.�;�����G�f�<����zg��J��w#�:�k�f���.�}�t(�7�������q
�RP��������V*���K���w��)�j�W���B������FZ�Q@�i��orCc�7k��QW��<�@m������^�ft�f�$���N��V��t� �1�%] �v9����c?,�v��l@��X���}��n�5;2|5�v?��_��R+�&�?���f��#��>\%��{��N>��K�&��lA�!),�XVS�>t.����B�l���:��\��$V��F���r"��4hYg�#�i�������&x�1<����F�K�
��<�4D��i�F��P��f,��B&ZN�]��M�P���J3��\ �8�0P��h:Q-�]��sc��tq��1V$��C��������<��V1�`����x�7�Rh}%�"{[jh���
s6����(�hpA��cR����-Vy��������F�E:'a&2�c[c/�L�L���z5������l�`G}(�����-qM���6��t1�.y����%AAS�1�7�g5Mf�> 9��,���r"ff��;��i�xu�g���8|J9�F����F@���3��29Q���|n�h.K^_���A?<������M�g����I#��6.@a��$2+�^N��_t���B+�8��8�y����
�M��9�Az]��m|8j����Pl3Bd
kj(f�>�y��}Q�-1l�4�D�u� �F+�Cx�����XQXb�$'������qj]#�a��^��
�wb���VF���i�!;?)v�D�w2��|M��� ��d�`���?N�c(t�s�J.�4��^u���l+� SGMS����lzZ�������b6��o���S����=���`DWqT�a�/X&a���J���r��Q���T��.L,��A����R�o����@��z'��2W������R��5v��v��V�v����Rk�1.C��52,�#����>�r]�v������(�����?��N9~�~f5�>3�k�z�BM��G���*SV9���U���������������<W����]��aH���������M([���4������E��#����?��8v]���6�M_��;�=m3#�� K ��=���}���B�������._����������d:���=����zS�h�7�9��v��l�3S��e�"�Xm��Y���aU�����w������t]=(���f���0P*u��a"�Zbz����W'��/g�J�7��;d���f l�h%�,����$,�%��������w���fpn��On����hi� N� 64�L�Q��l���=#yz&I��n�-Z��Z�5Kl��K����;�F�������=�_�c5~�t����>.|���u��Wc�=��i�__�p��/>\6����b���qJ.
�F�6� ����|���eM�t>kd@����A�������Xy���F��
�����g���Yl�K�Y
u��,'?�^TJR�+
Q D�00dX�c�T�aE
;�r��pZ����S|g��;��/R�'��&�l���@-j$�eL'@�1�����Q���t�I��?P��}/)�!^����hE1gi�uU;� �,ob(��u�1*IC��jJ�'j�ZK�e�_~��_�yN�J�l���M~�W��d@����1��zp���Js;7A�$K02���j�2v�P�C'P�6'H_�+D��p�C��`~�3��/0!�t{����B ����L;a��H�J�J��<�sk�0��5(����;9@�c_�nN�~-��.��4g�F��Kw������s����V���weW}=Y��� Zqv5�%���\h���'o(� ��������;���z�2�F��R?bPO�,XG.����R%�J%D�Hy;7-�.U��o�&{^���f�I�ZN��PK
�/En������r����%]5Gc�]~�)`�CF������������Z������v]�Ee����-�Bv*eh��`��
1]��;��/�a?����*����
�+7�����������{�����j"��6���[S������yETc��g"f��t���SZf5�Fw~�:�P}��Z_�N#�������
A������x�zm�xvz��F��]�k��:�8�
�;?�f��?_����<S�W��.(v���EfH�@���$�+����U���z�f��<U5�v��U�����ENK^�3>@�%����Z���R�o�n�/���Q���v�jt����,���%Q���PP���(��w�m?
��N��{�
�pjP������M��>�8���i-7�.��E�CW�C�&�8��kT�*;��D�F�\C�K%{8|t��j�o��lE%�k���hIp�Rj�%��[�&G�� V��{{G��;������2X�1�X.z��o
������_�X@B'�����<j���_U1�mC��R�������]*�93����|J���C��9�H��Gr���^���V�a��k8n��.�(I�^���T�;n��x�J�$[�RQ���-����X��MU�h��|��R+���V�F|��JssS��1�#%�#�~+��P�4���]�7R�Tj�>����x>�CS������i�����l2��>�*�N����|������:���������7����S��mmq.����]C�t;��P�����2�L)|��wg]�[��suhUM���R�r�V����B��(�*%��)�R����AG/����ya�T)+�k X��S���-�a�����noa0_[�L�@�DfRW����rm]�6�G�A
�)v���$�%|if�)S^I=3��W�t������x�����kp��I *���V=�X���*Y�:=_�) �
�%���v\P�����K*�9�hR7.%�W3W*�[��W�3�������������|Q��Y�����5U�T�����.�~�E������v+�5>�.�a���;����i F����*�T�{W���U���+�D��q�F`�����:��T�����9�S�\�[��B/{fD��q�~��~9f�[B������m!�|7�����{ge���H���-�c{�w;��E�JK 1�
��R��_:j�3����&�C.�H��@������\7���.~@���C��c�����b65�s�7`f��`
N���,n*�F��g��>� ���f�k��R����r)����yv��8o�7���B����(�B�bV���J�5�JU�����Y��dx����6����A���?oh#K�����S���-�������-��{����H)���RJ���gc=K�I-��}g��;��2O�5N��8�����<��~}
����3�H����GO�n=�v���@�h�Z2&JVzq?N���P�i
��z}#M��0@ s
���t�.���_t���D1c�dq�+�#i�d�2�:�:3�����~����
k���A��A��.�T�����_9�
G$z�WU�Y�e���k�R�0��
BX��
M<a_�GO��`�$���������;3l��h�*����a���E�2��=,��������j�Z�s��������q�89>=�4NO�O�l�AN��Q�����i���<m�I�X{�l�Im{����BK��a+rE��V�W����Mk�C����L����u�7�"��X����AV]E�����Z��3U]�0�4��- �/�ks��-���"�>S_��D�.b���-�� t$f������D�G�� ��G��m������Qd~�c_�
���A
�����X�?���v��v�I��h)��c�c�
��y�����`+�)����c�@��'Haj�#�b9B���V���2X���t�^��/Z�c `�\��c��t�����n7
&�V�+�-�+��b���e�4kf����3��(��~�w�����;tn����%�f/!)�~��u�����mPg<<��O^��i����L��n����x�I�/��kQ��R_2.aS�#����N$�����r�[a]z�}w��.l�&a�����P�3�f�?�8�
�h����V��ksuy�����3R �[��1boJ�����@�����b���]�W� t��O�7{��-��4>�^ �)����>��I$v�@��s��z�S]�����:M���������~��Y�7�����u����uw�t��c[�1�|�r@(@��/�>���r-�KS�'+��
�������'�U���<�Lc���/0;y��r���Gruv�c��X���������`8}��-�{:�Z�
�_��D$��I�#��u���!O�h*B��q����T�x����V���]_3=����l�f�AE�B��c8���fs��va��4����� ����e�wg�d����X���q�4�������C����}Q]�
e���QkY���rR'^����N�%h�/���=y�|[H�hS�� ��I�J���O*��C�4ZK�}���@d����0~�Z\��;���H��xb�6��%�{��K0�������&�^|�\o�?���K(�y����%����q���^n��u��������"����70���m���F=r��)���������2%"d��[�'��������m��-H�|��A��.��|Cw������|��mj�}�r��"��i|� �Qx�hUU&���|�M����fm�d����h�>s���3�p�K�X��d�O�+����=�2��_�t�#���r����P�tW��Y�AG�jr^�O�~�~ ����{ �=.���/gt����=�1��B1c8��o�)�T�=�+3�6f5����S�^/��:�8Y������R=hy����)�!�S�
C��:��X���������k�����}�9m#���Mwa���F&��;��4\���U"�Z �1��rG�]������7�b8P/*�xq����-3��m�������:
��Z0���^�47(���#�H+����Q)�������/w�{�zr�������{q��y��b���z� ��Y�$a\
}�)
�-c�LFw��
�Yu�? H3
��0�eF��u-I9|>��<�b?:��9\��������m�T��wh{R��Y��
N��N��.gO
�zjC��HmO�L?����O��<����������1/���+�X,�|��;�xA�*��]
G��ot������kRVWG� ������tF���u�]��P�E]|��j��rW� .j���x 0�����t��V�u 0&�7��������8X��:7���c9������E�����s�N�� �-�w���<l��q�i���� ������b�k�;F�L-,k���y8[�xn��g#����)��:�CEx��\E�s�<��3�qL�_kFE�M��z>�PM�Y�����{j��#��nL�%]�`�aA�v��'��Oxr���B!�\�z� .�8ja28d+�^|��Pl�s�J�������@mF0��PE![�����;n�����K��,!��|������`���P�]���8��� sG3 V��EFn������g����O����{�3eb��Up4�I�4��b�)���b8-n�s��CW��A��@����f-ah���
5�A�3�0N��t�\�8�-R��M��1>��o����[��6��n4�<����4��"kU�:%��ZG!���hs��X�4�u��
|����}�f�'0����i�����/�JB��?�vCE(D9��������(n��H`a;��u���j9��P nX0�@�����>L��7^��px=�J%r��G&p�U��?���E+{�f�A j7���R��&��������g��a*�r����kTo���#����l��V�����kA�)�J��Yq���A����V&ng�] U��X���{��_�bo�����1��=h����0jCjT��sP�� P����v����������"�������@��Z����A��������������%�E<~R����MVsvW����d���T�BBj#�y0J$�w��3�"�y}@���q�)����!������;Q&�RH�]���_S���WY ����9�'_e�zc/�57Npw��[Q������q-�@V��L�n�HZ��U��^�������G'| ���h{��M�A�)����n�rG d�Q>n�����"���[/h��m?�����>}�<8}�G@�8��V�2"�XE�]d^6�*\B�
B�i�L=��:v����e�z����;����E`(����Fc^����d�fK�Cy^�<�y}�,?����d����6��=}���[�����=���R:�m�c�OG�[������b+?p���-@m��S9�^��g���=������V������'�I�
H#�*%�%����7�$*_d��
���1��E���W��N3����Z����3J���x2���dlo�����MXd�������j8|����=��0����O�ux��oo��mC��;����c3uJ�l�����HJ���=G����
�������L�
Y��@a,y��
���8�>���?D)�d��J*�S}+Bt��NE�Rfc��t)�M07%���z��k�fH�_FynO+�Kz��m%wK��������{� �Z��`����^&l�2p<v���^_�Z�wRxN���y��0}��&o��Z��4$W���(��u-����\�m�L0���SH����n���^���q%7�E{�a,T0P�-&���_jis�+�P���ni�q���knR`�?�
-w��������������n{k��m��0�@��4��g�D�b>���d�Y)Y��ut��@�&N�E���Kj��A������pnu��=�m���f���-���M�0����=����x���������%+��Y!��P�R�N.�$_�X�PO��� �ql�;4�3�����*�=!�7/������-�9A"��@�B������a�
^<�����S`�D������)%��%���x�^�I^tm�����gz�?�$Q���'�v.�Q(����K6v�m'���t�z��U�����
��r�0~�,�&��'K���<�Dve�l�*�r��6�
iBU����<��A�����k���t �Y!`�c� ����2F+�����_�^R�
^�^�P���c�NS�S?r��{M/�0�{�y��(�u�eW^�;�6f�[{bb�#M��"}�B�U^�4ba&k*�DG�i��x��(Oi���
/�<$�����KS@8��M�la�E����7�fy�:���dz]T���\"��H��}%��q ;(<{g�}[@���b��� 9��9������1��/pf5O���&F@�x�D��(c���.Z�/^<v�z�d��������e�
S����)�#����JX�r6�� }�(���@l�cK�p�W5"'�
���~0A� �6+�0���U9S��6b��]i��S���b��m(��F�Mj����vp��)�C���3G��+�!��@h��k����?�lg��x������C�;�:2���6�������G$���R] ���pmA!���GO;/0q�����d���j6�:����NY#h
�F~�&-�"�����t��xtk��$�4e����W0�b/��pg�����=H8�7�� �/�Wp�t����~A��8���+����yv�{�������Y���A��L��C�3X_�f-�������
��.����/�=�<o����O�'����T�������IQy���!qE���y�E i@��.���7�q��r�U�^6���|����0*�l�`s$���b7j[_���d\pd���.�g#���y8y�.�~2"�4����,��S��J��s��������=J������K��<2-8�F���
Ti��6�}��M.pz3��=y{rz�G�J{�{�����?��U�����@-��/P���J�,2��f0�T�r`�d����O6xfY� �����f�(.#�i������N�~���3>��MR���$���?������B��W����Jx����������n�,�|A�����=�=j�T��c���[�8��"��1�l� ����_#�i4���L�~
�1��/{B�m�DCo�JG���6�zi�r�����i�4��y�q�_��)���.���
�q��@88zA5|
~���[�]o=�U��`s��}���fM��E����n���J�g:W�8�\X�Pn��M�e�940�/������6���tv�����
}�=���S�i��J�~�%��:���0s)1��!����G&��x����������)�����X��ni�YYZ>�F6�����S����]"@�?,@�oP�=����YlY�
' �ZDR=��Kx3����d]g���G����P:����2A�P��������WBk��Zl��d�J����V���~g[�De�_1w�6�&�(�$�b��Lh4�Z��:�=\�{���
K�#�@7���<]p�W�O~]<�JVe�_X��<�9w.(pi���u|��Z�������#����}�0���Q�~�q,"���B�����g��Wcv���J6p�Nwp������u�pm�Dqpf����x��`���e��i7�pf����z}��������+[Z�'OX}�2����'(�Pi� =��V~�wb���05�h������AVo�(��'l���o�D�kD����3��.����w�[$�$r��OIHx���������^��W��R���=���y��r������(�o������$�1kA��!�]���� ���G�����&�\�!�$d�2�$�|"���R���`��1�A�t��m%��v^R�E�I��Z���.��6���Q�S�1#N\�>,4�.e�!n� cP���A/�$�^���� ����^� R/�=E��/�E����)L��S����)�v'o�8��+���3����S�
��A� :��������fdR���5{�����E���<y�c��dl��:���DI�9Y�����%�n��o1G�u�H�&���c��b@�RyV�@q6{�d��Z�D���~���i�?���|d�:P���d���
����8���(����Mq�6����'�vS[9k&G�=]�\[��O�yO�>[�G(���3�TUy�,��^��G�i
�H*,���z �{�}(F�� ��g���~$*.*������<I��<�s2�?y��:�i��EN�|��bN�h�/&��sY��|BV�����knj��t�����E?�Y�Y���}��;��m�k���������y.9�R��&S!��=��9�QTp�(�}��n�����y�"�(_E�}���Xu>C���������
u0�,Z���?�nFP2��Da�qn����"����_&=V6��b�����p�e���e��z��I�i����x���=�e�k�m��r$����U����g �o��>����3�r���k��l%�i�V�\�z�e��eD�+�����6�������t�
��d�H.��8p"-������;������ ������8������#���>�3�i��f �w�_�>�@��� D���J��@��N��`i���f�����Uj6��� �-����o�f F"b��&�&����OFz�FD^<�_q:�~39G��-'}aJ�gG���Zt��K���
'Z
�������k��8��9��;i7��UP�!��]�SX6���O|�NB����9&�A��#umg/�=3�
�b���3=?���Gg�n�O������6�psO
f����F��4���h���K��
�]W�
�_�E�G�\8�9_������pzH"^�I���_)'�|�#�>�P��,�(H�8MC��F�]��I4�O���>}�M,�%a��SK�J�w��9������x����R��U��&V��9���~�,~V�o��:���s�XC��*�!f�>���S����q�$_U�}qD���w6l��
�B1A�]'�.Z�./d���C��#�
���A#�4-��-[�h�Y�F��l�^��p�����Y'�uLZ�(��[��n�0v��������s�c���]<h���5�y-0���s� g�8��Z�S8�#���
�������G�Q/�j���<s���Z�������Q�<����_$���sl��|sh �\�z��}�_��D�j�'i� ��c�R���"����
i���Gv`�
YL\�� ��7����V���=!�������J*2�#��N�p�o.D�#S�����GV�I�aoxfzo[����h����~��� ����4|�4���n��m�����g|��J����<��y�MB_`���'����m�X6P�S���s�{���Z`q����b|��`&�^_j���b?�}����� ��D�b�yt���vsrz$=�������@�d�7��nk�C-�a��/g��kj/��n�et��B�G �s���ksQ�w�W�6�O��iP��\��5���uV�~����|1a7O�=�o���[q������PQ9�)%f�Ty�g������y#��A�#�ZAl:�]�{�L��5N�Yd5����u�d�=�_�</��Cz��%�%���,M��
����CP��o��a=�������3�N����r�^�d],<FNYx8b��t����\����{3���8�&��~WDH���^�i��{���|��l�)D��3�F���������_<��b��^o>i%�B,V �b!�o�d�\�M��u�����=D�
�m�4��vE���M��� ��\r M9���Ca�Y���B:"��Q6cT.7�$���!��&�S�<�?����
<���`f�4'��y#��Rp����y8��5Q�<���N;�#]��OU5t11�8�S�)�F�����NT��b���4�V�b
U�B�{>����9��o�w�_��G�5.5������-�7�5�a~!:@?f`kv�&'�����&,`�"�7����8DG��t! **�]��������X2�E�P?� �^�LN�=�
��5���$�)D`
��?M^��z���v�Y�|"�UL![��GO� k��7I��)���s����+��D,e����'�_��19���ukm�
2M�{�n����}� "Q�`yn�* ����r ~4�6��>!WV�m�8qK���Y��@h��B}�h-r��N����N�^�TfS��r���BO���$~��^�d�yk���dj�BN�g�b74=���+
3�}� 1��w1 ��S<����"�����/�(�����eE��C���2�������\+��"�G���2�Q�&W��5��V��CO�>�k��jd"������@J����c��p�X�g>
g��e���,�,D�<�(�����`S�<$��-3������%}�a�J
�4�E��G�� �
qM�#�\��6�v���Zs�;�-�p4hOZ��q=�bQ8L��-�~��SZj�DC{o����7h.Up�y��E�2��wok�MZ��1a$���5���LT��/y1?#B��,f�e�ft�8�|R����f�5����2�)I9��{�^�e�����
S=�/m�$�t���:����YVB��\�mo�]������<4��9WTXr?���*�����R�yI��Wy�.�"$�M����v���)�I��s��A����Y�� ����N�^Z958�g`��OMQG��h�gak����?������y�d��n�o��z����$)�}����S�}��ZG�-����������T����]VV�c�#��nT���&�J/��9JH�� �h��?V��n�?VV�U#����&~���Y�*�D*[����`6��`_�J����TB��.��%2�P+w���wn�?��2m���A.q�r�����<�c�/]cgd@��2�<�J6�l\/'s�q5��:Gq�"*������V���l=J����J��,�n<�����f��;9ys���,��e�~�o�������U�
P����sl�z�"��3��2��B�vj2�����a�y9��ge�����>x�����)6 ���2=�����W�c��gB< @�&�4��'o����]b�:�\��-�tg�e�]F���!IVU��� 1lm����h�7���
�%�����pm7k����-�_�s{�{������-z�&I3!d���S��`�qt~��`o�j���Iu|�a-�e��(����G���:._�o��-��T���N���K�(�F�uP^���6�je���7J#��������7�������������>��\;������.^�ig������9I�k����$���G����+�� ��:���^|#����(8���z�5q�,E�>�'����I/�s?�`�>z���QJ=rI�qe�D=�(�,����7~�����Q������ _^�99[!��_P�;W��|]��b��{����G,b�z���M���~���,&����d*�+���R�2�BLN�1��kdL�e$akSg#�:�[������Q�\u�xjEq�#J�su����
��#
R������OA[�zT#yV��?#&��q�CrE};L�3jz\����L%���>���*�d�_��~�n�Y+��l��R-����/6\�q�s�1������q�
(�q���p��]��-�j4��"by����;p ��X(�.�,�\��%%��V^�n�|���V$hy�~m���J��T��+h���������"���r�i��[�NN`�V9�v�:�����m��P��������k�U��3���'&J�r�l��0�]�v���F�w-���}�����#�w�f�P�5���u�1yl�Ux�F,�'��
� *6|qFf( _���C �|]����b��\#Q�H/)v����q10#^u\�Y�M���Y�P��x:+��9{3b{7�g����� a��`%^�A��hS�Q���u���x�}5u��%[TBE��H�b1x����|�V�iQBF�vd��m����SDk�TN��A;�":����M�9��6(�B�0 c�R�c���B�K����#�������@���lo�*$%���s~��z{���UG���%�s�.������3���;V8-�)}����>��{��?<}xv��������� ���q��N�>}}~�(��n��]��������B+g��(<:q�a\��Q��}M �lo�V&���y����W+�m����n�������_�<l�=(WI�?�t��jJMy�I��=!���O���e~S������6-(*���������o���6�@m�Y�X"���t�������pp��|�����?�����{Ns��+F��L�z�����)e�j��=���6Dq�����s^C�������oO��E���MN6���)��]Nqm�"�yM{L�IX
�s���zU�(~7�z��������i��L_*S:�uiS�}`�<���9����C�}N��sM��F-�R�,�\��;��:B������X�q�����i��+�<9m� >���o����%�;��]��? ��X�B\�z0d|E�����������i�����7��jh'�,���n�A8JF#8}��_5������GGo�c�{K}�n���$�#u����)�e�lg�N�< ��fM!���������Y&��Y{���r&+uL��T����;��s�<������x�P�Fg�� �o���9���l�3�gZ�V��Yq�g}�Z`X���������(��"��6���������E4
��p�1�a:u��;c��u������G?o�bJP���s�� ��g�JzX32_e��=����$j��1"\g��h�"}�Z�e� �:���M�������~���O��"�����p��#���X?%��n�.��t
kN���U��,jmaMa.��n����#<�h��xL�NF�E�N�h6@
)S%!�]b �3�o�,�h9��]R�S�U@o��T?12+7ATRW�iw�0�e=����V������c l70]�8~
�����ggR3�~@�����������y��������G?�%r��h�,�w$ �s$ ����
/�<�)r����+�N����
�A�eL=�@�5���S���K�����Pk(�o�����L2tN)���E�L1�yug ������Q����D�F�pq0+�o�A�O��y@�P��������k�b�������I��zr��exg~<8�{�{�zw��'H6���w�7��0������������i�I��{���c��7'�3��_������) o��j�0��^�����Yp��|\�_i���D���M�Gg
�����#��e�4z���
|r�2�x���s�����#'����&50����_���,�0d54��Vq����W�iGk�<��6P� ��:�bP�E�����C�~�q�C����{��\���l�������������Q������$���<`#$�{�����q��"��1����>���zB ��_��_K�������3�g5������)`�jT�`c�G�0�^����O^��9������=3a�Ge�n���d
'�� C�� qMteei���� ��Q���A�p���h�)b2�������y��@zP�-�c��� �+'�R�C���k0/�A���Y��S�#A��m�Pwt��j�j�����A��������U!�j(g� �*����u�\���.�����X��"��,(���J`��w���C��=q�
��{�������F�?C�6e���-��'�<H���;?���dSx���;~<��8w,���iBc��]/A����*���,Y����n~�E���m����^7�Yl�5CD����;��t�����/Z���~%����+�(��Lh]�:���x@ J1o"�X��h������t$�&�{\�:��� ����r���2 �bos�1���{�.���!Fv�9�336���/N�c�uf�b'��A+o-�A������ Y��`A������6�~��f�F�I4�j����$Z+ s���� M}��37K�{ejq��2�4��V���
r��5��H�p��]���-;g��
+���W�y��A��%�%��Z���=�����9YiZ��z�Zb�>g
�M7��:��h@�/z={�6;�!g1`��ufEDgREE��C�)�K]��i*�va��&$�As�/�UA��*�.c6D��-��N�nN�X���:�<b�&��`��J��
2-w�p8�.c�"�P�I����_ ���������o����A/������!(�}�|P�|MA�O�����H��SSS����B�$��89l�T��@,�Wpz@+����D���G��]�5a��5s����������4��Ln����m�����W-�C�1I&I`Q=�
�os��6�����`bq�L_\�Z����v�����p�Ft�ewWa;j�a��Zr����z�-����<g����������� ��R�d���9[�"���������[��uL}+�c���v�M�q�`��5��;��(z/S��B3����������5�-���>�3-���S7�14����Wa
�i�������f����>�>O�{��b<�M�|���U�O�����^��������0o����(/A~>���+S�<2�]���2wf�L�+��_���trM�TM�G���<T�����(y`c��b;�������Z^�2Y.w��K��D����F��f ���3�F?U�s���'p��#�8������W��&Y������*�$0Q|�������3��g����_u�#n����N��-�2S�]�)�]�&3+�-a/�4��������e`��+����,�lq���,�������h�,t�"w��N�H���X�����.�1�V���)��h0��T�1��
*��|{-��.�{I��"�F���&lH��5Y_���x#����hj��=����P$��T ���V ~I��Wd]D���}&���L�\)��y�%#�'�[���s����Y���6]�8�3�����������G��Q�O�@����'9��\�`�Ud�N��t��>|��I/MB\���D5Jb�
z��s�5���1�1]���up��L'���%y :�5`���`�tlF�)�v��H�6<�3����0�Avp����52+�{���C��49Z#<���K�fF�Q�6
��gk/u4}�3����'��o���)������?�\unh�!�y`�d�~��]��<dA�!�z8E~�{�u7��Wi#s�yG���d�w��C�l-�
[{���/����E���&�������q�����Q��'�3z��K]P���$;Je� ��M������m�����Q�����#����Y?|&dg����}f3���s�f��6�u1�/��������������O���e[r��6?��0�l���W�F���Y���#�MNR������7�����;Ta>0Pp���~�����W���6C�1=�������sC�3����H��U�o���(�-y�)�-M�i7� ��?r��K��V��J�Ks���g�_���8C�MF����5����i9��J=�����c��6�d1C(���|��2bo�f�~��Dy�|��.m�lm>����"����1�1��H(p7����%����\X6o�w)�j��E�����x5������q���hBV�������6B]���1����������,B�<��Y�3�!pv�����������8�+��O�xq�Tk�$�)]��!�ec�waPw�(;��������rcAC��Lyt!��~�CC�����w��@���+@ ��OA���9wo@�������(a#�o�]�����t=4�(J���I�an�{i�Dxc�S��Y�d�>����py���<��U���H��/�v��V�?je��b$��dVUD�{)�K����)���� �x�2�����e�T�Z����T��v�~{���AT2� z���g��++*Qh�����$C�P5���i��?P����EnN�e�Rr���Mv�������h!wI�.��j��Aq��}x�h�U�rZX4�&���))*�����sjB���=���X���k���l��1wg���A,Hn�j {'���
�zN��������Q��u�$�n����R ����8][�s4g ��c�4����tX+qC*��\ �8������f0">�5���,�� <�{/
z�aY�]�lWyP2af���>E�r���_(��|�3'�4���� ��S��Z�%{�O�1 J(5G���������������u����(�}�nJ9k`��� �;3�I^+�~�z< E9�4�Z��,B��~g����������?���E+<�Z.��?�G�?m���Z2K���v5�iy�*��k��zw~u������AG��|�����A���f�I��<�����������5��}�e�^������]�����8�Bt����t���� j0���b����j?����y�������]4�G5���������p�.��a����-��vvx~o��_�������
>t^
m+����jw�Q����}8`�B�t�����
R�>�L��%r�����]�J���8�������V�m���k]V[B�a��_�� �u6��<��D�-4��Fx���@$�����
#��\V`j"h�x�`��$�2����,���f9�Y�F�xu1�
q/�1�(����l����{)T�G��m����y.L��(��7�W��E��h�Z�H�� _�P���T�&���mf�q��0E�5%��[������qN��i�q�����a�|��
P���H>t�������F�1�H��Y9���+(�u@DE�����s��N�V�C-T�^�����l���]�%1K�����,�e����������/���
�f��O#T�J��Y�F8Yf����D���6U`h;��sj�SO�pS_}�e�� �Bc�������k��[2�����X�o]��<����(7�Ua2�����2���,|�+����a�^5��=������������6�#����������������}�w���u<��!������mio�N���\�l��W��?���u��T�B�gM���4 ��%.�3.0�����
��� �������3�����a��-�a� �h���=��N3��>�E���O�r����������z
,`c�������&�g����Z��S+�,�-��EQ�����f$�92�
��/p���6����SsQ�4#���K��xK,��k:-��613�RT����c'��r<�v�����BW^��&�/�h� ��z)�e�2�"�*
O���q��*2�S[��EZ�<����"���S.&iyr�1��������O9��l ���O�6��[LbLo�������F����i��7����G��y���g�
�P<o�5����-��|)�V�n����}q������)���a�yS��q�^�~����|��W��9�*;�*��~���,���D1eB��>��>*��)�C#f�iyxw]
F�o>EH(��=���;b����k�"�;�46�i'�h�XC�����K��^�����go��7��J#���1���9����Q_4�cO]���1�v����� ����~�X<����:Z�a����{6a ����m���/�'M����|>����pr��H�AeF�0:�=L���@k����r?��
9<(������K�v�8����`+/_�&�a�oH!�.��q�YK~���#I�n�����j=� �.��Mp��Hg����(��E��E����^>��47���9x��������^ �@����/D-5�t�$���T�Gdl�
�kO��[K+^��j2�\���_��+�n��������g��06!��'�[a9��� }P�3�1a
L���&���=`�]��'@[^�����c=�Iy�����
k��~�(�|�^��>�g8�])M��El��G�[���19�j��m�}c�|��qD���)�G[�@_�c��Ni;�O���j��Ni��a�M���,�.*op�����)��/t��E~�}��y��sC��-]��fOg�!�t�_��.�`��eC�s�u��"Cf)E��^� ���|W[�;-w�P��!KY��g��X�+c�n�]�f���F�u������[�<����I�?NX�$�-��j�
�3��jOZ<o~N[&a_qE�@�o��V����wUT��B/�j������"M� s���d����/\0����xp�z<�j�~� J_����p����aT��/����wA�� 9�z��u����*;��e����G� ��:^Zc5p���t��>[��:8�?��M������R��R������CrW�7w�t��.b�j��
�~A�����T��4����ytO]Y@�G�����U��S��Xp�!��+B��6+���i�[4�p��xsp�����2q�S)���l�����3��q���[��qp��pV����E�{(���{�t�������E�J��\�.0TA�j"����.�^�<��m]��z�q����|��0��n���s�}�I�'x|z����i��d�s,�'B�O�=E�����!��@j����Sd�����%�"�`@�����Q���krG���Pj�8�����H��P�����E�����h��A�H�s��3nI8$f3 ��W8��.�� ������l�����(�k �Z(FvgS�+�$ ��]a��O]�R����tx�]��1���rt��d�Q��Q�����p.�cWR<�uI���>����{+%;�m���.p������iOT_��n"����F����Xte�E��\!���)��I|�FH�\Q7���s���Q��:�2,T�+p�gd?k�M[B���l������&�1!�`o$�q���D���J����PRl���;���V�K����.�'"hV�>b��~e���<�@9g�<�ia��3�-r"`�����;l�l$WK?N��%B�v�JB�#�p���5!#�� g>F���J�#��40 ��9&��a��{^��� I
�rb��������v:s ���u
�6�[�^er/N{��n�E�t/�q4>�%>���^�����%%�f�M��3* ���OG���e~^rZ=�/��'���m,�TR��wU����8-G�"��|8��W�0R���e�0_N��S:q|o�T�!ecp�;�`�j����Q��lFC�fp����:��c�����P9z������8�J�T' �%J�������'���ld�p4���k�����i�0n]t|�zc8��S2�y��(���Dn��&9�=:�3!�z~6A�j�����'*�?��nN�F]�[97����>Z���!�YO�u�r
��i�:J��\���c(��.�F��y��<�(�5�n-'�>��.� ��<7R`
�'��p�d �A}b��~����~�G�Y$j��-S��7,�4
.?����J��iO�8EB������Km�>k�~����'.��$� q�hg�t�0}���������� o�Zj�������^8����k���!4���OG=JuS�A:�
����#(�cf�Gp :H�w����+��q���1��t|k�S��KJ���Lc������oDF@�x��.
z�����C���
�@ux��%�^�P)��b0���5tq�J�������Y������q��R��T��A(��a��;��LD�4���0������y���77����������'o���vO��~�����)���>�_�a�������B���y�YkX'<tep�`���qD�A7v}Y���xD�q��!�l��|e��n �8����I���������F��bNL;�����K/~0o8�~i�\W�u����`�!
?��R��u�h
���8��d��������oC��M�11w�c��O��8�)� (z�=������S���;��>�~L"�f�)��_��i*�?�b����,�)����(<�s�,�� �9?�t *'<�'���P�W3X\���Rh�����;��������A
%>
��f��:<7U���S�'h�F��?a��0
�����]�d���dGf
����*���?+��re���,Y�����v��\
o�_��e�|� L+'�'�!�FX����35-��/OT6�
V�w���
�'B��
V6W���qt�K��x'����+CQ���'M������|���h$�d�5
M�������~ABw<7��K�= b���~�+wq��C�'U����<$��I�)b����\�T�������� �:��{�������)N��4����AQJ���VL����L�����F��S�.�|�]���Y��o�u��o�<73���s
78�F{��S��.�a��WHq��w����w���dj��\���zn ��O�H����&��t�h��� �=B,���]n�������b"���~���w��4��.�t�t�SK�� �������Hc�O{��V�Ss
��/���y��)����Q���v[�����|�/;� _~n���]� �*u6sn{KnA$����H������u�������}�) �G�9�X����gW�q{p��M��!mj�w���k�f�V%P;����� Q_���w�����?k���
DD|����w1�f9��nz�bQ���`��T�OS=�f:n�.� � 3�����B���
L�j��h�
�"�������Us�I[��|.�&����b�>pu����z���� u��N,��p�nP&�`78�����
�g��6${�/�j���wE��)8���wp�QQA-�5��W1���"%K�nX�0�����|�6������w� ��M���Q�O��sDf?���
���<�+�-
'a3����a2��X%D�|�OG�*|V���mx�`p��/��m�3J�bV��[��c��%KQw���(UY���3���HjnS�Q���������4�jP=����Of�}G�����E.�`Y������)weV�F�=J3:
��v��r(���t�h��4MfKY��� �����t��~�K"��{�~D��d�g ����$������������u{O����Sx1�����"�MS�w�.-���q|:K�R��r>z�;~$K+-�.&K����{g���~�������������lL-���3����eH�v�=��~�Q`���"��<��a�#����:�{��T8B��VQ/�+�V;���^�(�G]�������~��R��vj��
KR,�m�17R/[�!�hR���J'h�[)�G���U�>� �\�r r�����L\�2���r��ny�I1M�.�~�3��Riz�F�$|�r[�:�5)� ~����?��_����SP��T-vd���H�'�N�(���6M�I�*�C�`�#��% J@��I@Vq&���������U���_d��>!�<L��j���I�����T"�Z��%�;�/�6����Yw�5��RU�W��i+E���aq�l�a}��H���lZ�����uO�)/X!~��l��lSQ_aJ�(�;�y�t�E�W�@v9���0( ���$n��L�_��{w��T���H��vO^88�G<���������]3~���-��K>�+����������K�s�=z��g���V�$`g�!%}��*S�������G����lF./l �S3|��_!�QQ�;��;2V�������.���mQf
��@�I�a�'����7�����=}{��C�!���7��&r��eb�S�a3\�kbUbQlO���!���E"lOztg�K�1�Z��9����
4�`p1�x�+�
�j��}��z. �A����3JZC��r���������z���"p��o��wX�Yry4����Y��V��c�{()����!��}@AN��#$Z�0����W�o��F5h�V��J.uLK�=�3N�]�Q_����LN&���y�� �{��^��E+�m�NEu����3���X\ ��0 ����+U���,��Dp���R�K���'���Wiv��������>r��u��-����z�8�^����E��z)28���{�h!�>�����t�!�.����z1F,�N]�]����������g&c#��;K��j�����F����d/��pHKt�s�l��E���)����� :w���S���H�d�g_5�['e�"`�8[t��@��43U�S��np��@���
�4�O]p��0�����u����2�p9��W�u����rI�1���g�|e��m�����W}�.+s0 ��kv�k��������w��4F��A�����e���P�]�*�>���WZ��5�u������S;Z�?�*�v�WV��M�MtI(!I�ZOH��V�����b�@��
:8 k=L��r�C�[��b�Rc������@�GN���!���W<���sGB���������K;V��2�����Zo���'<*�����l�>�r���$�
(U��9'A�>�T�It��#����:<�#�p����,4M���m��,�$��H�1!�Q#���PK<7�S�6D�eY����O`�q��c0�����u1�aDDc� k���H:����7Q�v����$���~
����H��!R�f���z�dP���{�JQg�N<�i����r�~���$���b����^�W��>���fa%����9�m"pb��/)!�B�-�5�����|w�8�M)���������/�kr�s�s|$j�����y���~�&'K�=�t��d�5�EQA�F�mGMn����+�^���n���c'�����'
�������c�*���p��������
�����7,�
���@����yiZM��9�U����'_+��IS��I_ ��NG��O��}!��,�6��3��B$�a�ilal\���'�Qn��R���R/� Kk'���8�����a��zUD��G�F
�pS|p�/5�����,���]5s(��$�\����2S����<[���#W$u~�������P�M`_D���wv�M?�H2�wr��Z�%k_ ���&m�Yk����G���m1Qq�M���8W55��� �[���jf��&'Z)I�M�D��*�v�Cng|�O:��(�t�C/���&5��&����i�Ys:�������?x7Pl� 5V�Y'7e��������V����.L3��1\f�0�k`��C�I2G�#���%y������6���f}q>���x���*�����$���eA�>��� ������;����w��x����F�������[��)G�'rpnb��_����
��j��[U�h��"�PnS�|��v}��9t�C]W ��k����5!�Z�_��l��L��:}+5��:���������������8/H�o��S���s��@/������v� �v�� *#�%N��U�"�K��-�L�]�4Mi'P�p�jg��"�7���������\�g��e�����%c������n}����8:�&x�� ���I����UxG?���3LF�_'����7�E/�I����K]����4���@C�$�pB����G�n9�2`�'DK�A�+�k�7�����4���� �������� P?l�u�����@>��C2�x}�!eH��X!M��Yz��
�b���{�
�:{s����"9g�wC�������bf:��#+�?T[�NFjb�����t���!i��!,m1\�g�f`��_��x���:Y�D�����G��
Z*/��j�|����C������� g��d~��f��!�@%yZqfh6�3����{��V�@~7�^����
�Y �e)����4�SWM^�C��5�l^���?��%t���0�q�op'�*v�5�_��s�=8N��!�h|8:��B�+-��&@U��U�$h4�$nd ���$�d&a*G�4E� �'�P�����m�E�@���E��\�����L9}0M7�Q@��7��,�L�r�$+S������b��G��5���D��>R�1���)�)�L�'e�2S&�v��p2��+��j�1������6��dkj�-��C�n tD��R=��D�}1�+���v�����\���9�H�O?�������s�w�������:�%��iq���Q�@�8�o��Wo�x��0��$�v�%mO�G�B�v���0jB*���|�9J����c�a,K��t6�-�X��,�����'�\�`�9&�V�I�5�&�xEy]�$��&���l#m�;M/�,#_��1K���j�V>� �J]�)����3�y�VZ�N=�A�U�U6��7R�3���c:���s���
gD�\�% xA~���9�W1�)��Qu����]E��;�pWvc���H��k�� �k�{���r�D`�R]�����0�,m�MDP4&��T!4[��),��_��_d:�GGTG�Z��6�KV%����{�#@�����^�3�'IKH�����vI������hHh�������)���w_��HW�B������(]G�'����C������SzSg EW���C��1t�������MS�g,g�C�X�+�gG�I%��;���0u<"+�������0�����z������6���`���0/�Cj���t������
��8�u�z�g_��S$�>C<j��q�CR�t���H���E� -���)��
��
�:4RS�c��3uF�������M<e�B�X��J�u��l�W����y�h��t5��Y�30����=@\�.��YS� ���4kbr�P$���[�`0\e�
%�[�*fT�w��J���\h$�1
e]P(��2��{��K]I���I�t'���c�5�'�Ev�1(7��[�b�U�Ny ��nZ�ht��Ser��5�afx��������*��t��9K�\� ~����z��H�6��B����4�3�OZ/�3,/��\�%����[|��=]t�)�����iLsx�%�r�f{�^�D���u �D��I�4��G�?ue>�!m�L�<��]|�%P�T�p~ �y1h����%��vr1�,{I_��P�i�KT'��M�d��E��u��*�r���V���A]���-�;���$�e�"m���I2�4��3�����6<c{85{i�$2����y�;����qz���d�����8l�7�6#��L������L�5` �\���%A��w���P��p�{H�����R�����O����|Z1%%��> O�u���9�1��2hoT�sn����\B@����OB;�L l��v���v��Y960��8m��n�f8zHS��C���`B�T��{����
��>^��m��q|��8m����
���n��m�f�G��M^������s���� ��;�������w����s>�pr��;�_^�Z6��Z�#� V�|�NW�v7&��:�D��� n���j'#_M�kg$ID>����A���2�M�kg9����;����ttXr�T_�F�0M����+��_���o�8���+M�=�"�~B
�H+����_bt�
��8�Q�d�O�|4�|�me��y6�VV�2�0l�%�P�I|�{"|�� ��F��R'�����^Y6�:����Q�r�)Q]�
��q�I\�Rg����������C4�0�QM8h_�;2s��0��r�����M �a�-����8�k�6��������m�d?�9l|:x}�h����������u����aP�t�X��������i���H|m*Q�Mb#u}#�n�C�c~�%��D�o��j�]#"��m��VY��P��A<����q��P%T&b6�k�����hv�ag��*�9�O��3M| 'HL��i�n��/�xS��E}~��z�LI��D���m'���2���%I�����j;&"���K���T��5��x����%�(�w��D��������JXg2]$.��$��L�By�l��Z��%-����C�����`��;eEI{1omn?������=�~O�~=�������i�C���Z.8u�*[DU�1���{�����3��Lp���j�{!�]l�����S�T-�}�k�� ��c5�� ������'�8k���7(��Y��>f�c�Nx�^�,���cQ�C��]�91,3�o���h�4B��O��5U��LO/A#���K�-|�t�I�+���f�����"����j��,I�S7������������i�$� F�Zl�l�)3��"�L���xw�R����SIq���w��$wdC�j������� %,��t\/�:Ls��������`{@n�� �J����iv�V����#,o�E���Z�{��S#������s�|���5��9�S�����(d�bw:{�*/��[��w�������'6-��0����?���un�w<jG���g�"����)���yrLF�&yE������?=����~M}R�x��]�]��D��{���N1I�{�=���� �x����'��U �$���YZA��ek/�Z8�+�C��C��.��q�Icm��*�C�$�>�p1'�z�� P��{�n��pi �o���[�l�^4
LX�\Z� aW������<�8�D���0hP�8g���f���������f�����p[�6�.K��4�PLv����a��!��Ud��O��F�
5���������^�y#��Ik�����L-&�����d��K�X���������M������������v2C_���KL��J
��`N�9�+��M�������h���c �O��P�M���=��=�����i������,e�0%\�����0��v#�B4����`��`���ThMz���y��`��|�*+�B��)U������@p�Pb�S(tiZ��|�eL�������;)X:�|�����emut�rAf��\���H�R�<LF��sU�D#��+^�8E������#Y&}4y��O
�%o���t�(�g��rXj8����e �(�I.�Uy��urA�]Z`f���{�HK���
�a��������!�������V=@�&�Q|���C��!��!���\�,!ehs��=�#Za��K�o�W��F��M?������E���Q������ �W�sI`��d�k�;��X���s��� 'x�u��st|@�%^�����R'��aW�0���9�2��b�K���G���a�$����ht�Z������@��n�|=����q�43�P��b[��<X����K�m8�W��Q'dsO`>pw�nl�d�������m���)(� (���{���#�����n�����m��a��3����?�v���f-�y�_f�
���Usp��)��R�
�7���`Cs0~.�(��>�e�N���s
���A��w��K0�WU���zN�h0�;mE���]����������2ka`�\���-�����Y���&-=������L��c��
N]���.������k}�m��u�q�u}�o}���g��!�8kR�oc�v{l>��� ok��_�w�Z�r����pz'�\v�sRJb*F%��NI�V�X����m��9t�1����Ksh~��$�:7�]Pc�
3��djR����d��QH���$�0��I��SR��)*��
N�_�:G}8k4w���u7�~:8z�lcr�8�n���$��J��'�v� )`2����������SH�g�$u���P)��P��������rR��t'|%�&kY�����X�vf��T)��h�p��$qs�}
���nYPBM��e�%QsV��2sq}J��r�'��a��� �\���C/yu��,�5�!�mV�>�{xv$u�c������%����
G �\s(V�!�x��QF��!L_r��n.jP���4��iY��=��y���
/�H�&�(��8��Y ���dC���� �)tOV��?��,E��q�qy0����7�g���K�nZ�`O��j��A���e��2�0Q�Xy,��?aYlcU���qM\���h�.<�����jzyN}�����%�E2����_����J��l��9 I+�W.��<�
�������H��U�������S�:
=� ���Qm_��=`P^�����U����m�t1n�����[�
Ku�*���"�;25�4��\�����%���"9A����8�d����z�
��<�.�U�sP��~���Z���A`�_���w��d�
��]��eL"��F
�������8�,R>/��N��`!lm:�KL��
i��0z����]�6�����>�(���cs�%��
��N7�#�c($�����l��
�c���K��r�7~�&0�����hi������y+�����2�ak�w����P������S+[���?&���4���Aa~�� 7��R�`���@)L����rCI�L�x�X���2��&@�����P�6�m�[| Gu4��e�Gn:���!S�#��2�J����#t����]*qd�Y�q�f��i��������{����������7��S�K+�� �g��K�x����_4���T��3���R�����8���P��gn����/�|=n�i��B���Q�I���-��sp�G�M.0x������JE�P=�hp�K��.^�|���Pe�Lz��m�h����E��x0<�F 5F�/%�`�>��:�RhN�� C>#�><o)n1��>��~��i�WC��5�K/{WiwN�H�5�W8A�sbQ%(Z�{`� �����
-|~<��rg�78��S<��Hu�Z����(?������ �t�"�M��4�S���D�v��� >A
�����w�������Lu����Ph�Z��h���H��j��O�
^c�c��!�8C��;���k�5��jzw��%����`�$_'7�����������_���?��*,F�x(��v����2N
S����**�&3�
b�+I��)8�� ���V:���f��h�~��%�Y�(�����J�".���"K��>M��.�X ��`z��~C�D����b�����(
�H��dD�0z���T�9h��4EXf,�?�E�D��#���w���18��%Q[�S����{jg8�62��"�I;��-"���U��K�sy`�����eWwPc�K������3<�4
tLF +LbdF�>���:i?��f����PY�������\����T��e[���e��{���)��ia���������.��O�T�)�/�g��@�bfb����h�����m��;J�r�z`
F�.h����"@l�2� Z���o��0��>{�������c��q���n���X�g��e�^;cA��x�f�(���� J���������m��Z� �`cb<���@�h�z�����.��� k���p���g���[D�?����_��r�^����|*���1��]uj�3lJ�^�m:je1�Ni��y�B1��1��d*����Z0�
�!�����ph3S������H�Z���"jw3%k3#np�H������I��M&Au������u��L�m�.����3 WQ�u�P�Y���i0���+�;�kI���B�q���������wn{V1u���T�oY2s��a����J�I5.E�!xc���|����h�N��
�`�& ����"���V��9�����
�P�R�-���{��n�
$���1HPKs�,[�i�?���s\0!���;q��j�6gq������3�2s<�0=�;.�,\�|O�S����H��u�S'AA����_�����|O��������6�p�����5�l��\ A�07�,�l�'�_����4���� I�j��Q�}�O|����/4'�h�����,�d���'�R`F��� ��)�����`������Q�|�\>�v}�u�?z�x�{<���E���%� �x���eLt�2-��;�~v,�v��e�j����I)i��y�.��[r���nJ�8����b��y3V���. e�.����#������E�+%��(y&)K�<j
�m������D�K���(|A_����O �J�z�-��������j����n3~��J��� �
mt, �� zo`����a
N�w������G�&;�#�X�^�{�2������3�~�����qr>�vs������.%u��<`0�TD�D���C�pd����^2 Ni.g���2��$�6�K�\��q�S��/�]s��m������uD��� ����?O�� D�\d&U��`/$��}cF6�#<u��zmF`�2��W���qt����rov�wk���+�l(�X���c�S�GTe���8N�)�V���5�4�dm \y�I�����`v&S�4��$4�[�� ��s���f@�5�m��o{^�>�G����jrh�K9�
7�,5�L������-4;,�Jf�Q5M��G�J�|x��m|�f0�^��`*\���
"D���8k����5��� �t�^Q��X�W����am���������B�iKQ04-��NA`���l����1�0��g"���>�����|)��'���h�g�S���{�����_���WQ�k1��5e�X� ���"�M��_�A=��.E����[W���g�j�!���e W[�� ns�G���8���9�|�����b7��Kd2��9����Z1b��"7[B��HE+w!�@��A�6KkSv�Evg��C�U
�= >\vo�tL�]�4��w�@���\r�����Xa�v�B�����a2ZfE*1�|��R��y�h�W�zuS$���LahP��Y*��;�?���}��lBy,:�.�;HN���w�,�Mu��(�[��B��~�������T:{�����b�MB�2b��x��A6�E�o�Ew�"[{��v6UP���f5�L�V�VOP���g^�����S����&