logical changeset generation v5
Hi!
I am rather pleased to announce the next version of the changeset
extraction patchset. Thanks to help from a large number of people, I
think we are slowly getting to the point where it is
committable.
Since the last submitted version
(20121115002746.GA7692@awork2.anarazel.de), a large number of fixes and
the results of a good amount of review have been added to the tree. All
bugs known to me have been fixed.
Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, do it only for catalog ones.
* arbitrarily large transaction support by spilling large transactions
to disk
* spill snapshots to disk, so we can restart without waiting for a new
snapshot to be built
* Don't read all WAL from the establishment of a logical slot
* tests via SQL interface to changeset extraction
The todo list includes:
* morph the "logical slot" interface into being "replication slots" that
can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely
duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/.
* user/developer level documentation
The patch series currently has two interfaces to logical decoding. One,
which is primarily useful for pg_regress-style tests and playing around,
is SQL based; the other uses a walsender replication connection.
A quick demonstration of the SQL interface (the server needs to be started
with wal_level = logical and max_logical_slots > 0):
=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
slotname | xlog_position
-----------------+---------------
regression_slot | 0/17D5908
(1 row)
=# CREATE TABLE foo(id serial primary key, data text);
=# INSERT INTO foo(data) VALUES(1);
=# UPDATE foo SET id = -id, data = ':'||data;
=# DELETE FROM foo;
=# DROP TABLE foo;
=# SELECT * FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '0');
location | xid | data
-----------+-----+--------------------------------------------------------------------------------
0/17D59B8 | 695 | BEGIN
0/17D59B8 | 695 | COMMIT
0/17E8B58 | 696 | BEGIN
0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
0/17E8B58 | 696 | COMMIT
0/17E8CA8 | 697 | BEGIN
0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple: id[int4]:-1 data[text]::1
0/17E8CA8 | 697 | COMMIT
0/17E8E50 | 698 | BEGIN
0/17E8E50 | 698 | table "foo": DELETE: id[int4]:-1
0/17E8E50 | 698 | COMMIT
0/17E9058 | 699 | BEGIN
0/17E9058 | 699 | COMMIT
(13 rows)
=# SELECT * FROM pg_stat_logical_decoding ;
slot_name | plugin | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
regression_slot | test_decoding | 12042 | f | 695 | 0/17D58D0
(1 row)
=# SELECT * FROM stop_logical_replication('regression_slot');
stop_logical_replication
--------------------------
0
The walsender interface has the same calls:
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';
The only difference is that, over a walsender connection,
START_LOGICAL_REPLICATION can stream changes and supports synchronous
replication.
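The positions used by both interfaces (the xlog_position returned by
init_logical_replication(), the location column, and the restart_lsn
argument of START_LOGICAL_REPLICATION) are written as two hexadecimal
32-bit halves separated by a slash, e.g. 0/17D5908. A minimal sketch of
converting between that text form and a 64-bit value, assuming the usual
PostgreSQL convention that the part before the slash holds the high 32
bits; parse_lsn() and format_lsn() are hypothetical helpers, not functions
from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "HI/LO" (two hex numbers) into a 64-bit LSN. */
static bool
parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	/* %n records how far parsing got, so trailing garbage is rejected */
	if (sscanf(str, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
		str[consumed] != '\0')
		return false;
	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}

/* Hypothetical helper: format a 64-bit LSN back into "HI/LO" form. */
static void
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%X/%X",
			 (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```

For example, parse_lsn("0/17D5908", &lsn) yields 0x17D5908, and
format_lsn() turns that back into the same string.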
The output seen in the 'data' column is produced by a so-called 'output
plugin', which users of the facility can write to suit their needs. A
plugin is written by implementing 5 functions in the shared object that's
passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)
The most interesting function, pg_decode_change, gets passed a structure
containing the old/new versions of the row, the 'struct Relation' it
belongs to, and metadata about the transaction.
The output plugin can rely on syscache lookups et al. to decode the
changed tuple in whatever fashion it wants.
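To illustrate how those callbacks fit together, here is a self-contained
sketch that mimics the shape of the interface. It deliberately does not
use the real declarations from src/include/replication/output_plugin.h:
DemoChange, DemoOutputPlugin, demo_replay() and the rest are hypothetical
stand-ins, and the output format only imitates the test_decoding lines
shown above. What it does take from the description is the set of
callbacks (init, begin, change, commit, cleanup), with init and cleanup
optional.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's real decoding structures. */
typedef enum { DEMO_INSERT, DEMO_UPDATE, DEMO_DELETE } DemoChangeKind;

typedef struct DemoChange
{
	DemoChangeKind kind;
	const char *relname;		/* stands in for the 'struct Relation' */
	const char *tuple_text;		/* stands in for the decoded old/new tuples */
} DemoChange;

/* Hypothetical callback table; init and cleanup may be NULL (optional). */
typedef struct DemoOutputPlugin
{
	void		(*init) (void *state);
	void		(*begin_txn) (void *state, unsigned int xid);
	void		(*change) (void *state, unsigned int xid, const DemoChange *change);
	void		(*commit_txn) (void *state, unsigned int xid);
	void		(*cleanup) (void *state);
} DemoOutputPlugin;

/* Plugin-private state: a text buffer the callbacks append lines to. */
typedef struct DemoState
{
	char		out[1024];
} DemoState;

static void
demo_append(DemoState *st, const char *line)
{
	strncat(st->out, line, sizeof(st->out) - strlen(st->out) - 1);
	strncat(st->out, "\n", sizeof(st->out) - strlen(st->out) - 1);
}

static void
demo_init(void *state)
{
	((DemoState *) state)->out[0] = '\0';
}

static void
demo_begin(void *state, unsigned int xid)
{
	(void) xid;
	demo_append(state, "BEGIN");
}

static void
demo_change(void *state, unsigned int xid, const DemoChange *change)
{
	static const char *const kinds[] = {"INSERT", "UPDATE", "DELETE"};
	char		line[256];

	(void) xid;
	snprintf(line, sizeof(line), "table \"%s\": %s: %s",
			 change->relname, kinds[change->kind], change->tuple_text);
	demo_append(state, line);
}

static void
demo_commit(void *state, unsigned int xid)
{
	(void) xid;
	demo_append(state, "COMMIT");
}

/* No cleanup callback: it is optional. */
static const DemoOutputPlugin demo_plugin = {
	demo_init, demo_begin, demo_change, demo_commit, NULL
};

/* Toy driver: feed one already-reassembled transaction to the callbacks. */
static void
demo_replay(const DemoOutputPlugin *plugin, void *state, unsigned int xid,
			const DemoChange *changes, int nchanges)
{
	if (plugin->init)
		plugin->init(state);
	plugin->begin_txn(state, xid);
	for (int i = 0; i < nchanges; i++)
		plugin->change(state, xid, &changes[i]);
	plugin->commit_txn(state, xid);
	if (plugin->cleanup)
		plugin->cleanup(state);
}
```

In the real facility the decoding machinery, not the plugin, decides when
the callbacks run; this toy driver wraps init and cleanup around a single
transaction purely for brevity.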
I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 0013, which contains most of the code
When reading the code, the information flow during decoding might be
interesting:
---------------
          +---------------+
          |  XLogReader   |
          +---------------+
                  |
            XLOG Records
                  |
                  v
          +---------------+
          |   decode.c    |
          +---------------+
              |         |
              |         |
              v         |
      +---------------+ |
      |  snapbuild.c  | HeapTupleData
      +---------------+ |
              |         |
      catalog snapshots |
              |         |
              v         v
          +---------------+
          |reorderbuffer.c|
          +---------------+
                  |
        HeapTuple & Metadata
                  |
                  v
          +---------------+
          | Output Plugin |
          +---------------+
                  |
          Whatever you want
                  |
                  v
          +---------------+
          | Output Handler|
          |               |
          | WalSnd or SRF |
          +---------------+
---------------
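The central job of reorderbuffer.c in this flow is to take changes from
concurrently running transactions, which are interleaved in WAL, and hand
each transaction's changes to the output plugin as one contiguous unit once
its commit record is seen, dropping aborted transactions. A much-simplified,
in-memory sketch of that idea follows. It is not the patch's implementation
(which also spills large transactions to disk, handles subtransactions and
uses the catalog snapshots from snapbuild.c), and every name and limit in
it is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TXNS    8			/* toy limits, not the patch's */
#define MAX_CHANGES 16

/* Hypothetical, heavily simplified WAL record: a change, commit or abort. */
typedef enum { REC_CHANGE, REC_COMMIT, REC_ABORT } RecKind;

typedef struct Rec
{
	unsigned int xid;			/* 0 is never a valid xid here */
	RecKind		kind;
	const char *payload;		/* only used for REC_CHANGE */
} Rec;

/* Changes buffered for one in-progress transaction. */
typedef struct TxnBuf
{
	unsigned int xid;			/* 0 marks a free slot */
	int			nchanges;
	const char *changes[MAX_CHANGES];
} TxnBuf;

/* Find the buffer for xid, optionally claiming a free slot for it. */
static TxnBuf *
txn_lookup(TxnBuf *txns, unsigned int xid, int create)
{
	TxnBuf	   *free_slot = NULL;

	for (int i = 0; i < MAX_TXNS; i++)
	{
		if (txns[i].xid == xid)
			return &txns[i];
		if (txns[i].xid == 0 && free_slot == NULL)
			free_slot = &txns[i];
	}
	if (!create || free_slot == NULL)
		return NULL;
	free_slot->xid = xid;
	free_slot->nchanges = 0;
	return free_slot;
}

/*
 * Consume interleaved records in WAL order and write each transaction's
 * changes to 'out' contiguously, in commit order, once its commit is seen.
 * Aborted transactions are dropped.  Returns the number of transactions
 * emitted, or -1 if the toy limits are exceeded.
 */
static int
reorder(const Rec *recs, int nrecs, char *out, size_t outlen)
{
	TxnBuf		txns[MAX_TXNS];
	int			emitted = 0;

	memset(txns, 0, sizeof(txns));
	out[0] = '\0';

	for (int i = 0; i < nrecs; i++)
	{
		const Rec  *r = &recs[i];
		TxnBuf	   *txn = txn_lookup(txns, r->xid, r->kind == REC_CHANGE);
		size_t		used;

		if (txn == NULL)
		{
			if (r->kind == REC_CHANGE)
				return -1;		/* no free slot left */
			continue;			/* commit/abort of a txn without changes */
		}

		if (r->kind == REC_CHANGE)
		{
			if (txn->nchanges == MAX_CHANGES)
				return -1;
			txn->changes[txn->nchanges++] = r->payload;
			continue;
		}

		if (r->kind == REC_COMMIT)
		{
			used = strlen(out);
			snprintf(out + used, outlen - used, "BEGIN %u\n", txn->xid);
			for (int j = 0; j < txn->nchanges; j++)
			{
				used = strlen(out);
				snprintf(out + used, outlen - used, "%s\n", txn->changes[j]);
			}
			used = strlen(out);
			snprintf(out + used, outlen - used, "COMMIT %u\n", txn->xid);
			emitted++;
		}

		/* commit or abort: the buffered changes are released */
		txn->xid = 0;
	}
	return emitted;
}
```

Unlike the test_decoding output above, this toy skips transactions that
committed without any buffered changes.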
Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program; not required but useful
0016: Add test_logical_decoding extension; not required, but contains
the tests for the feature. Uses 0014
0017: Snapshot building docs; not required
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4
On 2013-06-15 00:48:17 +0200, Andres Freund wrote:
Version v5-01 attached
Greetings,
Andres Freund
Attachments:
0010-wal_decoding-Log-xl_running_xact-s-at-a-higher-frequ.patch
From a691315e7bc4523fc743a826049daa0680c50933 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 10/17] wal_decoding: Log xl_running_xact's at a higher
frequency than checkpoints are done
Do so in the background writer, which seems to be the best choice as it's
regularly running and shouldn't be busy for too long without getting back into
its main loop.
Also mark xl_standby records as being relevant for async commit so the wal
writer writes them out soonish.
This might also be beneficial for HS, as it would make it faster to hit a spot
where no (old) transactions are running anymore.
---
src/backend/postmaster/bgwriter.c | 47 +++++++++++++++++++++++++++++++++++++++
src/backend/storage/ipc/standby.c | 22 +++++++++++++++---
src/include/storage/standby.h | 2 +-
3 files changed, 67 insertions(+), 4 deletions(-)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 286ae86..2adb36f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -54,9 +54,11 @@
#include "storage/shmem.h"
#include "storage/smgr.h"
#include "storage/spin.h"
+#include "storage/standby.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
+#include "utils/timestamp.h"
/*
@@ -76,6 +78,10 @@ int BgWriterDelay = 200;
static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t shutdown_requested = false;
+static TimestampTz last_logged_snap_ts;
+static XLogRecPtr last_logged_snap_recptr = InvalidXLogRecPtr;
+static uint32 log_snap_interval_ms = 15000;
+
/* Signal handlers */
static void bg_quickdie(SIGNAL_ARGS);
@@ -142,6 +148,12 @@ BackgroundWriterMain(void)
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
/*
+ * We just started, assume there has been either a shutdown or
+ * end-of-recovery snapshot.
+ */
+ last_logged_snap_ts = GetCurrentTimestamp();
+
+ /*
* Create a memory context that we will do all our work in. We do this so
* that we can reset the context during error recovery and thereby avoid
* possible memory leaks. Formerly this code just ran in
@@ -276,6 +288,41 @@ BackgroundWriterMain(void)
}
/*
+ * Log a new xl_running_xacts every now and then so replication can get
+ * into a consistent state faster and clean up resources more
+ * frequently. The costs of this are relatively low, so doing it 4
+ * times a minute seems fine.
+ *
+ * We assume the interval for writing xl_running_xacts is significantly
+ * bigger than BgWriterDelay, so we don't complicate the overall
+ * timeout handling but just assume we're going to get called often
+ * enough even if hibernation mode is active. It's not that important
+ * that log_snap_interval_ms is met strictly.
+ *
+ * We do this logging in the bgwriter as it's the only process that's run
+ * regularly and returns to its mainloop all the
+ * time. E.g. Checkpointer, when active, is barely ever in its
+ * mainloop.
+ */
+ if (XLogStandbyInfoActive() && !RecoveryInProgress())
+ {
+ TimestampTz timeout = 0;
+ timeout = TimestampTzPlusMilliseconds(last_logged_snap_ts,
+ log_snap_interval_ms);
+
+ /*
+ * only log if enough time has passed and some xlog record has been
+ * inserted.
+ */
+ if (GetCurrentTimestamp() >= timeout &&
+ last_logged_snap_recptr != GetXLogInsertRecPtr())
+ {
+ last_logged_snap_recptr = LogStandbySnapshot();
+ last_logged_snap_ts = GetCurrentTimestamp();
+ }
+ }
+
+ /*
* Sleep until we are signaled or BgWriterDelay has elapsed.
*
* Note: the feedback control loop in BgBufferSync() expects that we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c704412..e85733b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,7 +42,7 @@ static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlis
ProcSignalReason reason);
static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
-static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
+static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -853,10 +853,13 @@ standby_redo(XLogRecPtr lsn, XLogRecord *record)
* currently running xids, performed by StandbyReleaseOldLocks().
* Zero xids should no longer be possible, but we may be replaying WAL
* from a time when they were possible.
+ *
+ * Returns the RecPtr of the last inserted record.
*/
-void
+XLogRecPtr
LogStandbySnapshot(void)
{
+ XLogRecPtr recptr;
RunningTransactions running;
xl_standby_lock *locks;
int nlocks;
@@ -877,8 +880,11 @@ LogStandbySnapshot(void)
*/
running = GetRunningTransactionData();
- LogCurrentRunningXacts(running);
+ recptr = LogCurrentRunningXacts(running);
+
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
+
+ return recptr;
}
/*
@@ -889,7 +895,7 @@ LogStandbySnapshot(void)
* is a contiguous chunk of memory and never exists fully until it is
* assembled in WAL.
*/
-static void
+static XLogRecPtr
LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
{
xl_running_xacts xlrec;
@@ -939,6 +945,16 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
CurrRunningXacts->oldestRunningXid,
CurrRunningXacts->latestCompletedXid,
CurrRunningXacts->nextXid);
+
+ /*
+ * Ensure running xact information is synced to disk not too far in the
+ * future; logical standbys need this soon after initialization. We don't
+ * want to stall anything though, so we let the wal writer do it during
+ * normal operation.
+ */
+ XLogSetAsyncXactLSN(recptr);
+
+ return recptr;
}
/*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7f3f051..d4a8fe4 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -113,6 +113,6 @@ typedef RunningTransactionsData *RunningTransactions;
extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
extern void LogAccessExclusiveLockPrepare(void);
-extern void LogStandbySnapshot(void);
+extern XLogRecPtr LogStandbySnapshot(void);
#endif /* STANDBY_H */
--
1.8.2.rc2.4.g7799588.dirty
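The throttling that patch 0010 adds to BackgroundWriterMain() boils down to
two conditions: at least log_snap_interval_ms (15 s) must have passed since
the last xl_running_xacts record, and the WAL insert position must have
moved since then, so an idle system doesn't keep producing these records.
The sketch below restates just that decision as a pure function;
should_log_running_xacts() is a hypothetical name, plain integers stand in
for TimestampTz and XLogRecPtr, and the XLogStandbyInfoActive() /
RecoveryInProgress() preconditions are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, side-effect-free restatement of the bgwriter check.
 * Times are in milliseconds; WAL positions are plain 64-bit integers.
 */
static bool
should_log_running_xacts(int64_t now_ms, int64_t last_logged_ms,
						 uint32_t interval_ms,
						 uint64_t insert_lsn, uint64_t last_logged_lsn)
{
	/* like the patch, fire once now >= last + interval */
	if (now_ms < last_logged_ms + (int64_t) interval_ms)
		return false;

	/*
	 * The WAL insert position hasn't moved since our last record, so the
	 * system is idle and a new record would add nothing.
	 */
	if (insert_lsn == last_logged_lsn)
		return false;

	return true;
}
```

Keeping the decision free of side effects makes the "not too often, and
never while idle" behaviour easy to check in isolation.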
0011-wal_decoding-copydir-make-fsync_fname-public.patch
From 302aa05b8f4501cccde2ee909349b04b4469e093 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 11/17] wal_decoding: copydir: make fsync_fname public
This probably should be somewhere else; it's a generally useful function, not
really related to copying directories. fd.[ch]?
---
src/backend/storage/file/copydir.c | 5 +----
src/include/storage/copydir.h | 1 +
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 391359c..93ca13f 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,9 +27,6 @@
#include "miscadmin.h"
-static void fsync_fname(char *fname, bool isdir);
-
-
/*
* copydir: copy a directory
*
@@ -215,7 +212,7 @@ copy_file(char *fromfile, char *tofile)
* Try to fsync directories but ignore errors that indicate the OS
* just doesn't allow/require fsyncing directories.
*/
-static void
+void
fsync_fname(char *fname, bool isdir)
{
int fd;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a087cce..3bccf3b 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -15,5 +15,6 @@
extern void copydir(char *fromdir, char *todir, bool recurse);
extern void copy_file(char *fromfile, char *tofile);
+extern void fsync_fname(char *fname, bool isdir);
#endif /* COPYDIR_H */
--
1.8.2.rc2.4.g7799588.dirty
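Patch 0011 only changes the visibility of fsync_fname(); the function
itself, per the comment visible in the diff, tries to fsync directories too
but ignores errors that indicate the OS just doesn't allow or require that.
A minimal POSIX sketch of the same pattern follows. It is not a copy of
copydir.c: fsync_path() is a hypothetical name, it returns -1 instead of
raising an error, and the particular errno values treated as "unsupported"
are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-alone variant of the fsync_fname() pattern: fsync a
 * file or directory, treating "directories can't be opened/fsynced here"
 * as success.  Returns 0 on success, -1 with errno set on a real failure.
 */
static int
fsync_path(const char *fname, bool isdir)
{
	/* directories can only be opened read-only; files are opened read-write */
	int			fd = open(fname, isdir ? O_RDONLY : O_RDWR);
	int			rc;
	int			save_errno;

	if (fd < 0)
	{
		/* assumption: these errnos mean "can't open a directory here" */
		if (isdir && (errno == EISDIR || errno == EACCES))
			return 0;
		return -1;
	}

	rc = fsync(fd);
	/* assumption: these errnos mean "directory fsync not supported" */
	if (rc != 0 && isdir && (errno == EBADF || errno == EINVAL))
		rc = 0;

	save_errno = errno;
	(void) close(fd);
	if (rc != 0)
	{
		errno = save_errno;		/* report fsync's error, not close's */
		return -1;
	}
	return 0;
}
```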
0012-wal_decoding-Add-information-about-a-tables-primary-.patch
From 5f5072e4abf92e33a2629ab86766dbe48da141f6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 12/17] wal_decoding: Add information about a tables primary
key to struct RelationData
'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are preferred in this
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
non-expression index over one or more NOT NULL'ed columns
rd_primary is only valid once RelationGetIndexList() has been called.
This is helpful because for logical replication we frequently need to look
up that index, on both the sending and the receiving side, and
RelationGetIndexList already gathers all the necessary information.
This could be used to replace tablecmds.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
src/backend/utils/cache/relcache.c | 52 +++++++++++++++++++++++++++++++++++---
src/include/utils/rel.h | 12 +++++++++
2 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f114038..3f7386e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3463,7 +3463,9 @@ RelationGetIndexList(Relation relation)
ScanKeyData skey;
HeapTuple htup;
List *result;
- Oid oidIndex;
+ Oid oidIndex = InvalidOid;
+ Oid pkeyIndex = InvalidOid;
+ Oid candidateIndex = InvalidOid;
MemoryContext oldcxt;
/* Quick exit if we already computed the list. */
@@ -3520,17 +3522,61 @@ RelationGetIndexList(Relation relation)
Assert(!isnull);
indclass = (oidvector *) DatumGetPointer(indclassDatum);
+ if (!IndexIsValid(index))
+ continue;
+
/* Check to see if it is a unique, non-partial btree index on OID */
- if (IndexIsValid(index) &&
- index->indnatts == 1 &&
+ if (index->indnatts == 1 &&
index->indisunique && index->indimmediate &&
index->indkey.values[0] == ObjectIdAttributeNumber &&
indclass->values[0] == OID_BTREE_OPS_OID &&
heap_attisnull(htup, Anum_pg_index_indpred))
oidIndex = index->indexrelid;
+
+ if (index->indisunique &&
+ index->indimmediate &&
+ heap_attisnull(htup, Anum_pg_index_indpred))
+ {
+ /* always prefer primary keys */
+ if (index->indisprimary)
+ pkeyIndex = index->indexrelid;
+ else if (!OidIsValid(pkeyIndex)
+ && !OidIsValid(oidIndex)
+ && !OidIsValid(candidateIndex))
+ {
+ int key;
+ bool found = true;
+ for (key = 0; key < index->indnatts; key++)
+ {
+ int16 attno = index->indkey.values[key];
+ Form_pg_attribute attr;
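+ /* expression column (attno 0): not a plain column, can't be a key */
+ if (attno == 0)
+ {
+ found = false;
+ break;
+ }
+ /* internal column, like oid */
+ if (attno < 0)
+ continue;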
+
+ attr = relation->rd_att->attrs[attno - 1];
+ if (!attr->attnotnull)
+ {
+ found = false;
+ break;
+ }
+ }
+ if (found)
+ candidateIndex = index->indexrelid;
+ }
+ }
}
systable_endscan(indscan);
+
+ if (OidIsValid(pkeyIndex))
+ relation->rd_primary = pkeyIndex;
+ /* prefer oid indexes over normal candidate ones */
+ else if (OidIsValid(oidIndex))
+ relation->rd_primary = oidIndex;
+ else if (OidIsValid(candidateIndex))
+ relation->rd_primary = candidateIndex;
+
heap_close(indrel, AccessShareLock);
/* Now save a copy of the completed list in the relcache entry. */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 58cc3f7..bd2466e 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -111,6 +111,18 @@ typedef struct RelationData
TriggerDesc *trigdesc; /* Trigger info, or NULL if rel has none */
/*
+ * The 'best' primary or candidate key that has been found, only set
+ * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
+ *
+ * Indexes are chosen in the following order:
+ * * Primary Key
+ * * oid index
+ * * the first (OID order) unique, immediate, non-partial and
+ * non-expression index over one or more NOT NULL'ed columns
+ */
+ Oid rd_primary;
+
+ /*
* rd_options is set whenever rd_rel is loaded into the relcache entry.
* Note that you can NOT look into rd_rel for this data. NULL means "use
* defaults".
--
1.8.2.rc2.4.g7799588.dirty
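The preference order documented for rd_primary can be stated independently
of the relcache. The sketch below uses a hypothetical, pre-digested
description of each index (IndexDesc) in place of pg_index tuples and
returns 0 where the patch would leave InvalidOid; the caller is assumed to
pass the indexes in OID order so that "first candidate" matches the commit
message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical flattened view of the pg_index facts the rule depends on. */
typedef struct IndexDesc
{
	Oid			indexrelid;
	bool		valid;
	bool		unique;
	bool		immediate;
	bool		partial;			/* has an index predicate */
	bool		primary;
	bool		oid_index;			/* unique btree index on the oid column */
	bool		has_expression;		/* some key column is an expression */
	bool		all_cols_not_null;	/* every plain key column is NOT NULL */
} IndexDesc;

/*
 * Pick the "best" identifying index: primary key, else oid index, else the
 * first qualifying candidate.  Returns 0 (like InvalidOid) if none.
 */
static Oid
choose_primary(const IndexDesc *idx, size_t n)
{
	Oid			pkey = 0;
	Oid			oididx = 0;
	Oid			candidate = 0;

	for (size_t i = 0; i < n; i++)
	{
		const IndexDesc *d = &idx[i];

		/* only valid, unique, immediate, non-partial indexes qualify */
		if (!d->valid || !d->unique || !d->immediate || d->partial)
			continue;

		if (d->primary)
			pkey = d->indexrelid;
		else if (d->oid_index)
			oididx = d->indexrelid;
		else if (candidate == 0 && !d->has_expression && d->all_cols_not_null)
			candidate = d->indexrelid;
	}

	if (pkey != 0)
		return pkey;
	if (oididx != 0)
		return oididx;
	return candidate;
}
```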
0013-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch
From f13829d20b493a3642082ea9119444495ac75996 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 13/17] wal_decoding: Introduce wal decoding via catalog
timetravel
This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
  * init
  * begin
  * change
  * commit
  * cleanup
* walsender infrastructure to stream out changes and to keep the global xmin low enough
  * INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
    * initial LSN
    * replication slot identifier
    * id of a pg_export() style snapshot
  * START_LOGICAL_REPLICATION $id $lsn; streams out changes
  * uses named output plugins for output specification
Todo:
* better integrated testing infrastructure
* more docs about the internals
Low level:
* resource owner handling is suboptimal
* invalidations from uninteresting transactions (e.g. from other databases, old ones)
need to be processed anyway
* error handling in walsender is suboptimal
* pg_receivellog needs to send a reply immediately when postgres is shutting down
Input, Testing and Review by:
Heikki Linnakangas
Kevin Grittner
Michael Paquier
Abhijit Menon-Sen
Peter Geoghegan
Robert Haas
Simon Riggs
Steve Singer
Code By:
Andres Freund
With code contributions by:
Abhijit Menon-Sen
Craig Ringer
Alvaro Herrera
---
src/backend/access/common/reloptions.c | 10 +
src/backend/access/heap/heapam.c | 466 ++++-
src/backend/access/heap/pruneheap.c | 2 +
src/backend/access/index/indexam.c | 14 +-
src/backend/access/rmgrdesc/heapdesc.c | 9 +
src/backend/access/rmgrdesc/xlogdesc.c | 1 +
src/backend/access/transam/twophase.c | 4 +-
src/backend/access/transam/xact.c | 22 +-
src/backend/access/transam/xlog.c | 12 +-
src/backend/catalog/catalog.c | 14 +-
src/backend/catalog/index.c | 14 +-
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 2 +-
src/backend/commands/cluster.c | 2 +
src/backend/commands/trigger.c | 3 +-
src/backend/commands/vacuum.c | 5 +-
src/backend/commands/vacuumlazy.c | 5 +-
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/Makefile | 2 +
src/backend/replication/logical/Makefile | 19 +
src/backend/replication/logical/decode.c | 556 +++++
src/backend/replication/logical/logical.c | 1047 ++++++++++
src/backend/replication/logical/logicalfuncs.c | 361 ++++
src/backend/replication/logical/reorderbuffer.c | 2449 +++++++++++++++++++++++
src/backend/replication/logical/snapbuild.c | 1930 ++++++++++++++++++
src/backend/replication/repl_gram.y | 75 +-
src/backend/replication/repl_scanner.l | 55 +-
src/backend/replication/walreceiver.c | 2 +-
src/backend/replication/walsender.c | 738 ++++++-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 58 +-
src/backend/storage/ipc/standby.c | 17 +-
src/backend/utils/cache/inval.c | 4 +-
src/backend/utils/cache/relcache.c | 113 +-
src/backend/utils/misc/guc.c | 12 +
src/backend/utils/misc/postgresql.conf.sample | 11 +-
src/backend/utils/time/snapmgr.c | 5 +-
src/backend/utils/time/tqual.c | 251 ++-
src/bin/initdb/initdb.c | 4 +-
src/bin/pg_controldata/pg_controldata.c | 2 +
src/include/access/heapam_xlog.h | 59 +-
src/include/access/transam.h | 5 +
src/include/access/xlog.h | 8 +-
src/include/access/xlogreader.h | 12 +-
src/include/catalog/catalog.h | 1 +
src/include/catalog/pg_proc.h | 6 +
src/include/commands/vacuum.h | 2 +-
src/include/nodes/nodes.h | 3 +
src/include/nodes/replnodes.h | 35 +
src/include/replication/decode.h | 20 +
src/include/replication/logical.h | 198 ++
src/include/replication/logicalfuncs.h | 19 +
src/include/replication/output_plugin.h | 73 +
src/include/replication/reorderbuffer.h | 320 +++
src/include/replication/snapbuild.h | 75 +
src/include/replication/walsender_private.h | 6 +-
src/include/storage/itemptr.h | 3 +
src/include/storage/lwlock.h | 1 +
src/include/storage/procarray.h | 2 +-
src/include/storage/sinval.h | 2 +
src/include/utils/inval.h | 2 +-
src/include/utils/rel.h | 30 +-
src/include/utils/relcache.h | 11 +-
src/include/utils/snapmgr.h | 3 +
src/include/utils/tqual.h | 33 +-
src/test/regress/expected/logical.out | 7 +
src/test/regress/expected/rules.out | 9 +-
src/test/regress/sql/logical.sql | 3 +
src/tools/pgindent/typedefs.list | 40 +
69 files changed, 9101 insertions(+), 203 deletions(-)
create mode 100644 src/backend/replication/logical/Makefile
create mode 100644 src/backend/replication/logical/decode.c
create mode 100644 src/backend/replication/logical/logical.c
create mode 100644 src/backend/replication/logical/logicalfuncs.c
create mode 100644 src/backend/replication/logical/reorderbuffer.c
create mode 100644 src/backend/replication/logical/snapbuild.c
create mode 100644 src/include/replication/decode.h
create mode 100644 src/include/replication/logical.h
create mode 100644 src/include/replication/logicalfuncs.h
create mode 100644 src/include/replication/output_plugin.h
create mode 100644 src/include/replication/reorderbuffer.h
create mode 100644 src/include/replication/snapbuild.h
create mode 100644 src/test/regress/expected/logical.out
create mode 100644 src/test/regress/sql/logical.sql
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c439702..a406979 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -62,6 +62,14 @@ static relopt_bool boolRelOpts[] =
},
{
{
+ "treat_as_catalog_table",
+ "Treat table as a catalog table for the purpose of logical replication",
+ RELOPT_KIND_HEAP
+ },
+ false
+ },
+ {
+ {
"fastupdate",
"Enables \"fast update\" feature for this GIN index",
RELOPT_KIND_GIN
@@ -1152,6 +1160,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
{"security_barrier", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, security_barrier)},
+ {"treat_as_catalog_table", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, treat_as_catalog_table)},
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fdf0ccd..e3213fa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -85,12 +85,14 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
- HeapTuple newtup, bool all_visible_cleared,
- bool new_all_visible_cleared);
+ HeapTuple newtup, HeapTuple old_idx_tup,
+ bool all_visible_cleared, bool new_all_visible_cleared);
static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
- bool *satisfies_hot, bool *satisfies_key,
- HeapTuple oldtup, HeapTuple newtup);
+ Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
+ bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
+ HeapTuple oldtup, HeapTuple newtup);
static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
uint16 old_infomask2, TransactionId add_to_xmax,
LockTupleMode mode, bool is_update,
@@ -108,6 +110,8 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
static bool ConditionalMultiXactIdWait(MultiXactId multi,
MultiXactStatus status, int *remaining,
uint16 infomask);
+static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
+static HeapTuple ExtractKeyTuple(Relation rel, HeapTuple tup);
/*
@@ -339,8 +343,10 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- Assert(TransactionIdIsValid(RecentGlobalXmin));
- heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ if (IsSystemRelation(scan->rs_rd) || RelationIsDoingTimetravel(scan->rs_rd))
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1726,10 +1732,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
*/
if (!skip)
{
+ /* setup the redirected t_self for the benefit of timetravel access */
+ ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
+
/* If it's visible per the snapshot, we must return it */
valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, heapTuple,
buffer, snapshot);
+ /* reset original, non-redirected, tid */
+ heapTuple->t_self = *tid;
+
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
@@ -2084,11 +2096,24 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- XLogRecData rdata[3];
+ XLogRecData rdata[4];
Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
+ bool need_tuple_data;
+
+ /*
+ * For logical replication, we need the tuple even if we're doing a
+ * full page write, so make sure to log it separately. (XXX We could
+ * alternatively store a pointer into the FPW).
+ *
+ * Also, if this is a catalog, we need to transmit combocids to
+ * properly decode, so log that as well.
+ */
+ need_tuple_data = RelationIsLogicallyLogged(relation);
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, heaptup);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.target.node = relation->rd_node;
xlrec.target.tid = heaptup->t_self;
rdata[0].data = (char *) &xlrec;
@@ -2107,18 +2132,35 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*/
rdata[1].data = (char *) &xlhdr;
rdata[1].len = SizeOfHeapHeader;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[2].buffer = buffer;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[2].buffer_std = true;
rdata[2].next = NULL;
/*
+ * add record for the buffer without actual content that's removed if
+ * fpw is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[2].next = &(rdata[3]);
+
+ rdata[3].data = NULL;
+ rdata[3].len = 0;
+ rdata[3].buffer = buffer;
+ rdata[3].buffer_std = true;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If this is the single and first tuple on page, we can reinit the
* page instead of restoring the whole thing. Set flag, and hide
* buffer references from XLogInsert.
@@ -2127,7 +2169,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
info |= XLOG_HEAP_INIT_PAGE;
- rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2253,6 +2295,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
Page page;
bool needwal;
Size saveFreeSpace;
+ bool need_tuple_data = RelationIsLogicallyLogged(relation);
+ bool need_cids = RelationIsDoingTimetravel(relation);
needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
@@ -2339,7 +2383,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
{
XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
- XLogRecData rdata[2];
+ XLogRecData rdata[3];
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
int totaldatalen;
@@ -2369,7 +2413,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->all_visible_cleared = all_visible_cleared;
+ xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec->node = relation->rd_node;
xlrec->blkno = BufferGetBlockNumber(buffer);
xlrec->ntuples = nthispage;
@@ -2401,6 +2445,13 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
datalen);
tuphdr->datalen = datalen;
scratchptr += datalen;
+
+ /*
+ * We don't use heap_multi_insert for catalog tuples yet, but
+ * better be prepared...
+ */
+ if (need_cids)
+ log_heap_new_cid(relation, heaptup);
}
totaldatalen = scratchptr - tupledata;
Assert((scratchptr - scratch) < BLCKSZ);
@@ -2412,17 +2463,33 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
rdata[1].data = tupledata;
rdata[1].len = totaldatalen;
- rdata[1].buffer = buffer;
+ rdata[1].buffer = need_tuple_data ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = NULL;
/*
+ * add a record for the buffer, without actual content, that is removed
+ * if a full-page write is done for that buffer
+ */
+ if (need_tuple_data)
+ {
+ rdata[1].next = &(rdata[2]);
+
+ rdata[2].data = NULL;
+ rdata[2].len = 0;
+ rdata[2].buffer = buffer;
+ rdata[2].buffer_std = true;
+ rdata[2].next = NULL;
+ xlrec->flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+ }
+
+ /*
* If we're going to reinitialize the whole page using the WAL
* record, hide buffer reference from XLogInsert.
*/
if (init)
{
- rdata[1].buffer = InvalidBuffer;
+ rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
info |= XLOG_HEAP_INIT_PAGE;
}
@@ -2542,6 +2609,9 @@ heap_delete(Relation relation, ItemPointer tid,
bool have_tuple_lock = false;
bool iscombo;
bool all_visible_cleared = false;
+ bool need_tuple_data = RelationNeedsWAL(relation) &&
+ RelationIsLogicallyLogged(relation);
+ HeapTuple idx_tuple = NULL; /* primary key of the tuple */
Assert(ItemPointerIsValid(tid));
@@ -2715,6 +2785,15 @@ l1:
/* replace cid with a combo cid if necessary */
HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
+ /*
+ * Compute primary key tuple before entering the critical section so we
+ * don't PANIC upon a memory allocation failure.
+ */
+ if (need_tuple_data)
+ {
+ idx_tuple = ExtractKeyTuple(relation, &tp);
+ }
+
START_CRIT_SECTION();
/*
@@ -2767,9 +2846,13 @@ l1:
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
- XLogRecData rdata[2];
+ XLogRecData rdata[4];
- xlrec.all_visible_cleared = all_visible_cleared;
+ /* For logical decoding we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ log_heap_new_cid(relation, &tp);
+
+ xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
tp.t_data->t_infomask2);
xlrec.target.node = relation->rd_node;
@@ -2786,6 +2869,34 @@ l1:
rdata[1].buffer_std = true;
rdata[1].next = NULL;
+ /*
+ * Log primary key of the deleted tuple
+ */
+ if (need_tuple_data && idx_tuple != NULL)
+ {
+ xl_heap_header xlhdr;
+
+ xlhdr.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr.t_hoff = idx_tuple->t_data->t_hoff;
+
+ rdata[1].next = &(rdata[2]);
+ rdata[2].data = (char*)&xlhdr;
+ rdata[2].len = SizeOfHeapHeader;
+ rdata[2].buffer = InvalidBuffer;
+ rdata[2].next = NULL;
+
+ rdata[2].next = &(rdata[3]);
+ rdata[3].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].buffer = InvalidBuffer;
+ rdata[3].next = NULL;
+
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);
PageSetLSN(page, recptr);
@@ -2915,9 +3026,11 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
TransactionId xid = GetCurrentTransactionId();
Bitmapset *hot_attrs;
Bitmapset *key_attrs;
+ Bitmapset *ckey_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
+ HeapTuple old_idx_tuple = NULL;
Page page;
BlockNumber block;
MultiXactStatus mxact_status;
@@ -2933,6 +3046,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool iscombo;
bool satisfies_hot;
bool satisfies_key;
+ bool satisfies_ckey;
bool use_hot_update = false;
bool key_intact;
bool all_visible_cleared = false;
@@ -2960,8 +3074,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* Note that we get a copy here, so we need not worry about relcache flush
* happening midway through.
*/
- hot_attrs = RelationGetIndexAttrBitmap(relation, false);
- key_attrs = RelationGetIndexAttrBitmap(relation, true);
+ hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
+ key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
+ ckey_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY);
block = ItemPointerGetBlockNumber(otid);
buffer = ReadBuffer(relation, block);
@@ -3019,9 +3135,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitiously arrive at the same key values.
*/
- HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs,
+ HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, ckey_attrs,
&satisfies_hot, &satisfies_key,
- &oldtup, newtup);
+ &satisfies_ckey, &oldtup, newtup);
if (satisfies_key)
{
*lockmode = LockTupleNoKeyExclusive;
@@ -3491,6 +3607,12 @@ l2:
PageSetFull(page);
}
+ /* compute old key tuple for logical logging */
+ if (!satisfies_ckey && RelationIsLogicallyLogged(relation))
+ {
+ old_idx_tuple = ExtractKeyTuple(relation, &oldtup);
+ }
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -3566,11 +3688,20 @@ l2:
/* XLOG stuff */
if (RelationNeedsWAL(relation))
{
- XLogRecPtr recptr = log_heap_update(relation, buffer,
- newbuf, &oldtup, heaptup,
- all_visible_cleared,
- all_visible_cleared_new);
+ XLogRecPtr recptr;
+ /* For logical decoding we need combocids to properly decode the catalog */
+ if (RelationIsDoingTimetravel(relation))
+ {
+ log_heap_new_cid(relation, &oldtup);
+ log_heap_new_cid(relation, heaptup);
+ }
+
+ recptr = log_heap_update(relation, buffer,
+ newbuf, &oldtup, heaptup,
+ old_idx_tuple,
+ all_visible_cleared,
+ all_visible_cleared_new);
if (newbuf != buffer)
{
PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3722,18 +3853,23 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* modify columns used in the key.
*/
static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation,
- Bitmapset *hot_attrs, Bitmapset *key_attrs,
+HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
+ Bitmapset *key_attrs, Bitmapset *ckey_attrs,
bool *satisfies_hot, bool *satisfies_key,
+ bool *satisfies_ckey,
HeapTuple oldtup, HeapTuple newtup)
{
int next_hot_attnum;
int next_key_attnum;
+ int next_ckey_attnum;
bool hot_result = true;
bool key_result = true;
- bool key_done = false;
+ bool ckey_result = true;
bool hot_done = false;
+ Assert(bms_is_subset(ckey_attrs, key_attrs));
+ Assert(bms_is_subset(key_attrs, hot_attrs));
+
next_hot_attnum = bms_first_member(hot_attrs);
if (next_hot_attnum == -1)
hot_done = true;
@@ -3742,28 +3878,25 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+
for (;;)
{
int check_now;
bool changed;
- /* both bitmapsets are now empty */
- if (key_done && hot_done)
+ /* all bitmapsets are exhausted, since hot_attrs includes the others */
+ if (hot_done)
break;
- /* XXX there's probably an easier way ... */
- if (hot_done)
- check_now = next_key_attnum;
- if (key_done)
- check_now = next_hot_attnum;
- else
- check_now = Min(next_hot_attnum, next_key_attnum);
+ check_now = next_hot_attnum;
changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
check_now, oldtup, newtup);
@@ -3773,11 +3906,15 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
hot_result = false;
if (check_now == next_key_attnum)
key_result = false;
+ if (check_now == next_ckey_attnum)
+ ckey_result = false;
}
/* if both are false now, we can stop checking */
- if (!hot_result && !key_result)
+ if (!hot_result && !key_result && !ckey_result)
+ {
break;
+ }
if (check_now == next_hot_attnum)
{
@@ -3791,16 +3928,22 @@ HeapSatisfiesHOTandKeyUpdate(Relation relation,
if (check_now == next_key_attnum)
{
next_key_attnum = bms_first_member(key_attrs);
- if (next_key_attnum == -1)
- key_done = true;
- else
+ if (next_key_attnum != -1)
/* Adjust for system attributes */
next_key_attnum += FirstLowInvalidHeapAttributeNumber;
}
+ if (check_now == next_ckey_attnum)
+ {
+ next_ckey_attnum = bms_first_member(ckey_attrs);
+ if (next_ckey_attnum != -1)
+ /* Adjust for system attributes */
+ next_ckey_attnum += FirstLowInvalidHeapAttributeNumber;
+ }
}
*satisfies_hot = hot_result;
*satisfies_key = key_result;
+ *satisfies_ckey = ckey_result;
}
/*
@@ -5822,15 +5965,21 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
static XLogRecPtr
log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
+ HeapTuple idx_tuple,
bool all_visible_cleared, bool new_all_visible_cleared)
{
xl_heap_update xlrec;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
uint8 info;
XLogRecPtr recptr;
XLogRecData rdata[4];
Page page = BufferGetPage(newbuf);
+ /*
+ * Just as for XLOG_HEAP_INSERT, we need to make sure the tuple data is
+ * included in the record even if a full-page image is taken.
+ */
+ bool need_tuple_data = RelationIsLogicallyLogged(reln);
+
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
@@ -5845,9 +5994,12 @@ log_heap_update(Relation reln, Buffer oldbuf,
xlrec.old_infobits_set = compute_infobits(oldtup->t_data->t_infomask,
oldtup->t_data->t_infomask2);
xlrec.new_xmax = HeapTupleHeaderGetRawXmax(newtup->t_data);
- xlrec.all_visible_cleared = all_visible_cleared;
+ xlrec.flags = 0;
+ if (all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
xlrec.newtid = newtup->t_self;
- xlrec.new_all_visible_cleared = new_all_visible_cleared;
+ if (new_all_visible_cleared)
+ xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
rdata[0].data = (char *) &xlrec;
rdata[0].len = SizeOfHeapUpdate;
@@ -5860,33 +6012,80 @@ log_heap_update(Relation reln, Buffer oldbuf,
rdata[1].buffer_std = true;
rdata[1].next = &(rdata[2]);
- xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
- xlhdr.t_infomask = newtup->t_data->t_infomask;
- xlhdr.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.header.t_infomask2 = newtup->t_data->t_infomask2;
+ xlhdr.header.t_infomask = newtup->t_data->t_infomask;
+ xlhdr.header.t_hoff = newtup->t_data->t_hoff;
+ xlhdr.t_len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- /*
- * As with insert records, we need not store the rdata[2] segment if we
- * decide to store the whole buffer instead.
- */
rdata[2].data = (char *) &xlhdr;
- rdata[2].len = SizeOfHeapHeader;
- rdata[2].buffer = newbuf;
+ rdata[2].len = SizeOfHeapHeaderLen;
+ rdata[2].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[2].buffer_std = true;
rdata[2].next = &(rdata[3]);
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
- rdata[3].data = (char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits);
+ rdata[3].data = (char *) newtup->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
rdata[3].len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
- rdata[3].buffer = newbuf;
+ rdata[3].buffer = need_tuple_data ? InvalidBuffer : newbuf;
rdata[3].buffer_std = true;
rdata[3].next = NULL;
+ /*
+ * separate storage for the buffer reference of the new page in the
+ * wal_level >= logical case
+ */
+ if (need_tuple_data)
+ {
+ XLogRecData rdata_logical[4];
+
+ rdata[3].next = &(rdata_logical[0]);
+
+ rdata_logical[0].data = NULL;
+ rdata_logical[0].len = 0;
+ rdata_logical[0].buffer = newbuf;
+ rdata_logical[0].buffer_std = true;
+ rdata_logical[0].next = NULL;
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
+
+ /* candidate key changed and we have a candidate key */
+ if (idx_tuple)
+ {
+ /* the length isn't strictly needed here, but it's more convenient */
+ xl_heap_header_len xlhdr_idx;
+ xlhdr_idx.header.t_infomask2 = idx_tuple->t_data->t_infomask2;
+ xlhdr_idx.header.t_infomask = idx_tuple->t_data->t_infomask;
+ xlhdr_idx.header.t_hoff = idx_tuple->t_data->t_hoff;
+ xlhdr_idx.t_len = idx_tuple->t_len;
+
+ rdata_logical[0].next = &(rdata_logical[1]);
+ rdata_logical[1].data = (char *) &xlhdr_idx;
+ rdata_logical[1].len = SizeOfHeapHeaderLen;
+ rdata_logical[1].buffer = InvalidBuffer;
+ rdata_logical[1].next = &(rdata_logical[2]);
+
+ /* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
+ rdata_logical[2].data = (char *) idx_tuple->t_data
+ + offsetof(HeapTupleHeaderData, t_bits);
+ rdata_logical[2].len = idx_tuple->t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+ rdata_logical[2].buffer = InvalidBuffer;
+ rdata_logical[2].next = NULL;
+ xlrec.flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
+ }
+ }
+
/* If new tuple is the single and first tuple on page... */
if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == FirstOffsetNumber &&
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
+ XLogRecData *rcur = &rdata[0];
info |= XLOG_HEAP_INIT_PAGE;
- rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
+ while (rcur != NULL)
+ {
+ rcur->buffer = InvalidBuffer;
+ rcur = rcur->next;
+ }
}
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -5993,6 +6192,114 @@ log_newpage_buffer(Buffer buffer)
}
/*
+ * Perform XLogInsert of a XLOG_HEAP2_NEW_CID record
+ *
+ * The HeapTuple must already have its combocid set, otherwise we cannot
+ * determine combocid/cmin/cmax.
+ *
+ * This is only used in wal_level >= WAL_LEVEL_LOGICAL
+ */
+static XLogRecPtr
+log_heap_new_cid(Relation relation, HeapTuple tup)
+{
+ xl_heap_new_cid xlrec;
+
+ XLogRecPtr recptr;
+ XLogRecData rdata[1];
+ HeapTupleHeader hdr = tup->t_data;
+
+ Assert(ItemPointerIsValid(&tup->t_self));
+ Assert(tup->t_tableOid != InvalidOid);
+
+ xlrec.top_xid = GetTopTransactionId();
+ xlrec.target.node = relation->rd_node;
+ xlrec.target.tid = tup->t_self;
+
+ /*
+ * If the tuple was inserted and deleted in the same transaction, we
+ * definitely have a combocid.
+ */
+ if (hdr->t_infomask & HEAP_COMBOCID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetCmin(hdr);
+ xlrec.cmax = HeapTupleHeaderGetCmax(hdr);
+ xlrec.combocid = HeapTupleHeaderGetRawCommandId(hdr);
+ }
+ else
+ {
+ /* tuple inserted */
+ if (hdr->t_infomask & HEAP_XMAX_INVALID)
+ {
+ xlrec.cmin = HeapTupleHeaderGetRawCommandId(hdr);
+ xlrec.cmax = InvalidCommandId;
+ }
+ /* tuple from a different tx updated or deleted */
+ else
+ {
+ xlrec.cmin = InvalidCommandId;
+ xlrec.cmax = HeapTupleHeaderGetRawCommandId(hdr);
+
+ }
+ xlrec.combocid = InvalidCommandId;
+ }
+
+ rdata[0].data = (char *) &xlrec;
+ rdata[0].len = SizeOfHeapNewCid;
+ rdata[0].buffer = InvalidBuffer;
+ rdata[0].next = NULL;
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_NEW_CID, rdata);
+
+ return recptr;
+}
+
+static HeapTuple
+ExtractKeyTuple(Relation relation, HeapTuple tp)
+{
+ HeapTuple idx_tuple = NULL;
+ TupleDesc desc = RelationGetDescr(relation);
+ Relation idx_rel;
+ TupleDesc idx_desc;
+ Datum idx_vals[INDEX_MAX_KEYS];
+ bool idx_isnull[INDEX_MAX_KEYS];
+ int natt;
+
+ /* make sure the index list, and with it rd_primary, has been loaded */
+ if (relation->rd_indexvalid == 0)
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(DEBUG1, "could not find primary key for table with OID %u",
+ RelationGetRelid(relation));
+ }
+ else
+ {
+ idx_rel = RelationIdGetRelation(relation->rd_primary);
+ idx_desc = RelationGetDescr(idx_rel);
+
+ for (natt = 0; natt < idx_desc->natts; natt++)
+ {
+ int attno = idx_rel->rd_index->indkey.values[natt];
+ if (attno == ObjectIdAttributeNumber)
+ {
+ idx_vals[natt] = HeapTupleGetOid(tp);
+ idx_isnull[natt] = false;
+ }
+ else
+ {
+ idx_vals[natt] =
+ fastgetattr(tp, attno, desc, &idx_isnull[natt]);
+ }
+ Assert(!idx_isnull[natt]);
+ }
+ idx_tuple = heap_form_tuple(idx_desc, idx_vals, idx_isnull);
+ RelationClose(idx_rel);
+ }
+ return idx_tuple;
+}
+
+/*
* Handles CLEANUP_INFO
*/
static void
@@ -6353,7 +6660,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6402,7 +6709,7 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/* Make sure there is no forward chain link in t_ctid */
@@ -6436,7 +6743,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
Buffer vmbuffer = InvalidBuffer;
@@ -6507,7 +6814,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6570,7 +6877,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->node);
Buffer vmbuffer = InvalidBuffer;
@@ -6653,7 +6960,7 @@ heap_xlog_multi_insert(XLogRecPtr lsn, XLogRecord *record)
PageSetLSN(page, lsn);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
MarkBufferDirty(buffer);
@@ -6692,7 +6999,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
HeapTupleHeaderData hdr;
char data[MaxHeapTupleSize];
} tbuf;
- xl_heap_header xlhdr;
+ xl_heap_header_len xlhdr;
int hsize;
uint32 newlen;
Size freespace;
@@ -6701,7 +7008,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
@@ -6779,7 +7086,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
/* Mark the page as a candidate for pruning */
PageSetPrunable(page, record->xl_xid);
- if (xlrec->all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
/*
@@ -6803,7 +7110,7 @@ newt:;
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
BlockNumber block = ItemPointerGetBlockNumber(&xlrec->newtid);
@@ -6861,13 +7168,13 @@ newsame:;
if (PageGetMaxOffsetNumber(page) + 1 < offnum)
elog(PANIC, "heap_update_redo: invalid max offset number");
- hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
+ hsize = SizeOfHeapUpdate + SizeOfHeapHeaderLen;
- newlen = record->xl_len - hsize;
- Assert(newlen <= MaxHeapTupleSize);
memcpy((char *) &xlhdr,
(char *) xlrec + SizeOfHeapUpdate,
- SizeOfHeapHeader);
+ SizeOfHeapHeaderLen);
+ newlen = xlhdr.t_len;
+ Assert(newlen <= MaxHeapTupleSize);
htup = &tbuf.hdr;
MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
@@ -6875,9 +7182,9 @@ newsame:;
(char *) xlrec + hsize,
newlen);
newlen += offsetof(HeapTupleHeaderData, t_bits);
- htup->t_infomask2 = xlhdr.t_infomask2;
- htup->t_infomask = xlhdr.t_infomask;
- htup->t_hoff = xlhdr.t_hoff;
+ htup->t_infomask2 = xlhdr.header.t_infomask2;
+ htup->t_infomask = xlhdr.header.t_infomask;
+ htup->t_hoff = xlhdr.header.t_hoff;
HeapTupleHeaderSetXmin(htup, record->xl_xid);
HeapTupleHeaderSetCmin(htup, FirstCommandId);
@@ -6889,7 +7196,7 @@ newsame:;
if (offnum == InvalidOffsetNumber)
elog(PANIC, "heap_update_redo: failed to add tuple");
- if (xlrec->new_all_visible_cleared)
+ if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
@@ -7140,6 +7447,9 @@ heap2_redo(XLogRecPtr lsn, XLogRecord *record)
case XLOG_HEAP2_LOCK_UPDATED:
heap_xlog_lock_updated(lsn, record);
break;
+ case XLOG_HEAP2_NEW_CID:
+ /* nothing to do on a real replay, only during logical decoding */
+ break;
default:
elog(PANIC, "heap2_redo: unknown op code %u", info);
}
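The redo change in heap_xlog_update() above stops deriving the new tuple's length from record->xl_len and reads it from the header instead, because with wal_level = logical an old-key tuple may follow the new tuple in the same record. A minimal sketch of that parsing step, assuming a simplified header layout; `DemoHeaderLen` and `parse_new_tuple()` are hypothetical stand-ins for xl_heap_header_len and the redo code, not PostgreSQL APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xl_heap_header_len: tuple header fields plus
 * the explicit length of the tuple data that follows. */
typedef struct DemoHeaderLen
{
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  t_hoff;
    uint16_t t_len;         /* bytes of tuple data following the header */
} DemoHeaderLen;

/*
 * Parse a record body laid out as
 *     [DemoHeaderLen][t_len bytes of tuple data][optional trailing data]
 * On success, points *tupdata at the tuple data and returns the offset just
 * past it (where an old key would start). Returns -1 if the body is short.
 */
static long
parse_new_tuple(const char *rec, size_t reclen, DemoHeaderLen *hdr,
                const char **tupdata)
{
    if (reclen < sizeof(DemoHeaderLen))
        return -1;

    /* memcpy rather than a cast: record data may be unaligned */
    memcpy(hdr, rec, sizeof(DemoHeaderLen));

    /* the tuple length comes from the header, not from reclen */
    if (reclen - sizeof(DemoHeaderLen) < hdr->t_len)
        return -1;

    *tupdata = rec + sizeof(DemoHeaderLen);
    return (long) (sizeof(DemoHeaderLen) + hdr->t_len);
}
```

Deriving the length from the record size only works while the tuple is the last thing in the record; an explicit length lets optional data be appended after it.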
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3b68705..10587b8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -75,6 +75,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
Page page = BufferGetPage(buffer);
Size minfree;
+ Assert(TransactionIdIsValid(OldestXmin));
+
/*
* Let's see if we really need pruning.
*
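HeapSatisfiesHOTandKeyUpdate() in heapam.c now relies on ckey_attrs being a subset of key_attrs, which is in turn a subset of hot_attrs (the new Asserts check this), so a single walk over the HOT columns visits every key and candidate-key column. A sketch of that walk, assuming sorted attribute-number arrays in place of Bitmapsets and a precomputed `changed[]` array in place of heap_tuple_attr_equals(); all names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * hot[], key[] and ckey[] hold ascending attribute numbers, with ckey a
 * subset of key and key a subset of hot. changed[attno] is true if that
 * column's value differs between the old and new tuple.
 */
static void
demo_satisfies_hot_and_keys(const int *hot, int nhot,
                            const int *key, int nkey,
                            const int *ckey, int nckey,
                            const bool *changed,
                            bool *sat_hot, bool *sat_key, bool *sat_ckey)
{
    int ki = 0;     /* next unvisited key column */
    int ci = 0;     /* next unvisited candidate-key column */

    *sat_hot = *sat_key = *sat_ckey = true;

    /* walking hot[] alone suffices: it contains every key/ckey column */
    for (int hi = 0; hi < nhot; hi++)
    {
        int  attno = hot[hi];
        bool is_key = (ki < nkey && key[ki] == attno);
        bool is_ckey = (ci < nckey && ckey[ci] == attno);

        if (changed[attno])
        {
            *sat_hot = false;
            if (is_key)
                *sat_key = false;
            if (is_ckey)
                *sat_ckey = false;
        }

        /* once everything is false there is nothing left to learn */
        if (!*sat_hot && !*sat_key && !*sat_ckey)
            break;

        if (is_key)
            ki++;
        if (is_ckey)
            ci++;
    }
}
```

Because each list is sorted and nested inside hot[], the key and candidate-key cursors only need to advance when they match the current HOT column, which is the simplification that lets the patch drop the old Min() bookkeeping.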
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..3bac4a5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -67,7 +67,10 @@
#include "access/relscan.h"
#include "access/transam.h"
+#include "access/xlog.h"
+
#include "catalog/index.h"
+#include "catalog/catalog.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -520,8 +523,15 @@ index_fetch_heap(IndexScanDesc scan)
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != scan->xs_cbuf)
- heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
- RecentGlobalXmin);
+ {
+ if (IsSystemRelation(scan->heapRelation)
+ || RelationIsDoingTimetravel(scan->heapRelation))
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalXmin);
+ else
+ heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf,
+ RecentGlobalDataXmin);
+ }
}
/* Obtain share-lock on the buffer so we can examine visibility */
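The index_fetch_heap() change picks the pruning horizon by relation kind: system catalogs, and relations that logical decoding treats as catalogs, keep using RecentGlobalXmin, while other tables use RecentGlobalDataXmin. This sketch models that choice under the assumption, consistent with the cover letter's note that xmin is only pegged for catalog tables, that only the former is held back by logical slots. `DemoHorizons` and `demo_prune_horizon()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* Hypothetical pair of horizons mirroring the two globals the patch uses. */
typedef struct DemoHorizons
{
    DemoXid global_xmin;      /* like RecentGlobalXmin: assumed to honour slots */
    DemoXid global_data_xmin; /* like RecentGlobalDataXmin: assumed to ignore them */
} DemoHorizons;

/* Pick the horizon below which dead tuples may be pruned. */
static DemoXid
demo_prune_horizon(const DemoHorizons *h, bool is_system_rel,
                   bool is_timetravel_rel)
{
    if (is_system_rel || is_timetravel_rel)
        return h->global_xmin;      /* catalog: old rows may still be decoded */
    return h->global_data_xmin;     /* user data: can prune more aggressively */
}
```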
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index bc8b985..c750fef 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -184,6 +184,15 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
xlrec->infobits_set);
out_target(buf, &(xlrec->target));
}
+ else if (info == XLOG_HEAP2_NEW_CID)
+ {
+ xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
+
+ appendStringInfo(buf, "new_cid: ");
+ out_target(buf, &(xlrec->target));
+ appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+ xlrec->cmin, xlrec->cmax, xlrec->combocid);
+ }
else
appendStringInfo(buf, "UNKNOWN");
}
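The XLOG_HEAP2_NEW_CID records described here carry cmin, cmax and combocid. log_heap_new_cid() in heapam.c fills them from the tuple header in three cases: a combocid tuple gets all three values, a tuple with an invalid xmax only has a cmin, and otherwise the raw command id is the cmax. A sketch of that case split, with illustrative flag values named after their PostgreSQL counterparts and a caller-supplied combocid lookup; the `Demo*` names and `classify_cid()` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoCid;

/* Illustrative flag bits and sentinel; treat the values as assumptions. */
#define DEMO_HEAP_COMBOCID      0x0020
#define DEMO_HEAP_XMAX_INVALID  0x0800
#define DEMO_INVALID_CID        ((DemoCid) 0xFFFFFFFF)

typedef struct DemoNewCid
{
    DemoCid cmin;
    DemoCid cmax;
    DemoCid combocid;
} DemoNewCid;

/*
 * raw_cid is the single command-id field stored in the tuple header;
 * combo_cmin/combo_cmax are what a combocid lookup would return and are
 * only consulted when the COMBOCID bit is set.
 */
static DemoNewCid
classify_cid(uint16_t infomask, DemoCid raw_cid,
             DemoCid combo_cmin, DemoCid combo_cmax)
{
    DemoNewCid r;

    if (infomask & DEMO_HEAP_COMBOCID)
    {
        /* inserted and deleted by this transaction: keep all three */
        r.cmin = combo_cmin;
        r.cmax = combo_cmax;
        r.combocid = raw_cid;
    }
    else
    {
        if (infomask & DEMO_HEAP_XMAX_INVALID)
        {
            /* freshly inserted: the raw field is the cmin */
            r.cmin = raw_cid;
            r.cmax = DEMO_INVALID_CID;
        }
        else
        {
            /* another transaction's tuple updated/deleted: raw field is cmax */
            r.cmin = DEMO_INVALID_CID;
            r.cmax = raw_cid;
        }
        r.combocid = DEMO_INVALID_CID;
    }
    return r;
}
```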
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 2bad527..f1a75b4 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -28,6 +28,7 @@ const struct config_enum_entry wal_level_options[] = {
{"minimal", WAL_LEVEL_MINIMAL, false},
{"archive", WAL_LEVEL_ARCHIVE, false},
{"hot_standby", WAL_LEVEL_HOT_STANDBY, false},
+ {"logical", WAL_LEVEL_LOGICAL, false},
{NULL, 0, false}
};
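The new "logical" entry extends an ordered list of WAL levels, so checks such as XLogStandbyInfoActive() stay simple `>=` comparisons and a logical level implies everything hot_standby logs. The definitions of XLogLogicalInfoActive() and the WAL_LEVEL_* enum are not part of this excerpt; the sketch below assumes they follow that ordered-comparison pattern, using hypothetical `Demo*` names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the WAL_LEVEL_* ordering; order matters. */
typedef enum DemoWalLevel
{
    DEMO_WAL_LEVEL_MINIMAL = 0,
    DEMO_WAL_LEVEL_ARCHIVE,
    DEMO_WAL_LEVEL_HOT_STANDBY,
    DEMO_WAL_LEVEL_LOGICAL
} DemoWalLevel;

static const struct
{
    const char   *name;
    DemoWalLevel  level;
} demo_wal_level_options[] = {
    {"minimal", DEMO_WAL_LEVEL_MINIMAL},
    {"archive", DEMO_WAL_LEVEL_ARCHIVE},
    {"hot_standby", DEMO_WAL_LEVEL_HOT_STANDBY},
    {"logical", DEMO_WAL_LEVEL_LOGICAL},
    {NULL, DEMO_WAL_LEVEL_MINIMAL}
};

/* Look up a setting by name; returns -1 if it is unknown. */
static int
demo_parse_wal_level(const char *name)
{
    for (int i = 0; demo_wal_level_options[i].name != NULL; i++)
        if (strcmp(demo_wal_level_options[i].name, name) == 0)
            return (int) demo_wal_level_options[i].level;
    return -1;
}

/* Each level implies everything logged by the levels below it. */
static bool
demo_standby_info_active(DemoWalLevel l)
{
    return l >= DEMO_WAL_LEVEL_HOT_STANDBY;
}

static bool
demo_logical_info_active(DemoWalLevel l)
{
    return l >= DEMO_WAL_LEVEL_LOGICAL;
}
```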
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e975f8d..d46a50e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -47,6 +47,7 @@
#include "access/twophase.h"
#include "access/twophase_rmgr.h"
#include "access/xact.h"
+#include "access/xlog.h"
#include "access/xlogutils.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
@@ -1920,7 +1921,8 @@ RecoverPreparedTransactions(void)
* the prepared transaction generated xid assignment records. Test
* here must match one used in AssignTransactionId().
*/
- if (InHotStandby && hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
+ XLogLogicalInfoActive()))
overwriteOK = true;
/*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0591f3f..dc093e6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -431,6 +431,7 @@ AssignTransactionId(TransactionState s)
{
bool isSubXact = (s->parent != NULL);
ResourceOwner currentOwner;
+ bool log_unknown_top = false;
/* Assert that caller didn't screw up */
Assert(!TransactionIdIsValid(s->transactionId));
@@ -438,7 +439,7 @@ AssignTransactionId(TransactionState s)
/*
* Ensure parent(s) have XIDs, so that a child always has an XID later
- * than its parent. Musn't recurse here, or we might get a stack overflow
+ * than its parent. May not recurse here, or we might get a stack overflow
* if we're at the bottom of a huge stack of subtransactions none of which
* have XIDs yet.
*/
@@ -456,6 +457,17 @@ AssignTransactionId(TransactionState s)
}
/*
+ * Force the top-level xid to be logged before any subxact xids are
+ * logged. If the uppermost level already has an xid, that precondition
+ * is already fulfilled.
+ */
+ Assert(parentOffset);
+ if (XLogLogicalInfoActive() && parents[parentOffset - 1]->parent == NULL)
+ {
+ log_unknown_top = true;
+ }
+
+ /*
* This is technically a recursive call, but the recursion will never
* be more than one layer deep.
*/
@@ -519,6 +531,9 @@ AssignTransactionId(TransactionState s)
* top-level transaction that each subxact belongs to. This is correct in
* recovery only because aborted subtransactions are separately WAL
* logged.
+ *
+ * This is correct even when several levels above us didn't have an xid
+ * yet, since xids were assigned to them beforehand.
*/
if (isSubXact && XLogStandbyInfoActive())
{
@@ -529,7 +544,8 @@ AssignTransactionId(TransactionState s)
* ensure this test matches similar one in
* RecoverPreparedTransactions()
*/
- if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS)
+ if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
+ log_unknown_top)
{
XLogRecData rdata[2];
xl_xact_assignment xlrec;
@@ -548,7 +564,7 @@ AssignTransactionId(TransactionState s)
rdata[0].next = &rdata[1];
rdata[1].data = (char *) unreportedXids;
- rdata[1].len = PGPROC_MAX_CACHED_SUBXIDS * sizeof(TransactionId);
+ rdata[1].len = nUnreportedXids * sizeof(TransactionId);
rdata[1].buffer = InvalidBuffer;
rdata[1].next = NULL;
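In AssignTransactionId(), an xl_xact_assignment record used to be written only once PGPROC_MAX_CACHED_SUBXIDS subtransaction xids had accumulated, so a fixed length was correct. With log_unknown_top the record can be flushed earlier, which is why the length becomes nUnreportedXids * sizeof(TransactionId). A sketch of that accumulate-and-flush logic, assuming a cache size of 64 (the value of PGPROC_MAX_CACHED_SUBXIDS at the time of writing); `DemoAssignState` and `demo_note_subxid()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match PGPROC_MAX_CACHED_SUBXIDS. */
#define DEMO_MAX_CACHED_SUBXIDS 64

typedef uint32_t DemoXid;

/* Hypothetical stand-in for the unreportedXids/nUnreportedXids pair. */
typedef struct DemoAssignState
{
    DemoXid unreported[DEMO_MAX_CACHED_SUBXIDS];
    int     n;
} DemoAssignState;

/*
 * Note a newly assigned subtransaction xid. If an assignment record would
 * be emitted, copy its xids into out[] and return how many there are;
 * otherwise return 0.
 */
static int
demo_note_subxid(DemoAssignState *st, DemoXid xid, bool log_unknown_top,
                 DemoXid *out)
{
    st->unreported[st->n++] = xid;

    if (st->n >= DEMO_MAX_CACHED_SUBXIDS || log_unknown_top)
    {
        int count = st->n;

        /* the record holds count xids, not the full cache capacity */
        memcpy(out, st->unreported, (size_t) count * sizeof(DemoXid));
        st->n = 0;
        return count;
    }
    return 0;
}
```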
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ac51193..1ffacde 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
+#include "replication/logical.h"
#include "storage/bufmgr.h"
#include "storage/fd.h"
#include "storage/ipc.h"
@@ -5195,6 +5196,13 @@ StartupXLOG(void)
XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
XLogCtl->ckptXid = checkPoint.nextXid;
+
+ /*
+ * Start up logical replication state. This needs to be set up now so we
+ * have proper data during restore. XXX
+ */
+ StartupLogicalReplication(checkPoint.redo);
+
/*
* Initialize unlogged LSN. On a clean shutdown, it's restored from the
* control file. On recovery, all unlogged relations are blown away, so
@@ -7165,7 +7173,7 @@ CreateCheckPoint(int flags)
* StartupSUBTRANS hasn't been called yet.
*/
if (!RecoveryInProgress())
- TruncateSUBTRANS(GetOldestXmin(true, false, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update stats before releasing lock. */
LogCheckpointEnd(false);
@@ -7522,7 +7530,7 @@ CreateRestartPoint(int flags)
* this because StartupSUBTRANS hasn't been called yet.
*/
if (EnableHotStandby)
- TruncateSUBTRANS(GetOldestXmin(true, false, false));
+ TruncateSUBTRANS(GetOldestXmin(true, true, false, false));
/* Real work is done, but log and update before releasing lock. */
LogCheckpointEnd(true);
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 41a5da0..48fd182 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -106,7 +106,6 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
return path;
}
-
/*
* IsSystemRelation
* True iff the relation is a system catalog relation.
@@ -123,8 +122,17 @@ GetDatabasePath(Oid dbNode, Oid spcNode)
bool
IsSystemRelation(Relation relation)
{
- return IsSystemNamespace(RelationGetNamespace(relation)) ||
- IsToastNamespace(RelationGetNamespace(relation));
+ return IsSystemRelationId(RelationGetRelid(relation));
+}
+
+/*
+ * IsSystemRelationId
+ * True iff the relation is a system catalog relation.
+ */
+bool
+IsSystemRelationId(Oid relid)
+{
+ return relid < FirstNormalObjectId;
}
/*
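IsSystemRelation() now classifies by OID instead of by namespace: anything below FirstNormalObjectId counts as a system relation. A minimal sketch, assuming FirstNormalObjectId is 16384 as in access/transam.h; `demo_is_system_relation_id()` is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoOid;

/* Assumed value of FirstNormalObjectId: OIDs below it are assigned
 * manually or by initdb, not during normal operation. */
#define DEMO_FIRST_NORMAL_OBJECT_ID ((DemoOid) 16384)

/* OID-range test replacing the namespace-based check. */
static bool
demo_is_system_relation_id(DemoOid relid)
{
    return relid < DEMO_FIRST_NORMAL_OBJECT_ID;
}
```

With this test a relation's classification depends on its OID range rather than on the schema it lives in.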
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfad8b1..bcdd305 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2196,9 +2196,19 @@ IndexBuildHeapScan(Relation heapRelation,
}
else
{
+ /*
+ * We can ignore a) pegged xmins b) shared relations if we don't scan
+ * something acting as a catalog.
+ */
+ bool include_systables =
+ IsSystemRelation(heapRelation) ||
+ RelationIsDoingTimetravel(heapRelation);
+
snapshot = SnapshotAny;
/* okay to ignore lazy VACUUMs here */
- OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true,
+ OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared,
+ include_systables,
+ true,
false);
}
@@ -3367,7 +3377,7 @@ reindex_relation(Oid relid, int flags)
/* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
if (is_pg_class)
- (void) RelationGetIndexAttrBitmap(rel, false);
+ (void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
PG_TRY();
{
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..e16fcb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -612,6 +612,16 @@ CREATE VIEW pg_stat_replication AS
WHERE S.usesysid = U.oid AND
S.pid = W.pid;
+CREATE VIEW pg_stat_logical_decoding AS
+ SELECT
+ L.slot_name,
+ L.plugin,
+ L.database,
+ L.active,
+ L.xmin,
+ L.restart_decoding_lsn
+ FROM pg_stat_get_logical_decoding_slots() AS L;
+
CREATE VIEW pg_stat_database AS
SELECT
D.oid AS datid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7968319..7a05cea 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, false);
+ OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, true, false);
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5064081..8c953e1 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -847,6 +847,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
*/
vacuum_set_xid_limits(freeze_min_age, freeze_table_age,
OldHeap->rd_rel->relisshared,
+ IsSystemRelation(OldHeap)
+ || RelationIsDoingTimetravel(OldHeap),
&OldestXmin, &FreezeXid, NULL, &MultiXactFrzLimit);
/*
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed65bab..d348e34 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2355,7 +2355,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
* concurrency.
*/
modifiedCols = GetModifiedColumns(relinfo, estate);
- keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc, true);
+ keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc,
+ INDEX_ATTR_BITMAP_KEY);
if (bms_overlap(keyCols, modifiedCols))
lockmode = LockTupleExclusive;
else
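The trigger.c hunk keeps the rule that an UPDATE touching any key column needs the stronger tuple lock. The only change is that it now asks RelationGetIndexAttrBitmap() explicitly for the INDEX_ATTR_BITMAP_KEY set. Below is a minimal sketch of that decision. It assumes attribute sets small enough to fit in a 64-bit mask rather than a Bitmapset. Every `sketch_` name and both lock-mode labels are hypothetical stand-ins, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the two tuple lock strengths involved. */
typedef enum
{
	SKETCH_LOCK_NO_KEY_EXCLUSIVE,	/* update leaves every key column alone */
	SKETCH_LOCK_EXCLUSIVE			/* update touches at least one key column */
} SketchLockMode;

/* Bit n set means attribute number n (1-based, n < 64) is in the set. */
typedef uint64_t SketchColumnSet;

static SketchColumnSet
sketch_column(int attnum)
{
	assert(attnum >= 1 && attnum < 64);
	return (SketchColumnSet) 1 << attnum;
}

/* Plays the role of bms_overlap(): true if the sets share any member. */
static bool
sketch_overlap(SketchColumnSet a, SketchColumnSet b)
{
	return (a & b) != 0;
}

/*
 * key_cols stands in for the INDEX_ATTR_BITMAP_KEY bitmap, modified_cols
 * for the result of GetModifiedColumns().
 */
static SketchLockMode
sketch_update_lockmode(SketchColumnSet key_cols, SketchColumnSet modified_cols)
{
	return sketch_overlap(key_cols, modified_cols)
		? SKETCH_LOCK_EXCLUSIVE
		: SKETCH_LOCK_NO_KEY_EXCLUSIVE;
}
```

Take `foo(id serial primary key, data text)` from the demonstration. Its key set is {1}. An UPDATE that changes only `data` modifies {2} and gets the weaker lock. `UPDATE foo SET id = -id, data = ...` overlaps the key set and gets the exclusive one.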
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 924a12e..8aa384a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -381,6 +381,7 @@ void
vacuum_set_xid_limits(int freeze_min_age,
int freeze_table_age,
bool sharedRel,
+ bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
@@ -399,7 +400,7 @@ vacuum_set_xid_limits(int freeze_min_age,
* working on a particular table at any time, and that each vacuum is
* always an independent transaction.
*/
- *oldestXmin = GetOldestXmin(sharedRel, true, false);
+ *oldestXmin = GetOldestXmin(sharedRel, catalogRel, true, false);
Assert(TransactionIdIsNormal(*oldestXmin));
@@ -720,7 +721,7 @@ vac_update_datfrozenxid(void)
* committed pg_class entries for new tables; see AddNewRelationTuple().
* So we cannot produce a wrong minimum by starting with this.
*/
- newFrozenXid = GetOldestXmin(true, true, false);
+ newFrozenXid = GetOldestXmin(true, true, true, false);
/*
* Similarly, initialize the MultiXact "min" with the value that would be
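The new catalogRel argument changes how VACUUM and CLUSTER handle the logical slots' xmin. They now hold back dead-tuple removal for that xmin only on catalog tables, or on tables doing timetravel. This matches the "don't peg the xmin for user tables" item in the cover letter.

Below is a minimal sketch of that horizon choice. It assumes a simplified model in which the horizon is the older of two values: the running transactions' xmin and the slots' combined xmin (the value ComputeLogicalXmin() in logical.c maintains). The comparison follows the modulo-2^32 rule of TransactionIdPrecedes(). All `sketch_` names are hypothetical; this is not GetOldestXmin().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/*
 * Wraparound-aware "id1 is older than id2": normal xids are compared
 * modulo 2^32, special xids below the first normal one numerically.
 */
static bool
sketch_xid_precedes(SketchXid id1, SketchXid id2)
{
	if (id1 < SKETCH_FIRST_NORMAL_XID || id2 < SKETCH_FIRST_NORMAL_XID)
		return id1 < id2;
	return (int32_t) (id1 - id2) < 0;
}

/* Oldest xmin among the slots; invalid if no slot holds anything back. */
static SketchXid
sketch_min_slot_xmin(const SketchXid *slot_xmins, int nslots)
{
	SketchXid	xmin = SKETCH_INVALID_XID;

	for (int i = 0; i < nslots; i++)
	{
		SketchXid	s = slot_xmins[i];

		if (s == SKETCH_INVALID_XID)
			continue;			/* unused slot */
		if (xmin == SKETCH_INVALID_XID || sketch_xid_precedes(s, xmin))
			xmin = s;
	}
	return xmin;
}

/*
 * Removal horizon: user tables only honour running transactions, catalog
 * tables additionally honour the slots' xmin.
 */
static SketchXid
sketch_oldest_xmin(bool catalog_rel, SketchXid proc_xmin, SketchXid logical_xmin)
{
	assert(proc_xmin != SKETCH_INVALID_XID);

	if (catalog_rel && logical_xmin != SKETCH_INVALID_XID &&
		sketch_xid_precedes(logical_xmin, proc_xmin))
		return logical_xmin;
	return proc_xmin;
}
```

Because xids wrap around, a slot xmin of 4000000000 counts as older than a running xmin of 5. In this model, a plain `<` would get that backwards and could let VACUUM remove catalog rows the decoder still needs.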
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 2ea0590..b650eee 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -44,6 +44,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
+#include "catalog/catalog.h"
#include "catalog/storage.h"
#include "commands/dbcommands.h"
#include "commands/vacuum.h"
@@ -202,6 +203,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
vacuum_set_xid_limits(vacstmt->freeze_min_age, vacstmt->freeze_table_age,
onerel->rd_rel->relisshared,
+ IsSystemRelation(onerel)
+ || RelationIsDoingTimetravel(onerel),
&OldestXmin, &FreezeLimit, &freezeTableLimit,
&MultiXactFrzLimit);
scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
@@ -1722,7 +1725,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cut
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_LIVE:
{
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 86f0686..6c301b8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -837,7 +837,7 @@ PostmasterMain(int argc, char *argv[])
(errmsg("WAL archival (archive_mode=on) requires wal_level \"archive\" or \"hot_standby\"")));
if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
ereport(ERROR,
- (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\" or \"hot_standby\"")));
+ (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"archive\", \"hot_standby\" or \"logical\"")));
/*
* Other one-time internal sanity checks can go here, if they are fast.
@@ -1958,9 +1958,8 @@ retry1:
/* Generic Walsender is not related to a particular database */
if (am_walsender && strcmp(port->database_name, "replication") == 0)
port->database_name[0] = '\0';
-
- if (am_walsender)
- elog(WARNING, "connecting to %s", port->database_name);
+ else if (am_walsender)
+ elog(DEBUG1, "WAL sender attaching to database %s", port->database_name);
/*
* Done putting stuff in TopMemoryContext.
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 2dde011..2e13e27 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
repl_gram.o syncrep.o
+SUBDIRS = logical
+
include $(top_srcdir)/src/backend/common.mk
# repl_scanner is compiled as part of repl_gram
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
new file mode 100644
index 0000000..310a45c
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+# src/backend/replication/logical/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
new file mode 100644
index 0000000..a93e48d
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,556 @@
+/*-------------------------------------------------------------------------
+ *
+ * decode.c
+ * Decodes WAL records from an xlogreader.h callback into a reorderbuffer,
+ * while building the snapshots needed to decode them
+ *
+ * NOTE:
+ * It's possible that the separation between decode.c and snapbuild.c is a
+ * bit too strict; in the end they have just about the same switch.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/decode.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+
+#include "access/heapam.h"
+#include "access/heapam_xlog.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+
+static void DecodeHeapOp(ReorderBuffer *reorder, XLogRecordBuffer *buf,
+ RmgrId rmgr, uint8 info);
+static void DecodeTransactionOp(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf);
+static void DecodeXLogTuple(char *data, Size len,
+ ReorderBufferTupleBuf *tuple);
+static void DecodeInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeUpdate(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeDelete(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeMultiInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf);
+static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ TransactionId xid, TransactionId *sub_xids, int nsubxacts);
+static void DecodeAbort(ReorderBuffer *reorder, XLogRecPtr lsn,
+ TransactionId xid, TransactionId *sub_xids, int nsubxacts);
+
+
+void
+DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ uint8 info = r->xl_info & ~XLR_INFO_MASK;
+ ReorderBuffer *reorder = ctx->reorder;
+ SnapBuildAction action;
+
+ /*---------
+ * Call the snapshot builder. It needs to be called before we analyze
+ * tuples for two reasons:
+ *
+ * * Only the snapshot building logic knows whether we have enough
+ * information to decode a particular tuple
+ *
+ * * The Snapshot/CommandIds computed by the SnapshotBuilder need to be
+ * added to the ReorderBuffer before we add tuples using them
+ *---------
+ */
+ action = SnapBuildProcessRecord(ctx->snapshot_builder, buf);
+
+ if (action == SNAPBUILD_SKIP)
+ return;
+
+ switch (r->xl_rmid)
+ {
+ case RM_HEAP_ID:
+ case RM_HEAP2_ID:
+ DecodeHeapOp(reorder, buf, r->xl_rmid,
+ r->xl_info & XLOG_HEAP_OPMASK);
+ break;
+
+ case RM_XACT_ID:
+ DecodeTransactionOp(ctx, buf);
+ break;
+
+ case RM_XLOG_ID:
+ switch (info)
+ {
+ /* this is also used in END_OF_RECOVERY checkpoints */
+ case XLOG_CHECKPOINT_SHUTDOWN:
+
+ /*
+ * Abort all transactions that are still marked as in
+ * progress; they cannot be running anymore. Do not
+ * abort transactions that have been prepared for
+ * two-phase commit.
+ *
+ * FIXME: implement.
+ */
+ break;
+ }
+ default:
+ break;
+ }
+}
+
+static void
+DecodeHeapOp(ReorderBuffer *reorder, XLogRecordBuffer *buf, RmgrId rmgr,
+ uint8 info)
+{
+ switch (rmgr)
+ {
+ case RM_HEAP_ID:
+ switch (info)
+ {
+ case XLOG_HEAP_INSERT:
+ DecodeInsert(reorder, buf);
+ break;
+
+ /*
+ * There is no guarantee that we get a HOT update again,
+ * so handle it as a normal update.
+ */
+ case XLOG_HEAP_HOT_UPDATE:
+ case XLOG_HEAP_UPDATE:
+ DecodeUpdate(reorder, buf);
+ break;
+
+ case XLOG_HEAP_NEWPAGE:
+
+ /*
+ * XXX: There doesn't seem to be a use case for
+ * decoding HEAP_NEWPAGE records. It's only used in various
+ * index AMs and CLUSTER, neither of which should be
+ * relevant for the logical change stream.
+ */
+ break;
+
+ case XLOG_HEAP_DELETE:
+ DecodeDelete(reorder, buf);
+ break;
+ default:
+ break;
+ }
+ break;
+ case RM_HEAP2_ID:
+ switch (info)
+ {
+ case XLOG_HEAP2_MULTI_INSERT:
+ DecodeMultiInsert(reorder, buf);
+ break;
+
+ default:
+
+ /*
+ * everything else here is just physical stuff we're
+ * not interested in
+ */
+ break;
+ }
+ break;
+ }
+}
+
+static void
+DecodeTransactionOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+ ReorderBuffer *reorder = ctx->reorder;
+ XLogRecord *r = &buf->record;
+
+ switch (r->xl_info & ~XLR_INFO_MASK)
+ {
+ case XLOG_XACT_COMMIT:
+ {
+ TransactionId *sub_xids = NULL;
+ xl_xact_commit *xlrec;
+
+ xlrec = (xl_xact_commit *) buf->record_data;
+
+ if (xlrec->nsubxacts > 0)
+ sub_xids = (TransactionId *)
+ &(xlrec->xnodes[xlrec->nrels]);
+
+ DecodeCommit(ctx, buf, r->xl_xid, sub_xids, xlrec->nsubxacts);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ TransactionId *sub_xids;
+ xl_xact_commit_prepared *xlrec;
+
+ xlrec = (xl_xact_commit_prepared *) buf->record_data;
+ sub_xids = (TransactionId *)
+ &(xlrec->crec.xnodes[xlrec->crec.nrels]);
+
+ /* r->xl_xid is committed in a separate record */
+ DecodeCommit(ctx, buf, xlrec->xid, sub_xids,
+ xlrec->crec.nsubxacts);
+
+ break;
+ }
+ case XLOG_XACT_COMMIT_COMPACT:
+ {
+ xl_xact_commit_compact *xlrec;
+
+ xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+ DecodeCommit(ctx, buf, r->xl_xid, xlrec->subxacts,
+ xlrec->nsubxacts);
+ break;
+ }
+ case XLOG_XACT_ABORT:
+ {
+ TransactionId *sub_xids;
+ xl_xact_abort *xlrec;
+
+ xlrec = (xl_xact_abort *) buf->record_data;
+
+ sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ DecodeAbort(reorder, buf->origptr, r->xl_xid,
+ sub_xids, xlrec->nsubxacts);
+ break;
+ }
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ TransactionId *sub_xids;
+ xl_xact_abort_prepared *xlrec;
+ xl_xact_abort *arec;
+
+ xlrec = (xl_xact_abort_prepared *) buf->record_data;
+ arec = &xlrec->arec;
+
+ sub_xids = (TransactionId *) &(arec->xnodes[arec->nrels]);
+ /* r->xl_xid is committed in a separate record */
+ DecodeAbort(reorder, buf->origptr, xlrec->xid,
+ sub_xids, arec->nsubxacts);
+ break;
+ }
+
+ case XLOG_XACT_ASSIGNMENT:
+ {
+ int i;
+ TransactionId *sub_xid;
+ xl_xact_assignment *xlrec =
+ (xl_xact_assignment *) buf->record_data;
+
+ sub_xid = &xlrec->xsub[0];
+
+ for (i = 0; i < xlrec->nsubxacts; i++)
+ {
+ ReorderBufferAssignChild(reorder, r->xl_xid,
+ *(sub_xid++), buf->origptr);
+ }
+ break;
+ }
+ case XLOG_XACT_PREPARE:
+
+ /*
+ * XXX: we could replay the transaction and prepare it
+ * as well.
+ */
+ break;
+ default:
+ break;
+ }
+}
+
+static void
+DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, TransactionId xid,
+ TransactionId *sub_xids, int nsubxacts)
+{
+ int i;
+
+ /*
+ * If we are not interested in anything up to this LSN, convert the commit
+ * into an ABORT to clean up.
+ *
+ * FIXME: this needs to replay invalidations anyway!
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr))
+ {
+ DecodeAbort(ctx->reorder, buf->origptr, xid,
+ sub_xids, nsubxacts);
+ return;
+ }
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, *sub_xids,
+ buf->origptr);
+ sub_xids++;
+ }
+
+ /* replay the actions of the transaction and all its subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr);
+}
+
+static void
+DecodeAbort(ReorderBuffer *reorder, XLogRecPtr lsn, TransactionId xid,
+ TransactionId *sub_xids, int nsubxacts)
+{
+ int i;
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ ReorderBufferAbort(reorder, *sub_xids, lsn);
+ sub_xids++;
+ }
+
+ ReorderBufferAbort(reorder, xid, lsn);
+}
+
+static void
+DecodeInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_insert *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != MyDatabaseId)
+ return;
+
+ change = ReorderBufferGetChange(reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader));
+
+ change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert,
+ r->xl_len - SizeOfHeapInsert,
+ change->newtuple);
+ }
+
+ ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeUpdate(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_update *xlrec;
+ xl_heap_header_len *xlhdr;
+ ReorderBufferChange *change;
+ char *data;
+
+ xlrec = (xl_heap_update *) buf->record_data;
+ xlhdr = (xl_heap_header_len *) (buf->record_data + SizeOfHeapUpdate);
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != MyDatabaseId)
+ return;
+
+ change = ReorderBufferGetChange(reorder);
+ change->action = REORDER_BUFFER_CHANGE_UPDATE;
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ data = (char *) &xlhdr->header;
+
+ /*
+ * FIXME: need to get/save the old tuple as well if we want primary key
+ * changes to work.
+ */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ Assert(r->xl_len > (SizeOfHeapUpdate + SizeOfHeapHeaderLen));
+#if 0
+ elog(WARNING, "xl: %zu tp:%zu",
+ (r->xl_len - SizeOfHeapUpdate - (SizeOfHeapHeaderLen - SizeOfHeapHeader)),
+ xlhdr->t_len + SizeOfHeapHeader);
+#endif
+ change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+ DecodeXLogTuple(data,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->newtuple);
+ /* skip over the rest of the tuple header */
+ data += SizeOfHeapHeader;
+ /* skip over the tuple data */
+ data += xlhdr->t_len;
+ }
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ xlhdr = (xl_heap_header_len *) data;
+ change->oldtuple = ReorderBufferGetTupleBuf(reorder);
+ DecodeXLogTuple((char *) &xlhdr->header,
+ xlhdr->t_len + SizeOfHeapHeader,
+ change->oldtuple);
+ data = (char *) &xlhdr->header;
+ data += SizeOfHeapHeader;
+ data += xlhdr->t_len;
+ }
+
+ ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+static void
+DecodeDelete(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_delete *xlrec;
+ ReorderBufferChange *change;
+
+ xlrec = (xl_heap_delete *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->target.node.dbNode != MyDatabaseId)
+ return;
+
+ change = ReorderBufferGetChange(reorder);
+ change->action = REORDER_BUFFER_CHANGE_DELETE;
+
+ memcpy(&change->relnode, &xlrec->target.node, sizeof(RelFileNode));
+
+ /* old primary key stored */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY)
+ {
+ Assert(r->xl_len > (SizeOfHeapDelete + SizeOfHeapHeader));
+
+ change->oldtuple = ReorderBufferGetTupleBuf(reorder);
+
+ DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
+ r->xl_len - SizeOfHeapDelete,
+ change->oldtuple);
+ }
+ ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+}
+
+/*
+ * Decode xl_heap_multi_insert record into multiple changes.
+ */
+static void
+DecodeMultiInsert(ReorderBuffer *reorder, XLogRecordBuffer *buf)
+{
+ XLogRecord *r = &buf->record;
+ xl_heap_multi_insert *xlrec;
+ int i;
+ char *data;
+ bool isinit = (r->xl_info & XLOG_HEAP_INIT_PAGE) != 0;
+
+ xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+ /* XXX: nicer */
+ if (xlrec->node.dbNode != MyDatabaseId)
+ return;
+
+ data = buf->record_data + SizeOfHeapMultiInsert;
+
+ /*
+ * OffsetNumbers (which are not of interest to us) are stored when
+ * XLOG_HEAP_INIT_PAGE is not set -- skip over them.
+ */
+ if (!isinit)
+ data += sizeof(OffsetNumber) * xlrec->ntuples;
+
+ for (i = 0; i < xlrec->ntuples; i++)
+ {
+ ReorderBufferChange *change;
+ xl_multi_insert_tuple *xlhdr;
+ int datalen;
+ ReorderBufferTupleBuf *tuple;
+
+ change = ReorderBufferGetChange(reorder);
+ change->action = REORDER_BUFFER_CHANGE_INSERT;
+ memcpy(&change->relnode, &xlrec->node, sizeof(RelFileNode));
+
+ /*
+ * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
+ * isn't used for catalogs, but better to be future-proof.
+ *
+ * We decode the tuple in pretty much the same way as DecodeXLogTuple,
+ * but since the layout is slightly different, we can't use it here.
+ */
+ if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+ {
+ change->newtuple = ReorderBufferGetTupleBuf(reorder);
+
+ tuple = change->newtuple;
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
+ data = ((char *) xlhdr) + SizeOfMultiInsertTuple;
+ datalen = xlhdr->datalen;
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ (char *) data,
+ datalen);
+ data += datalen;
+
+ tuple->header.t_infomask = xlhdr->t_infomask;
+ tuple->header.t_infomask2 = xlhdr->t_infomask2;
+ tuple->header.t_hoff = xlhdr->t_hoff;
+ }
+
+ ReorderBufferAddChange(reorder, r->xl_xid, buf->origptr, change);
+ }
+}
+
+/*
+ * Read a tuple of size 'len' from 'data' into 'tuple'.
+ */
+static void
+DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
+{
+ xl_heap_header xlhdr;
+ int datalen = len - SizeOfHeapHeader;
+
+ Assert(datalen >= 0);
+ Assert(datalen <= MaxHeapTupleSize);
+
+ tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
+
+ /* not a disk based tuple */
+ ItemPointerSetInvalid(&tuple->tuple.t_self);
+
+ /* we can only figure this out after reassembling the transactions */
+ tuple->tuple.t_tableOid = InvalidOid;
+ tuple->tuple.t_data = &tuple->header;
+
+ /* data is not stored aligned */
+ memcpy((char *) &xlhdr,
+ data,
+ SizeOfHeapHeader);
+
+ memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
+
+ memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
+ data + SizeOfHeapHeader,
+ datalen);
+
+ tuple->header.t_infomask = xlhdr.t_infomask;
+ tuple->header.t_infomask2 = xlhdr.t_infomask2;
+ tuple->header.t_hoff = xlhdr.t_hoff;
+}
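DecodeMultiInsert() and DecodeXLogTuple() both pull headers out of record data that is not guaranteed to be aligned. DecodeMultiInsert() optionally skips the OffsetNumber array, then SHORTALIGNs to each per-tuple header. DecodeXLogTuple() memcpy()s the xl_heap_header into a local.

Below is a minimal sketch of that walk over a plain byte buffer. It assumes a 7-byte per-tuple header modelled on xl_multi_insert_tuple: datalen, infomask2 and infomask (16 bits each), then an 8-bit hoff. The `sketch_` names are hypothetical. Offsets are measured from the start of the buffer, which is assumed to be aligned like record_data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte offset up to a 2-byte boundary, like SHORTALIGN(). */
#define SKETCH_SHORTALIGN(off)	(((off) + 1) & ~(size_t) 1)

/* Assumed on-disk size of the per-tuple header (see lead-in). */
#define SKETCH_TUPLE_HDR_SIZE	7

typedef struct SketchTuple
{
	uint16_t	datalen;
	uint16_t	infomask2;
	uint16_t	infomask;
	uint8_t		hoff;
	const char *data;			/* points into rec, not copied */
} SketchTuple;

/*
 * Walk a multi-insert style payload.  Unless the page was initialized,
 * ntuples 2-byte offset numbers precede the tuples.  Returns the number of
 * tuples decoded into out, or -1 if the buffer is too short.
 */
static int
sketch_decode_multi_insert(const char *rec, size_t len, int ntuples,
						   bool isinit, SketchTuple *out)
{
	size_t		off = 0;

	if (!isinit)
		off += sizeof(uint16_t) * (size_t) ntuples;	/* skip offset numbers */

	for (int i = 0; i < ntuples; i++)
	{
		off = SKETCH_SHORTALIGN(off);
		if (off + SKETCH_TUPLE_HDR_SIZE > len)
			return -1;

		/* memcpy each field so the sketch never depends on rec's alignment */
		memcpy(&out[i].datalen, rec + off, 2);
		memcpy(&out[i].infomask2, rec + off + 2, 2);
		memcpy(&out[i].infomask, rec + off + 4, 2);
		memcpy(&out[i].hoff, rec + off + 6, 1);
		off += SKETCH_TUPLE_HDR_SIZE;

		if (off + out[i].datalen > len)
			return -1;
		out[i].data = rec + off;
		off += out[i].datalen;
	}
	return ntuples;
}
```

The sketch copies each field with memcpy() rather than casting the pointer, so it does not depend on the buffer's alignment. DecodeXLogTuple() copies xl_heap_header into a local before reading t_infomask and friends for the same reason.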
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
new file mode 100644
index 0000000..2fe009b
--- /dev/null
+++ b/src/backend/replication/logical/logical.c
@@ -0,0 +1,1047 @@
+/*-------------------------------------------------------------------------
+ *
+ * logical.c
+ *
+ * Logical decoding shared memory management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logical.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "access/transam.h"
+
+#include "fmgr.h"
+#include "miscadmin.h"
+
+#include "replication/logical.h"
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/fd.h"
+#include "storage/copydir.h"
+
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/*
+ * logical replication on-disk data structures.
+ */
+typedef struct LogicalDecodingSlotOnDisk
+{
+ uint32 magic;
+ LogicalDecodingSlot slot;
+} LogicalDecodingSlotOnDisk;
+
+#define LOGICAL_MAGIC 0x1051CA1 /* format identifier */
+
+/* Control array for logical decoding */
+LogicalDecodingCtlData *LogicalDecodingCtl = NULL;
+
+/* My slot for logical rep in the shared memory array */
+LogicalDecodingSlot *MyLogicalDecodingSlot = NULL;
+
+/* user settable parameters */
+int max_logical_slots = 0; /* the maximum number of logical slots */
+
+static void LogicalSlotKill(int code, Datum arg);
+
+/* persistence functions */
+static void RestoreLogicalSlot(const char *name);
+static void CreateLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlot(LogicalDecodingSlot *slot);
+static void SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *path);
+static void DeleteLogicalSlot(LogicalDecodingSlot *slot);
+
+
+/* Report shared-memory space needed by LogicalDecodingShmemInit */
+Size
+LogicalDecodingShmemSize(void)
+{
+ Size size = 0;
+
+ if (max_logical_slots == 0)
+ return size;
+
+ size = offsetof(LogicalDecodingCtlData, logical_slots);
+ size = add_size(size,
+ mul_size(max_logical_slots, sizeof(LogicalDecodingSlot)));
+
+ return size;
+}
+
+/* Allocate and initialize logical decoding shared memory */
+void
+LogicalDecodingShmemInit(void)
+{
+ bool found;
+
+ if (max_logical_slots == 0)
+ return;
+
+ LogicalDecodingCtl = (LogicalDecodingCtlData *)
+ ShmemInitStruct("Logical Decoding Ctl", LogicalDecodingShmemSize(),
+ &found);
+
+ if (!found)
+ {
+ int i;
+
+ /* First time through, so initialize */
+ MemSet(LogicalDecodingCtl, 0, LogicalDecodingShmemSize());
+
+ LogicalDecodingCtl->xmin = InvalidTransactionId;
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot =
+ &LogicalDecodingCtl->logical_slots[i];
+
+ slot->xmin = InvalidTransactionId;
+ slot->effective_xmin = InvalidTransactionId;
+ SpinLockInit(&slot->mutex);
+ }
+ }
+}
+
+static void
+LogicalSlotKill(int code, Datum arg)
+{
+ /* LOCK? */
+ if (MyLogicalDecodingSlot && MyLogicalDecodingSlot->active)
+ {
+ MyLogicalDecodingSlot->active = false;
+ }
+ MyLogicalDecodingSlot = NULL;
+}
+
+/*
+ * Set the xmin required for catalog timetravel for the specific decoding slot.
+ */
+void
+IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ !TransactionIdIsValid(MyLogicalDecodingSlot->candidate_xmin)))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = lsn;
+ MyLogicalDecodingSlot->candidate_xmin = xmin;
+ elog(DEBUG1, "got new xmin %u at %X/%X", xmin,
+ (uint32) (lsn >> 32), (uint32) lsn);
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn)
+{
+ Assert(MyLogicalDecodingSlot != NULL);
+ Assert(restart_lsn != InvalidXLogRecPtr);
+ Assert(current_lsn != InvalidXLogRecPtr);
+
+ SpinLockAcquire(&MyLogicalDecodingSlot->mutex);
+
+ /*
+ * Only increase if the previous values have been applied, otherwise we
+ * might never end up updating if the receiver acks too slowly. A missed
+ * value here will just cause some extra effort after reconnecting.
+ */
+ if (MyLogicalDecodingSlot->candidate_lsn == InvalidXLogRecPtr ||
+ (current_lsn == MyLogicalDecodingSlot->candidate_lsn &&
+ MyLogicalDecodingSlot->candidate_restart_decoding == InvalidXLogRecPtr))
+ {
+ MyLogicalDecodingSlot->candidate_lsn = current_lsn;
+ MyLogicalDecodingSlot->candidate_restart_decoding = restart_lsn;
+
+ elog(DEBUG1, "got new restart lsn %X/%X at %X/%X",
+ (uint32) (restart_lsn >> 32), (uint32) restart_lsn,
+ (uint32) (current_lsn >> 32), (uint32) current_lsn);
+
+ }
+ SpinLockRelease(&MyLogicalDecodingSlot->mutex);
+}
+
+void
+LogicalConfirmReceivedLocation(XLogRecPtr lsn)
+{
+ Assert(lsn != InvalidXLogRecPtr);
+
+ /* Do an unlocked check for candidate_lsn first. */
+ if (MyLogicalDecodingSlot->candidate_lsn != InvalidXLogRecPtr)
+ {
+ bool updated_xmin = false;
+ bool updated_restart = false;
+
+ /* use volatile pointer to prevent code rearrangement */
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+
+ slot->confirmed_flush = lsn;
+
+ /* if we're past the location required for bumping xmin, do so */
+ if (slot->candidate_lsn != InvalidXLogRecPtr &&
+ slot->candidate_lsn < lsn)
+ {
+ /*
+ * We have to write the changed xmin to disk *before* we change
+ * the in-memory value, otherwise after a crash we wouldn't know
+ * that some catalog tuples might have been removed already.
+ *
+ * Ensure that by first writing to ->xmin and only updating
+ * ->effective_xmin once the new state is fsynced to disk. After a
+ * crash ->effective_xmin is set to ->xmin.
+ */
+ if (TransactionIdIsValid(slot->candidate_xmin) &&
+ slot->xmin != slot->candidate_xmin)
+ {
+ slot->xmin = slot->candidate_xmin;
+ updated_xmin = true;
+ }
+
+ if (slot->candidate_restart_decoding != InvalidXLogRecPtr &&
+ slot->restart_decoding != slot->candidate_restart_decoding)
+ {
+ slot->restart_decoding = slot->candidate_restart_decoding;
+ updated_restart = true;
+ }
+
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ }
+
+ SpinLockRelease(&slot->mutex);
+
+ /* first write the new xmin to disk, so we know what's up after a crash */
+ if (updated_xmin || updated_restart)
+ /* cast away volatile, that's ok. */
+ SaveLogicalSlot((LogicalDecodingSlot *) slot);
+
+ /*
+ * now the new xmin is safely on disk, we can let the global value
+ * advance
+ */
+ if (updated_xmin)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_xmin = slot->xmin;
+ SpinLockRelease(&slot->mutex);
+
+ ComputeLogicalXmin();
+ }
+ }
+ else
+ {
+ volatile LogicalDecodingSlot *slot = MyLogicalDecodingSlot;
+
+ SpinLockAcquire(&slot->mutex);
+ slot->confirmed_flush = lsn;
+ SpinLockRelease(&slot->mutex);
+ }
+}
+
+/*
+ * Compute the oldest xmin across all in-use decoding slots and store it in
+ * LogicalDecodingCtl->xmin.
+ */
+void
+ComputeLogicalXmin(void)
+{
+ int i;
+ TransactionId xmin = InvalidTransactionId;
+ LogicalDecodingSlot *slot;
+
+ Assert(LogicalDecodingCtl);
+
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use &&
+ TransactionIdIsValid(slot->effective_xmin) && (
+ !TransactionIdIsValid(xmin) ||
+ TransactionIdPrecedes(slot->effective_xmin, xmin))
+ )
+ {
+ xmin = slot->effective_xmin;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+ LogicalDecodingCtl->xmin = xmin;
+ LWLockRelease(ProcArrayLock);
+
+ elog(DEBUG1, "computed new global xmin for decoding: %u", xmin);
+}
+
+/*
+ * Make sure the current settings & environment are capable of doing logical
+ * replication.
+ */
+void
+CheckLogicalReplicationRequirements(void)
+{
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ /* XXX invent class 51 for code 51028? */
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires wal_level=logical")));
+
+ if (MyDatabaseId == InvalidOid)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires a database connection")));
+
+ if (max_logical_slots == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("logical replication requires max_logical_slots > 0"))));
+}
+
+/*
+ * Search for a free slot, mark it as used and acquire a valid xmin horizon
+ * value.
+ */
+void
+LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin)
+{
+ LogicalDecodingSlot *slot;
+ bool name_in_use;
+ int i;
+
+ Assert(!MyLogicalDecodingSlot);
+
+ CheckLogicalReplicationRequirements();
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+
+ /* First, make sure the requested name is not in use. */
+
+ name_in_use = false;
+ for (i = 0; i < max_logical_slots && !name_in_use; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (s->in_use && strcmp(name, NameStr(s->name)) == 0)
+ name_in_use = true;
+ SpinLockRelease(&s->mutex);
+ }
+
+ if (name_in_use)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("logical slot \"%s\" already exists", name)));
+
+ /* Find the first available (not in_use (=> not active)) slot. */
+
+ slot = NULL;
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *s = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&s->mutex);
+ if (!s->in_use)
+ {
+ Assert(!s->active);
+ /* NOT releasing the lock yet */
+ slot = s;
+ break;
+ }
+ SpinLockRelease(&s->mutex);
+ }
+
+ LWLockRelease(LogicalReplicationCtlLock);
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ errmsg("could not find a free logical slot; free one or increase max_logical_slots")));
+
+ MyLogicalDecodingSlot = slot;
+
+ /* Let's start with enough information if we can */
+ if (!RecoveryInProgress())
+ slot->restart_decoding = LogStandbySnapshot();
+ else
+ slot->restart_decoding = GetRedoRecPtr();
+
+ slot->in_use = true;
+ slot->active = true;
+ slot->database = MyDatabaseId;
+ /* XXX: do we want to use truncate identifier instead? */
+ strncpy(NameStr(slot->plugin), plugin, NAMEDATALEN);
+ NameStr(slot->plugin)[NAMEDATALEN - 1] = '\0';
+ strncpy(NameStr(slot->name), name, NAMEDATALEN);
+ NameStr(slot->name)[NAMEDATALEN - 1] = '\0';
+
+ /* Arrange to clean up at exit/error */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ /* release slot so it can be examined by others */
+ SpinLockRelease(&slot->mutex);
+
+ /* XXX: verify that the specified plugin is valid */
+
+ /*
+ * Acquire the current global xmin value and directly set the logical xmin
+ * before releasing the lock if necessary. We do this so WAL decoding is
+ * guaranteed to have all catalog rows produced by xacts with an xid >
+ * slot->xmin available.
+ *
+ * We can't use ComputeLogicalXmin here, as that acquires ProcArrayLock
+ * separately, which would open a short window for the global xmin to
+ * advance above slot->xmin.
+ */
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ slot->effective_xmin = GetOldestXmin(true, true, true, true);
+ slot->xmin = slot->effective_xmin;
+
+ if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) ||
+ NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin))
+ LogicalDecodingCtl->xmin = slot->effective_xmin;
+ LWLockRelease(ProcArrayLock);
+
+ Assert(slot->effective_xmin <= GetOldestXmin(true, true, true, false));
+
+ LWLockAcquire(LogicalReplicationCtlLock, LW_EXCLUSIVE);
+ CreateLogicalSlot(slot);
+ LWLockRelease(LogicalReplicationCtlLock);
+}
+
+/*
+ * Find a previously initialized slot and mark it as used again.
+ */
+void
+LogicalDecodingReAcquireSlot(const char *name)
+{
+ LogicalDecodingSlot *slot;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ MyLogicalDecodingSlot = slot;
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ }
+
+ if (!MyLogicalDecodingSlot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not find logical slot \"%s\"", name)));
+
+ slot = MyLogicalDecodingSlot;
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ MyLogicalDecodingSlot = NULL;
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("logical slot \"%s\" is already active", name)));
+ }
+
+ slot->active = true;
+ /* now that we've marked it as active, we release our lock */
+ SpinLockRelease(&slot->mutex);
+
+ /* Don't let the user switch the database... */
+ if (slot->database != MyDatabaseId)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("START_LOGICAL_REPLICATION needs to be run in the same database as INIT_LOGICAL_REPLICATION"))));
+ }
+
+ /* Arrange to clean up at exit */
+ on_shmem_exit(LogicalSlotKill, 0);
+
+ SaveLogicalSlot(slot);
+}
+
+/*
+ * Temporarily release this backend's logical decoding slot; this or another
+ * backend can reacquire it later.
+ */
+void
+LogicalDecodingReleaseSlot(void)
+{
+ LogicalDecodingSlot *slot;
+
+ CheckLogicalReplicationRequirements();
+
+ slot = MyLogicalDecodingSlot;
+
+ Assert(slot != NULL && slot->active);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ SpinLockRelease(&slot->mutex);
+
+ MyLogicalDecodingSlot = NULL;
+
+ SaveLogicalSlot(slot);
+
+ cancel_shmem_exit(LogicalSlotKill, 0);
+}
+
+/*
+ * Permanently remove a logical decoding slot.
+ */
+void
+LogicalDecodingFreeSlot(const char *name)
+{
+ LogicalDecodingSlot *slot = NULL;
+ int i;
+
+ CheckLogicalReplicationRequirements();
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ SpinLockAcquire(&slot->mutex);
+ if (slot->in_use && strcmp(name, NameStr(slot->name)) == 0)
+ {
+ /* NOT releasing the lock yet */
+ break;
+ }
+ SpinLockRelease(&slot->mutex);
+ slot = NULL;
+ }
+
+ if (!slot)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not find logical slot \"%s\"", name)));
+
+ if (slot->active)
+ {
+ SpinLockRelease(&slot->mutex);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("cannot free active logical slot \"%s\"", name)));
+ }
+
+ /*
+ * Mark it as active, so nobody can claim this slot while we are
+ * working on it. We don't want to hold the spinlock while doing stuff
+ * like fsyncing the state file to disk.
+ */
+ slot->active = true;
+
+ SpinLockRelease(&slot->mutex);
+
+ /*
+ * Start a critical section; we can't be interrupted while the on-disk and
+ * in-memory state are not coherent.
+ */
+ START_CRIT_SECTION();
+
+ DeleteLogicalSlot(slot);
+
+ /* ok, everything gone, after a crash we now wouldn't restore this slot */
+ SpinLockAcquire(&slot->mutex);
+ slot->active = false;
+ slot->in_use = false;
+ SpinLockRelease(&slot->mutex);
+
+ END_CRIT_SECTION();
+
+ /* slot is dead and doesn't nail the xmin anymore */
+ ComputeLogicalXmin();
+}
+
+/*
+ * Load replication state from disk into memory at server startup.
+ */
+void
+StartupLogicalReplication(XLogRecPtr checkPointRedo)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ ereport(DEBUG1,
+ (errmsg("starting up logical decoding from %X/%X",
+ (uint32) (checkPointRedo >> 32), (uint32) checkPointRedo)));
+
+ /* restore all slots */
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /* we crashed while a slot was being set up or deleted, clean up */
+ if (strcmp(logical_de->d_name, "new") == 0 ||
+ strcmp(logical_de->d_name, "old") == 0)
+ {
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ if (!rmtree(path, true))
+ {
+ FreeDir(logical_dir);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ path)));
+ }
+ continue;
+ }
+
+ RestoreLogicalSlot(logical_de->d_name);
+ }
+ FreeDir(logical_dir);
+
+ if (max_logical_slots <= 0)
+ return;
+
+ /* Now that we have recovered all the data, compute logical xmin */
+ ComputeLogicalXmin();
+
+ ReorderBufferStartup();
+}
+
+/* ----
+ * Manipulation of the on-disk state of logical slots
+ * ----
+ */
+static void
+CreateLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+
+ START_CRIT_SECTION();
+
+ sprintf(tmppath, "pg_llog/new");
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (mkdir(tmppath, S_IRWXU) < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m",
+ tmppath)));
+
+ fsync_fname(tmppath, true);
+
+ SaveLogicalSlotInternal(slot, tmppath);
+
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname(path, true);
+
+ END_CRIT_SECTION();
+}
+
+static void
+SaveLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+ SaveLogicalSlotInternal(slot, path);
+}
+
+/*
+ * Shared functionality between saving and creating a logical slot.
+ */
+static void
+SaveLogicalSlotInternal(LogicalDecodingSlot *slot, const char *dir)
+{
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int fd;
+ LogicalDecodingSlotOnDisk cp;
+
+ /* silence valgrind :( */
+ memset(&cp, 0, sizeof(LogicalDecodingSlotOnDisk));
+
+ sprintf(tmppath, "%s/state.tmp", dir);
+ sprintf(path, "%s/state", dir);
+
+ START_CRIT_SECTION();
+
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not create logical checkpoint file \"%s\": %m",
+ tmppath)));
+
+ cp.magic = LOGICAL_MAGIC;
+
+ SpinLockAcquire(&slot->mutex);
+
+ cp.slot.xmin = slot->xmin;
+ cp.slot.effective_xmin = slot->effective_xmin;
+
+ strcpy(NameStr(cp.slot.name), NameStr(slot->name));
+ strcpy(NameStr(cp.slot.plugin), NameStr(slot->plugin));
+
+ cp.slot.database = slot->database;
+ cp.slot.confirmed_flush = slot->confirmed_flush;
+ cp.slot.restart_decoding = slot->restart_decoding;
+ cp.slot.candidate_lsn = InvalidXLogRecPtr;
+ cp.slot.candidate_xmin = InvalidTransactionId;
+ cp.slot.candidate_restart_decoding = InvalidXLogRecPtr;
+ cp.slot.in_use = slot->in_use;
+ cp.slot.active = false;
+
+ SpinLockRelease(&slot->mutex);
+
+ if ((write(fd, &cp, sizeof(cp))) != sizeof(cp))
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not write logical checkpoint file \"%s\": %m",
+ tmppath)));
+ }
+
+ /* fsync the file */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not fsync logical checkpoint \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /* rename to permanent file, fsync file and directory */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ fsync_fname((char *) dir, true);
+ fsync_fname(path, false);
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+DeleteLogicalSlot(LogicalDecodingSlot *slot)
+{
+ char path[MAXPGPATH];
+ char tmppath[] = "pg_llog/old";
+
+ START_CRIT_SECTION();
+
+ sprintf(path, "pg_llog/%s", NameStr(slot->name));
+
+ if (rename(path, tmppath) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename logical checkpoint from \"%s\" to \"%s\": %m",
+ path, tmppath)));
+ }
+
+ /* make sure no partial state is visible after a crash */
+ fsync_fname(tmppath, true);
+ fsync_fname("pg_llog", true);
+
+ if (!rmtree(tmppath, true))
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove directory \"%s\": %m",
+ tmppath)));
+ }
+
+ END_CRIT_SECTION();
+}
+
+/*
+ * Load a single on-disk slot into memory.
+ */
+static void
+RestoreLogicalSlot(const char *name)
+{
+ LogicalDecodingSlotOnDisk cp;
+ int i;
+ char path[MAXPGPATH];
+ int fd;
+ bool restored = false;
+ int readBytes;
+
+ START_CRIT_SECTION();
+
+ /* delete temp file if it exists */
+ sprintf(path, "pg_llog/%s/state.tmp", name);
+ if (unlink(path) < 0 && errno != ENOENT)
+ ereport(PANIC, (errcode_for_file_access(), errmsg("could not unlink file \"%s\": %m", path)));
+
+ sprintf(path, "pg_llog/%s/state", name);
+
+ elog(DEBUG1, "restoring logical slot from %s", path);
+
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ /*
+ * We do not need to handle this as we are rename()ing the directory into
+ * place only after we fsync()ed the state file.
+ */
+ if (fd < 0)
+ ereport(PANIC, (errcode_for_file_access(), errmsg("could not open file \"%s\": %m", path)));
+
+ readBytes = read(fd, &cp, sizeof(cp));
+ if (readBytes != sizeof(cp))
+ {
+ int saved_errno = errno;
+
+ CloseTransientFile(fd);
+ errno = saved_errno;
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not read logical checkpoint file \"%s\": %m, read %d of %zu",
+ path, readBytes, sizeof(cp))));
+ }
+
+ CloseTransientFile(fd);
+
+ if (cp.magic != LOGICAL_MAGIC)
+ ereport(PANIC, (errmsg("logical checkpoint file \"%s\" has wrong magic %u instead of %u",
+ path, cp.magic, LOGICAL_MAGIC)));
+
+ /* nothing can be active yet, don't lock anything */
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot;
+
+ slot = &LogicalDecodingCtl->logical_slots[i];
+
+ if (slot->in_use)
+ continue;
+
+ slot->xmin = cp.slot.xmin;
+ /* XXX: after a crash, always use xmin, not effective_xmin */
+ slot->effective_xmin = cp.slot.xmin;
+ strcpy(NameStr(slot->name), NameStr(cp.slot.name));
+ strcpy(NameStr(slot->plugin), NameStr(cp.slot.plugin));
+ slot->database = cp.slot.database;
+ slot->restart_decoding = cp.slot.restart_decoding;
+ slot->confirmed_flush = cp.slot.confirmed_flush;
+ slot->candidate_lsn = InvalidXLogRecPtr;
+ slot->candidate_xmin = InvalidTransactionId;
+ slot->candidate_restart_decoding = InvalidXLogRecPtr;
+ slot->in_use = true;
+ slot->active = false;
+ restored = true;
+
+ /*
+ * FIXME: Do some validation here.
+ */
+ break;
+ }
+
+ if (!restored)
+ ereport(PANIC,
+ (errmsg("too many logical slots active before shutdown, increase max_logical_slots and try again")));
+
+ END_CRIT_SECTION();
+}
+
+
+static void
+LoadOutputPlugin(OutputPluginCallbacks *callbacks, char *plugin)
+{
+ /* look up symbols in the shared library */
+
+ /* optional */
+ callbacks->init_cb = (LogicalDecodeInitCB)
+ load_external_function(plugin, "pg_decode_init", false, NULL);
+
+ /* required */
+ callbacks->begin_cb = (LogicalDecodeBeginCB)
+ load_external_function(plugin, "pg_decode_begin_txn", true, NULL);
+
+ /* required */
+ callbacks->change_cb = (LogicalDecodeChangeCB)
+ load_external_function(plugin, "pg_decode_change", true, NULL);
+
+ /* required */
+ callbacks->commit_cb = (LogicalDecodeCommitCB)
+ load_external_function(plugin, "pg_decode_commit_txn", true, NULL);
+
+ /* optional */
+ callbacks->cleanup_cb = (LogicalDecodeCleanupCB)
+ load_external_function(plugin, "pg_decode_clean", false, NULL);
+}
+
+/*
+ * Context management functions to coordinate between the different
+ * logical decoding pieces.
+ */
+
+/*
+ * Callbacks for ReorderBuffer which add in some more information and then call
+ * output_plugin.h plugins.
+ */
+static void
+begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.begin_cb(ctx, txn);
+}
+
+static void
+commit_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.commit_cb(ctx, txn, commit_lsn);
+}
+
+static void
+change_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+
+ ctx->callbacks.change_cb(ctx, txn, relation, change);
+}
+
+LogicalDecodingContext *
+CreateLogicalDecodingContext(LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write)
+{
+ MemoryContext context;
+ MemoryContext old_context;
+ TransactionId xmin_horizon;
+ LogicalDecodingContext *ctx;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "LogicalDecodingContext",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(context);
+ ctx = palloc0(sizeof(LogicalDecodingContext));
+
+
+ /* load output plugins first, so we detect a wrong output plugin early */
+ LoadOutputPlugin(&ctx->callbacks, NameStr(slot->plugin));
+
+ if (is_init && start_lsn != InvalidXLogRecPtr)
+ elog(ERROR, "cannot initially start at a specified lsn");
+
+ if (is_init)
+ xmin_horizon = slot->xmin;
+ else
+ xmin_horizon = InvalidTransactionId;
+
+ ctx->slot = slot;
+
+ ctx->reader = XLogReaderAllocate(read_page, ctx);
+ ctx->reader->private_data = ctx;
+
+ ctx->reorder = ReorderBufferAllocate();
+ ctx->snapshot_builder =
+ AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
+
+ ctx->reorder->private_data = ctx;
+
+ ctx->reorder->begin = begin_txn_wrapper;
+ ctx->reorder->apply_change = change_wrapper;
+ ctx->reorder->commit = commit_txn_wrapper;
+
+ ctx->out = makeStringInfo();
+ ctx->prepare_write = prepare_write;
+ ctx->write = do_write;
+
+ ctx->output_plugin_options = output_plugin_options;
+
+ if (is_init)
+ ctx->stop_after_consistent = true;
+ else
+ ctx->stop_after_consistent = false;
+
+ /* call output plugin initialization callback */
+ if (ctx->callbacks.init_cb != NULL)
+ ctx->callbacks.init_cb(ctx, is_init);
+
+ MemoryContextSwitchTo(old_context);
+
+ return ctx;
+}
+
+void
+FreeLogicalDecodingContext(LogicalDecodingContext *ctx)
+{
+ if (ctx->callbacks.cleanup_cb != NULL)
+ ctx->callbacks.cleanup_cb(ctx);
+}
+
+
+/* has the initial snapshot found a consistent state? */
+bool
+LogicalDecodingContextReady(LogicalDecodingContext *ctx)
+{
+ return SnapBuildCurrentState(ctx->snapshot_builder) == SNAPBUILD_CONSISTENT;
+}
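The slot persistence code above (SaveLogicalSlotInternal() and RestoreLogicalSlot()) relies on the usual crash-safe replacement pattern: write state.tmp, fsync it, rename() it over state, then fsync the containing directory, so that after a crash either the old or the new state file is visible, never a torn one. The following is a minimal standalone POSIX sketch of that pattern, not code from the patch: DemoSlotOnDisk, DEMO_MAGIC and the demo_* functions are hypothetical stand-ins, and the sketch returns -1 where the patch PANICs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for LogicalDecodingSlotOnDisk. */
typedef struct DemoSlotOnDisk
{
    unsigned int magic;
    unsigned long long restart_lsn;
} DemoSlotOnDisk;

#define DEMO_MAGIC 0x1051CA1u   /* hypothetical, not the patch's LOGICAL_MAGIC */

/* fsync a directory so that a rename() inside it survives a crash. */
static int
demo_fsync_dir(const char *dir)
{
    int         fd = open(dir, O_RDONLY);
    int         rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Write dir/state.tmp, fsync it, rename it over dir/state and fsync dir.
 * Returns 0 on success, -1 on any failure.
 */
static int
demo_save_state(const char *dir, const DemoSlotOnDisk *cp)
{
    char        tmppath[1024];
    char        path[1024];
    int         fd;

    snprintf(tmppath, sizeof(tmppath), "%s/state.tmp", dir);
    snprintf(path, sizeof(path), "%s/state", dir);

    /* drop a leftover temp file (the patch does this at startup instead) */
    if (unlink(tmppath) < 0 && errno != ENOENT)
        return -1;

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    errno = 0;
    if (write(fd, cp, sizeof(*cp)) != (ssize_t) sizeof(*cp))
    {
        /* a short write need not set errno; assume the disk is full */
        if (errno == 0)
            errno = ENOSPC;
        close(fd);
        return -1;
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* atomically replace the old state file, then persist the rename */
    if (rename(tmppath, path) != 0)
        return -1;
    return demo_fsync_dir(dir);
}

/* Read dir/state back, checking its size and magic. */
static int
demo_load_state(const char *dir, DemoSlotOnDisk *cp)
{
    char        path[1024];
    int         fd;
    ssize_t     n;

    snprintf(path, sizeof(path), "%s/state", dir);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, cp, sizeof(*cp));
    close(fd);
    if (n != (ssize_t) sizeof(*cp) || cp->magic != DEMO_MAGIC)
        return -1;
    return 0;
}
```

Unlike the patch, the sketch unlinks a stale state.tmp before each write so it can be called repeatedly; in the patch that cleanup happens in RestoreLogicalSlot() at startup, which is why SaveLogicalSlotInternal() can open the temporary file with O_EXCL. The ENOSPC assignment follows a common convention, since write() is not required to set errno on a short write.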
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
new file mode 100644
index 0000000..9837a95
--- /dev/null
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -0,0 +1,361 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalfuncs.c
+ *
+ * Support functions for using xlog decoding
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/logicalfuncs.c
+ *
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "storage/fd.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+
+Datum init_logical_replication(PG_FUNCTION_ARGS);
+Datum stop_logical_replication(PG_FUNCTION_ARGS);
+Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+/* FIXME: duplicate code with pg_xlogdump, similar to walsender.c */
+static void
+XLogRead(char *buf, XLogRecPtr startptr, Size count)
+{
+ char *p;
+ XLogRecPtr recptr;
+ Size nbytes;
+
+ static int sendFile = -1;
+ static XLogSegNo sendSegNo = 0;
+ static uint32 sendOff = 0;
+
+ p = buf;
+ recptr = startptr;
+ nbytes = count;
+
+ while (nbytes > 0)
+ {
+ uint32 startoff;
+ int segbytes;
+ int readbytes;
+
+ startoff = recptr % XLogSegSize;
+
+ if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+ {
+ char path[MAXPGPATH];
+
+ /* Switch to another logfile segment */
+ if (sendFile >= 0)
+ close(sendFile);
+
+ XLByteToSeg(recptr, sendSegNo);
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
+
+ if (sendFile < 0)
+ {
+ if (errno == ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("requested WAL segment %s has already been removed",
+ path)));
+ else
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m",
+ path)));
+ }
+ sendOff = 0;
+ }
+
+ /* Need to seek in the file? */
+ if (sendOff != startoff)
+ {
+ if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not seek in log segment %s to offset %u: %m",
+ path, startoff)));
+ }
+ sendOff = startoff;
+ }
+
+ /* How many bytes are within this segment? */
+ if (nbytes > (XLogSegSize - startoff))
+ segbytes = XLogSegSize - startoff;
+ else
+ segbytes = nbytes;
+
+ readbytes = read(sendFile, p, segbytes);
+ if (readbytes <= 0)
+ {
+ char path[MAXPGPATH];
+
+ XLogFilePath(path, ThisTimeLineID, sendSegNo);
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from log segment %s, offset %u, length %lu: %m",
+ path, sendOff, (unsigned long) segbytes)));
+ }
+
+ /* Update state for read */
+ recptr += readbytes;
+
+ sendOff += readbytes;
+ nbytes -= readbytes;
+ p += readbytes;
+ }
+}
+
+int
+logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr,
+ loc;
+ int count;
+
+ loc = targetPagePtr + reqLen;
+ while (1)
+ {
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ break;
+ pg_usleep(1000L);
+ }
+
+ /* at least one full page available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
+
+static void
+DummyWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ elog(ERROR, "init_logical_replication shouldn't be writing anything");
+}
+
+Datum
+init_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ Name plugin = PG_GETARG_NAME(1);
+
+ char xpos[MAXFNAMELEN];
+
+ TupleDesc tupdesc;
+ HeapTuple tuple;
+ Datum result;
+ Datum values[2];
+ bool nulls[2];
+ LogicalDecodingContext *ctx = NULL;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ /* Acquire a logical replication slot */
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingAcquireFreeSlot(NameStr(*name), NameStr(*plugin));
+
+ /* make sure we don't end up with an unreleased slot */
+ PG_TRY();
+ {
+ XLogRecPtr startptr;
+
+ /*
+ * Use the same initial_snapshot_reader, but with our own read_page
+ * callback that does not depend on walsender.
+ */
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true,
+ InvalidXLogRecPtr, NIL,
+ logical_read_local_xlog_page,
+ DummyWrite, DummyWrite);
+
+ /* setup from where to read xlog */
+ startptr = ctx->slot->restart_decoding;
+
+ /* Wait for a consistent starting point */
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ if (err)
+ elog(ERROR, "%s", err);
+
+ Assert(record);
+
+ startptr = InvalidXLogRecPtr;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue until we have found a consistent point */
+ if (LogicalDecodingContextReady(ctx))
+ break;
+ }
+
+ /* Extract the values we want */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ values[0] = CStringGetTextDatum(NameStr(MyLogicalDecodingSlot->name));
+ values[1] = CStringGetTextDatum(xpos);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ LogicalDecodingReleaseSlot();
+
+ PG_RETURN_DATUM(result);
+}
+
+Datum
+stop_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(NameStr(*name));
+
+ PG_RETURN_INT32(0);
+}
+
+/*
+ * Return one row for each logical replication slot currently in use.
+ */
+
+Datum
+pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS 6
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ int i;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not " \
+ "allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ for (i = 0; i < max_logical_slots; i++)
+ {
+ LogicalDecodingSlot *slot = &LogicalDecodingCtl->logical_slots[i];
+ Datum values[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ bool nulls[PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS];
+ char location[MAXFNAMELEN];
+ const char *slot_name;
+ const char *plugin;
+ TransactionId xmin;
+ XLogRecPtr last_req;
+ bool active;
+ Oid database;
+
+ SpinLockAcquire(&slot->mutex);
+ if (!slot->in_use)
+ {
+ SpinLockRelease(&slot->mutex);
+ continue;
+ }
+ else
+ {
+ xmin = slot->xmin;
+ active = slot->active;
+ database = slot->database;
+ last_req = slot->restart_decoding;
+ slot_name = pstrdup(NameStr(slot->name));
+ plugin = pstrdup(NameStr(slot->plugin));
+ }
+ SpinLockRelease(&slot->mutex);
+
+ memset(nulls, 0, sizeof(nulls));
+
+ snprintf(location, sizeof(location), "%X/%X",
+ (uint32) (last_req >> 32), (uint32) last_req);
+
+ values[0] = CStringGetTextDatum(slot_name);
+ values[1] = CStringGetTextDatum(plugin);
+ values[2] = ObjectIdGetDatum(database);
+ values[3] = BoolGetDatum(active);
+ values[4] = TransactionIdGetDatum(xmin);
+ values[5] = CStringGetTextDatum(location);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ return (Datum) 0;
+}
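logical_read_local_xlog_page() and the XLogRead() helper it calls boil down to two pieces of arithmetic: how much of the requested page has already been flushed, and how a read is split at WAL segment boundaries. The sketch below restates both as pure functions so they can be checked in isolation. DemoRecPtr, DemoChunk, DEMO_SEG_SIZE (assumed 16MB segments), DEMO_BLCKSZ (assumed 8kB pages) and the demo_* names are hypothetical and do not exist in the tree.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t DemoRecPtr;                /* stand-in for XLogRecPtr */

#define DEMO_SEG_SIZE (16u * 1024 * 1024)   /* assumed WAL segment size */
#define DEMO_BLCKSZ   8192u                 /* assumed XLOG_BLCKSZ */

/* One contiguous read within a single WAL segment. */
typedef struct DemoChunk
{
    uint64_t    segno;      /* segment number, as XLByteToSeg computes it */
    uint32_t    off;        /* offset inside the segment */
    uint32_t    len;        /* bytes to read from this segment */
} DemoChunk;

/*
 * Same decision as logical_read_local_xlog_page(): how many bytes of the
 * page starting at page_ptr are usable given flush_ptr, or -1 if fewer
 * than req_len bytes have been flushed.
 */
static int
demo_page_bytes(DemoRecPtr page_ptr, int req_len, DemoRecPtr flush_ptr)
{
    if (page_ptr + DEMO_BLCKSZ <= flush_ptr)
        return (int) DEMO_BLCKSZ;           /* whole page flushed */
    if (page_ptr + (DemoRecPtr) req_len > flush_ptr)
        return -1;                          /* not enough data yet */
    return (int) (flush_ptr - page_ptr);    /* only a prefix is flushed */
}

/*
 * Same loop shape as XLogRead(): split a read of count bytes starting at
 * startptr into per-segment chunks. Fills at most max_chunks entries of
 * out and returns the number of chunks written.
 */
static int
demo_split_read(DemoRecPtr startptr, uint64_t count,
                DemoChunk *out, int max_chunks)
{
    int         n = 0;

    while (count > 0 && n < max_chunks)
    {
        uint32_t    startoff = (uint32_t) (startptr % DEMO_SEG_SIZE);
        uint64_t    segbytes = DEMO_SEG_SIZE - startoff;

        if (count < segbytes)
            segbytes = count;

        out[n].segno = startptr / DEMO_SEG_SIZE;
        out[n].off = startoff;
        out[n].len = (uint32_t) segbytes;
        n++;

        startptr += segbytes;
        count -= segbytes;
    }
    return n;
}
```

For example, a 300-byte read starting 100 bytes before the end of segment 0 yields two chunks: 100 bytes at the tail of segment 0 and 200 bytes at the start of segment 1. Note that logical_read_local_xlog_page() currently asks XLogRead() for a full XLOG_BLCKSZ even when only count bytes are flushed (the FIXME in the patch); the extra bytes are read from the segment file but are not reported to the caller as valid.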
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index 0000000..6d2866d
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -0,0 +1,2449 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer.c
+ *
+ * PostgreSQL logical replay "cache" management
+ *
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/reorderbuffer.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_control.h"
+
+#include "common/relpath.h"
+
+#include "lib/binaryheap.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h" /* just for SnapBuildSnapDecRefcount */
+#include "replication/logical.h"
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/sinval.h"
+
+#include "utils/builtins.h"
+#include "utils/combocid.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tqual.h"
+#include "utils/syscache.h"
+
+/*
+ * For efficiency and simplicity reasons we want to keep Snapshots, CommandIds
+ * and ComboCids in the same list with the user visible INSERT/UPDATE/DELETE
+ * changes. We don't want to leak those internal values to external users
+ * though (they would just use switch()...default:) because that would make it
+ * harder to add new user-visible values.
+ *
+ * This needs to be synchronized with ReorderBufferChangeType! Adjust the
+ * StaticAssertExpr's in ReorderBufferAllocate if you add anything!
+ */
+typedef enum
+{
+ REORDER_BUFFER_CHANGE_INTERNAL_INSERT,
+ REORDER_BUFFER_CHANGE_INTERNAL_UPDATE,
+ REORDER_BUFFER_CHANGE_INTERNAL_DELETE,
+ REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT,
+ REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
+ REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID
+} ReorderBufferChangeTypeInternal;
+
+
+/* entry for a hash table we use to map from xid to our transaction state */
+typedef struct ReorderBufferTXNByIdEnt
+{
+ TransactionId xid;
+ ReorderBufferTXN *txn;
+} ReorderBufferTXNByIdEnt;
+
+
+/* data structures for (relfilenode, ctid) => (cmin, cmax) mapping */
+typedef struct ReorderBufferTupleCidKey
+{
+ RelFileNode relnode;
+ ItemPointerData tid;
+} ReorderBufferTupleCidKey;
+
+typedef struct ReorderBufferTupleCidEnt
+{
+ ReorderBufferTupleCidKey key;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid; /* just for debugging */
+} ReorderBufferTupleCidEnt;
+
+
+/* k-way in-order change iteration support structures */
+typedef struct ReorderBufferIterTXNEntry
+{
+ XLogRecPtr lsn;
+ ReorderBufferChange *change;
+ ReorderBufferTXN *txn;
+ int fd;
+ XLogSegNo segno;
+} ReorderBufferIterTXNEntry;
+
+typedef struct ReorderBufferIterTXNState
+{
+ binaryheap *heap;
+ Size nr_txns;
+ dlist_head old_change;
+ ReorderBufferIterTXNEntry entries[FLEXIBLE_ARRAY_MEMBER];
+} ReorderBufferIterTXNState;
+
+
+/* TOAST data structures */
+typedef struct ReorderBufferToastEnt
+{
+ Oid chunk_id; /* toast_table.chunk_id */
+ int32 last_chunk_seq; /* toast_table.chunk_seq of the last chunk we
+ * have seen */
+ Size num_chunks; /* number of chunks we've already seen */
+ Size size; /* combined size of chunks seen */
+ dlist_head chunks; /* linked list of chunks */
+ struct varlena *reconstructed; /* reconstructed varlena now pointed
+ * to in main tup */
+} ReorderBufferToastEnt;
+
+
+/* number of changes kept in memory, per transaction */
+const Size max_memtries = 4096;
+
+/* Size of the slab caches used for frequently allocated objects */
+const Size max_cached_changes = 4096 * 2;
+const Size max_cached_tuplebufs = 1024; /* ~8MB */
+const Size max_cached_transactions = 512;
+
+
+/* ---------------------------------------
+ * primary reorderbuffer support routines
+ * ---------------------------------------
+ */
+static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *buffer);
+static void ReorderBufferReturnTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *buffer,
+ TransactionId xid, bool create, bool *is_new,
+ XLogRecPtr lsn, bool create_as_top);
+
+static void AssertTXNLsnOrder(ReorderBuffer *buffer);
+
+/* ---------------------------------------
+ * support functions for lsn-order iterating over the ->changes of a
+ * transaction and its subtransactions
+ *
+ * used for iteration over the k-way heap merge of a transaction and its
+ * subtransactions
+ * ---------------------------------------
+ */
+static ReorderBufferIterTXNState *ReorderBufferIterTXNInit(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static ReorderBufferChange *
+ ReorderBufferIterTXNNext(ReorderBuffer *buffer, ReorderBufferIterTXNState *state);
+static void ReorderBufferIterTXNFinish(ReorderBuffer *buffer,
+ ReorderBufferIterTXNState *state);
+static void ReorderBufferExecuteInvalidations(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+
+/*
+ * ---------------------------------------
+ * Disk serialization support functions
+ * ---------------------------------------
+ */
+static void ReorderBufferCheckSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferSerializeChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change);
+static Size ReorderBufferRestoreChanges(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno);
+static void ReorderBufferRestoreChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ char *change);
+static void ReorderBufferRestoreCleanup(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+
+static void ReorderBufferFreeSnap(ReorderBuffer *buffer, Snapshot snap);
+static Snapshot ReorderBufferCopySnap(ReorderBuffer *buffer, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid);
+
+/* ---------------------------------------
+ * toast reassembly support
+ * ---------------------------------------
+ */
+/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
+#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
+
+/* Size of an indirect datum that contains a standard TOAST pointer */
+#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect))
+
+static void ReorderBufferToastInitHash(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferToastReset(ReorderBuffer *buffer, ReorderBufferTXN *txn);
+static void ReorderBufferToastReplace(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+static void ReorderBufferToastAppendChunk(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change);
+
+
+/*
+ * Allocate a new ReorderBuffer
+ */
+ReorderBuffer *
+ReorderBufferAllocate(void)
+{
+ ReorderBuffer *buffer;
+ HASHCTL hash_ctl;
+ MemoryContext new_ctx;
+
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_INSERT == (int) REORDER_BUFFER_CHANGE_INSERT, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_UPDATE == (int) REORDER_BUFFER_CHANGE_UPDATE, "out of sync enums");
+ StaticAssertExpr((int) REORDER_BUFFER_CHANGE_INTERNAL_DELETE == (int) REORDER_BUFFER_CHANGE_DELETE, "out of sync enums");
+
+ new_ctx = AllocSetContextCreate(TopMemoryContext,
+ "ReorderBuffer",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ buffer = (ReorderBuffer *) MemoryContextAlloc(new_ctx, sizeof(ReorderBuffer));
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ buffer->context = new_ctx;
+
+ hash_ctl.keysize = sizeof(TransactionId);
+ hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = buffer->context;
+
+ buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ buffer->by_txn_last_xid = InvalidTransactionId;
+ buffer->by_txn_last_txn = NULL;
+
+ buffer->nr_cached_transactions = 0;
+ buffer->nr_cached_changes = 0;
+ buffer->nr_cached_tuplebufs = 0;
+
+ buffer->outbuf = NULL;
+ buffer->outbufsize = 0;
+
+ dlist_init(&buffer->toplevel_by_lsn);
+ dlist_init(&buffer->cached_transactions);
+ dlist_init(&buffer->cached_changes);
+ slist_init(&buffer->cached_tuplebufs);
+
+ return buffer;
+}
+
+/*
+ * Free a ReorderBuffer
+ */
+void
+ReorderBufferFree(ReorderBuffer *buffer)
+{
+ /* FIXME: check for in-progress transactions */
+ /* FIXME: clean up cached transaction */
+ /* FIXME: clean up cached changes */
+ /* FIXME: clean up cached tuplebufs */
+ if (buffer->outbufsize > 0)
+ pfree(buffer->outbuf);
+
+ hash_destroy(buffer->by_txn);
+ pfree(buffer);
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTXN.
+ */
+static ReorderBufferTXN *
+ReorderBufferGetTXN(ReorderBuffer *buffer)
+{
+ ReorderBufferTXN *txn;
+
+ if (buffer->nr_cached_transactions > 0)
+ {
+ buffer->nr_cached_transactions--;
+ txn = (ReorderBufferTXN *)
+ dlist_container(ReorderBufferTXN, node,
+ dlist_pop_head_node(&buffer->cached_transactions));
+ }
+ else
+ {
+ txn = (ReorderBufferTXN *)
+ MemoryContextAlloc(buffer->context, sizeof(ReorderBufferTXN));
+ }
+
+ memset(txn, 0, sizeof(ReorderBufferTXN));
+
+ dlist_init(&txn->changes);
+ dlist_init(&txn->tuplecids);
+ dlist_init(&txn->subtxns);
+
+ return txn;
+}
+
+/*
+ * Free a ReorderBufferTXN. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ /* clean the lookup cache if we were cached (quite likely) */
+ if (buffer->by_txn_last_xid == txn->xid)
+ {
+ buffer->by_txn_last_xid = InvalidTransactionId;
+ buffer->by_txn_last_txn = NULL;
+ }
+
+ if (txn->tuplecid_hash != NULL)
+ {
+ hash_destroy(txn->tuplecid_hash);
+ txn->tuplecid_hash = NULL;
+ }
+
+ if (txn->invalidations)
+ {
+ pfree(txn->invalidations);
+ txn->invalidations = NULL;
+ }
+
+ if (buffer->nr_cached_transactions < max_cached_transactions)
+ {
+ buffer->nr_cached_transactions++;
+ dlist_push_head(&buffer->cached_transactions, &txn->node);
+ }
+ else
+ {
+ pfree(txn);
+ }
+}
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferChange.
+ */
+ReorderBufferChange *
+ReorderBufferGetChange(ReorderBuffer *buffer)
+{
+ ReorderBufferChange *change;
+
+ if (buffer->nr_cached_changes)
+ {
+ buffer->nr_cached_changes--;
+ change = (ReorderBufferChange *)
+ dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&buffer->cached_changes));
+ }
+ else
+ {
+ change = (ReorderBufferChange *)
+ MemoryContextAlloc(buffer->context, sizeof(ReorderBufferChange));
+ }
+
+ memset(change, 0, sizeof(ReorderBufferChange));
+ return change;
+}
+
+/*
+ * Free a ReorderBufferChange. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnChange(ReorderBuffer *buffer, ReorderBufferChange *change)
+{
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ if (change->newtuple)
+ {
+ ReorderBufferReturnTupleBuf(buffer, change->newtuple);
+ change->newtuple = NULL;
+ }
+
+ if (change->oldtuple)
+ {
+ ReorderBufferReturnTupleBuf(buffer, change->oldtuple);
+ change->oldtuple = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ if (change->snapshot)
+ {
+ ReorderBufferFreeSnap(buffer, change->snapshot);
+ change->snapshot = NULL;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ if (buffer->nr_cached_changes < max_cached_changes)
+ {
+ buffer->nr_cached_changes++;
+ dlist_push_head(&buffer->cached_changes, &change->node);
+ }
+ else
+ {
+ pfree(change);
+ }
+}
+
+
+/*
+ * Get an unused, possibly preallocated, ReorderBufferTupleBuf
+ */
+ReorderBufferTupleBuf *
+ReorderBufferGetTupleBuf(ReorderBuffer *buffer)
+{
+ ReorderBufferTupleBuf *tuple;
+
+ if (buffer->nr_cached_tuplebufs)
+ {
+ buffer->nr_cached_tuplebufs--;
+ tuple = slist_container(ReorderBufferTupleBuf, node,
+ slist_pop_head_node(&buffer->cached_tuplebufs));
+#ifdef USE_ASSERT_CHECKING
+ /* clobber the recycled buffer so that use of stale contents gets noticed */
+ memset(tuple, 0x7F, sizeof(ReorderBufferTupleBuf));
+#endif
+ }
+ else
+ {
+ tuple = (ReorderBufferTupleBuf *)
+ MemoryContextAlloc(buffer->context, sizeof(ReorderBufferTupleBuf));
+ }
+
+ return tuple;
+}
+
+/*
+ * Free a ReorderBufferTupleBuf. Deallocation might be delayed for efficiency
+ * purposes.
+ */
+void
+ReorderBufferReturnTupleBuf(ReorderBuffer *buffer, ReorderBufferTupleBuf *tuple)
+{
+ if (buffer->nr_cached_tuplebufs < max_cached_tuplebufs)
+ {
+ buffer->nr_cached_tuplebufs++;
+ slist_push_head(&buffer->cached_tuplebufs, &tuple->node);
+ }
+ else
+ {
+ pfree(tuple);
+ }
+}
+
+/*
+ * Return the ReorderBufferTXN from the given buffer, specified by Xid.
+ * If create is true, and a transaction doesn't already exist, create it
+ * (with the given LSN, and as top transaction if that's specified);
+ * when this happens, is_new is set to true.
+ */
+static ReorderBufferTXN *
+ReorderBufferTXNByXid(ReorderBuffer *buffer, TransactionId xid, bool create,
+ bool *is_new, XLogRecPtr lsn, bool create_as_top)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXNByIdEnt *ent;
+ bool found;
+
+ Assert(!create || lsn != InvalidXLogRecPtr);
+
+ /*
+ * Check the one-entry lookup cache first
+ */
+ if (TransactionIdIsValid(buffer->by_txn_last_xid) &&
+ buffer->by_txn_last_xid == xid)
+ {
+ txn = buffer->by_txn_last_txn;
+
+ if (txn != NULL)
+ {
+ /* found it, and it's valid */
+ if (is_new)
+ *is_new = false;
+ return txn;
+ }
+
+ /*
+ * cached as non-existent, and asked not to create? Then nothing else
+ * to do.
+ */
+ if (!create)
+ return NULL;
+ /* otherwise fall through to create it */
+ }
+
+ /*
+ * The cache wasn't hit, or it reported "does not exist" and we were asked
+ * to create an entry, so search the lookup table (creating the entry if
+ * requested).
+ */
+ ent = (ReorderBufferTXNByIdEnt *)
+ hash_search(buffer->by_txn,
+ (void *) &xid,
+ create ? HASH_ENTER : HASH_FIND,
+ &found);
+ if (found)
+ txn = ent->txn;
+ else if (create)
+ {
+ /* initialize the new entry, if creation was requested */
+ Assert(ent != NULL);
+
+ ent->txn = ReorderBufferGetTXN(buffer);
+ ent->txn->xid = xid;
+ txn = ent->txn;
+ txn->lsn = lsn;
+ txn->restart_decoding_lsn = buffer->current_restart_decoding_lsn;
+
+ if (create_as_top)
+ {
+ dlist_push_tail(&buffer->toplevel_by_lsn, &txn->node);
+ AssertTXNLsnOrder(buffer);
+ }
+ }
+ else
+ txn = NULL; /* not found and not asked to create */
+
+ /* update cache */
+ buffer->by_txn_last_xid = xid;
+ buffer->by_txn_last_txn = txn;
+
+ if (is_new)
+ *is_new = !found;
+
+ Assert(!create || !!txn);
+ return txn;
+}
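+
+/*
+ * Illustrative example of the one-entry lookup cache (hypothetical xids):
+ *
+ *     ReorderBufferTXNByXid(buffer, 700, true, NULL, lsn, true);
+ *         -> cache miss; the TXN is created and (700, txn) is cached
+ *     ReorderBufferTXNByXid(buffer, 700, false, NULL, InvalidXLogRecPtr, false);
+ *         -> cache hit; returned without calling hash_search()
+ *     ReorderBufferTXNByXid(buffer, 701, false, NULL, InvalidXLogRecPtr, false);
+ *         -> not found; (701, NULL) is cached, so an immediately repeated
+ *            lookup of 701 returns NULL without calling hash_search()
+ *
+ * ReorderBufferReturnTXN() clears the cache when the cached TXN is released.
+ */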
+
+/*
+ * Queue a change into a transaction so it can be replayed upon commit.
+ */
+void
+ReorderBufferAddChange(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn,
+ ReorderBufferChange *change)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+ change->lsn = lsn;
+ Assert(InvalidXLogRecPtr != lsn);
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries++;
+ txn->nentries_mem++;
+
+ ReorderBufferCheckSerializeTXN(buffer, txn);
+}
+
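+/*
+ * In assert-enabled builds, check that the toplevel transactions are sorted
+ * by their first LSN and that none of them is marked as a subtransaction.
+ */
+static void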
+AssertTXNLsnOrder(ReorderBuffer *buffer)
+{
+#ifdef USE_ASSERT_CHECKING
+ dlist_iter iter;
+ XLogRecPtr last_lsn = InvalidXLogRecPtr;
+
+ dlist_foreach(iter, &buffer->toplevel_by_lsn)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(cur_txn->lsn != InvalidXLogRecPtr);
+
+ if (cur_txn->last_lsn != InvalidXLogRecPtr)
+ Assert(cur_txn->lsn <= cur_txn->last_lsn);
+
+ if (last_lsn != InvalidXLogRecPtr)
+ Assert(last_lsn < cur_txn->lsn);
+
+ Assert(!cur_txn->is_known_as_subxact);
+ last_lsn = cur_txn->lsn;
+ }
+#endif
+}
+
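+/*
+ * Return the toplevel transaction with the smallest first LSN, or NULL if
+ * there are no toplevel transactions.
+ */
+ReorderBufferTXN *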
+ReorderBufferGetOldestTXN(ReorderBuffer *buffer)
+{
+ ReorderBufferTXN *txn;
+
+ if (dlist_is_empty(&buffer->toplevel_by_lsn))
+ return NULL;
+
+ AssertTXNLsnOrder(buffer);
+
+ txn = dlist_head_element(ReorderBufferTXN, node, &buffer->toplevel_by_lsn);
+
+ Assert(!txn->is_known_as_subxact);
+ Assert(txn->lsn != InvalidXLogRecPtr);
+ return txn;
+}
+
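+/*
+ * Set the LSN that transactions created from now on will remember as their
+ * restart_decoding_lsn.
+ */
+void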
+ReorderBufferSetRestartPoint(ReorderBuffer *buffer, XLogRecPtr ptr)
+{
+ buffer->current_restart_decoding_lsn = ptr;
+}
+
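+/*
+ * Record that subxid is a subtransaction of the toplevel transaction xid,
+ * creating either transaction if it isn't known yet.
+ */
+void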
+ReorderBufferAssignChild(ReorderBuffer *buffer, TransactionId xid,
+ TransactionId subxid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+ bool new_sub;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+ subtxn = ReorderBufferTXNByXid(buffer, subxid, true, &new_sub, lsn, false);
+
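+ if (new_sub)
+ {
+ /*
+ * We assign subtransactions to their toplevel transaction even if we
+ * don't have any data for them yet, because assignment records
+ * frequently reference xids that have not produced any records so
+ * far. Knowing those aren't toplevel xids allows us to make
+ * processing cheaper in some places. The new TXN was not added to
+ * the toplevel LSN list, so just mark it and link it to its parent;
+ * otherwise ReorderBufferCommitChild() would link it a second time.
+ */
+ subtxn->is_known_as_subxact = true;
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }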
+ else if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to toplevel transaction */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+}
+
+/*
+ * Associate a subtransaction with its toplevel transaction at commit
+ * time. There may be no further changes added after this.
+ */
+void
+ReorderBufferCommitChild(ReorderBuffer *buffer, TransactionId xid,
+ TransactionId subxid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferTXN *subtxn;
+ bool top_is_new;
+
+ subtxn = ReorderBufferTXNByXid(buffer, subxid, false, NULL,
+ InvalidXLogRecPtr, false);
+
+ /*
+ * No need to do anything if that subtxn didn't contain any changes
+ */
+ if (!subtxn)
+ return;
+
+ /*
+ * FIXME: Using the subtxn lsn as top lsn isn't great (if we're creating)!
+ */
+ txn = ReorderBufferTXNByXid(buffer, xid, true, &top_is_new, lsn, true);
+
+ subtxn->last_lsn = lsn;
+
+ Assert(!top_is_new || !subtxn->is_known_as_subxact);
+
+ if (!subtxn->is_known_as_subxact)
+ {
+ subtxn->is_known_as_subxact = true;
+
+ /* remove from lsn order list of top-level transactions */
+ dlist_delete(&subtxn->node);
+
+ /* add to subtransaction list */
+ dlist_push_tail(&txn->subtxns, &subtxn->node);
+ txn->nsubtxns++;
+ }
+}
+
+
+/*
+ * Support for efficiently iterating over a transaction's and its
+ * subtransactions' changes.
+ *
+ * We do this by performing a k-way merge between the transaction and its
+ * subtransactions. For that we model the current heads of the different
+ * transactions as a binary heap, so we can easily tell which
+ * (sub-)transaction has the change with the smallest LSN next.
+ *
+ * We assume the changes in individual transactions are already sorted by LSN.
+ */
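+
+/*
+ * Worked example (hypothetical LSNs, all changes assumed to be in memory): a
+ * toplevel transaction with changes at 0/10 and 0/40 and one subtransaction
+ * with changes at 0/20 and 0/30. The heap starts out holding one entry per
+ * transaction, keyed by its first change: {top: 0/10, sub: 0/20}. Successive
+ * ReorderBufferIterTXNNext() calls then return
+ *
+ *     0/10   (top advances to 0/40)
+ *     0/20   (sub advances to 0/30)
+ *     0/30   (sub is exhausted and removed from the heap)
+ *     0/40   (top is exhausted and removed from the heap)
+ *     NULL   (the heap is empty)
+ *
+ * Callers follow the pattern used by ReorderBufferCommit():
+ *
+ *     iterstate = ReorderBufferIterTXNInit(buffer, txn);
+ *     while ((change = ReorderBufferIterTXNNext(buffer, iterstate)) != NULL)
+ *         ... process change ...
+ *     ReorderBufferIterTXNFinish(buffer, iterstate);
+ */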
+
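+/*
+ * Binary heap comparison function.
+ *
+ * binaryheap.c keeps the element that compares largest at the top, so the
+ * comparison is inverted: the entry with the smallest LSN ends up first.
+ */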
+static int
+ReorderBufferIterCompare(Datum a, Datum b, void *arg)
+{
+ ReorderBufferIterTXNState *state = (ReorderBufferIterTXNState *) arg;
+ XLogRecPtr pos_a = state->entries[DatumGetInt32(a)].lsn;
+ XLogRecPtr pos_b = state->entries[DatumGetInt32(b)].lsn;
+
+ if (pos_a < pos_b)
+ return 1;
+ else if (pos_a == pos_b)
+ return 0;
+ return -1;
+}
+
+/*
+ * Allocate & initialize an iterator which iterates in lsn order over a
+ * transaction and all its subtransactions.
+ */
+static ReorderBufferIterTXNState *
+ReorderBufferIterTXNInit(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ Size nr_txns = 0;
+ ReorderBufferIterTXNState *state;
+ dlist_iter cur_txn_i;
+ int32 off;
+
+ /*
+ * Calculate the size of our heap: one element for every transaction that
+ * contains changes, counting both the toplevel transaction we were passed
+ * and each of its subtransactions.
+ */
+ if (txn->nentries > 0)
+ nr_txns++;
+
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ nr_txns++;
+ }
+
+ /*
+ * XXX: Add fastpath for the rather common nr_txns=1 case, no need to
+ * allocate/build a heap in that case.
+ */
+
+ /* allocate iteration state */
+ state = (ReorderBufferIterTXNState *)
+ MemoryContextAllocZero(buffer->context,
+ sizeof(ReorderBufferIterTXNState) +
+ sizeof(ReorderBufferIterTXNEntry) * nr_txns);
+
+ state->nr_txns = nr_txns;
+ dlist_init(&state->old_change);
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ state->entries[off].fd = -1;
+ state->entries[off].segno = 0;
+ }
+
+ /* allocate heap */
+ state->heap = binaryheap_allocate(state->nr_txns, ReorderBufferIterCompare,
+ state);
+
+ /*
+ * Now insert items into the binary heap, unordered. (We will run a heap
+ * assembly step at the end; this is more efficient.)
+ */
+
+ off = 0;
+
+ /* add toplevel transaction if it contains changes */
+ if (txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreChanges(buffer, txn, &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+
+ /* add subtransactions if they contain changes */
+ dlist_foreach(cur_txn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *cur_txn;
+
+ cur_txn = dlist_container(ReorderBufferTXN, node, cur_txn_i.cur);
+
+ if (cur_txn->nentries > 0)
+ {
+ ReorderBufferChange *cur_change;
+
+ if (cur_txn->nentries != cur_txn->nentries_mem)
+ ReorderBufferRestoreChanges(buffer, cur_txn,
+ &state->entries[off].fd,
+ &state->entries[off].segno);
+
+ cur_change = dlist_head_element(ReorderBufferChange, node,
+ &cur_txn->changes);
+
+ state->entries[off].lsn = cur_change->lsn;
+ state->entries[off].change = cur_change;
+ state->entries[off].txn = cur_txn;
+
+ binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+ }
+ }
+
+ /* assemble a valid binary heap */
+ binaryheap_build(state->heap);
+
+ return state;
+}
+
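+/*
+ * Remove the files a transaction's changes were spilled to. There is one
+ * file per WAL segment, so try every segment between txn->lsn and
+ * txn->last_lsn and ignore files that don't exist.
+ */
+static void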
+ReorderBufferRestoreCleanup(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ XLogSegNo first;
+ XLogSegNo cur;
+ XLogSegNo last;
+
+ XLByteToSeg(txn->lsn, first);
+ XLByteToSeg(txn->last_lsn, last);
+
+ for (cur = first; cur <= last; cur++)
+ {
+ char path[MAXPGPATH];
+ XLogRecPtr recptr;
+
+ XLogSegNoOffsetToRecPtr(cur, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+ if (unlink(path) != 0 && errno != ENOENT)
+ elog(FATAL, "could not unlink file \"%s\": %m", path);
+ }
+}
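+
+/*
+ * Example (hypothetical values, assuming the default 16MB WAL segment size):
+ * for xid 1234 in slot "regression_slot", a transaction whose changes span
+ * segment numbers 1 and 2 has these spill files removed:
+ *
+ *     pg_llog/regression_slot/xid-1234-lsn-0-1000000.snap
+ *     pg_llog/regression_slot/xid-1234-lsn-0-2000000.snap
+ *
+ * Each file is named after the start LSN of its segment.
+ */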
+
+/*
+ * Return the next change when iterating over a transaction and its
+ * subtransaction.
+ *
+ * Returns NULL when no further changes exist.
+ */
+static ReorderBufferChange *
+ReorderBufferIterTXNNext(ReorderBuffer *buffer, ReorderBufferIterTXNState *state)
+{
+ ReorderBufferChange *change;
+ ReorderBufferIterTXNEntry *entry;
+ int32 off;
+
+ /* nothing there anymore */
+ if (state->heap->bh_size == 0)
+ return NULL;
+
+ off = DatumGetInt32(binaryheap_first(state->heap));
+ entry = &state->entries[off];
+
+ if (!dlist_is_empty(&entry->txn->subtxns))
+ elog(DEBUG2, "tx with subtxn %u", entry->txn->xid);
+
+ /* free memory we might have "leaked" in the previous *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(buffer, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ change = entry->change;
+
+ /*
+ * update heap with information about which transaction has the next
+ * relevant change in LSN order
+ */
+
+ /* there are in-memory changes */
+ if (dlist_has_next(&entry->txn->changes, &entry->change->node))
+ {
+ dlist_node *next = dlist_next_node(&entry->txn->changes, &change->node);
+ ReorderBufferChange *next_change =
+ dlist_container(ReorderBufferChange, node, next);
+
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+ return change;
+ }
+
+ /* try to load changes from disk */
+ if (entry->txn->nentries != entry->txn->nentries_mem)
+ {
+ /*
+ * Ugly: restoring changes will reuse *Change records, thus delete the
+ * current one from the per-tx list and only free in the next call.
+ */
+ dlist_delete(&change->node);
+ dlist_push_tail(&state->old_change, &change->node);
+
+ if (ReorderBufferRestoreChanges(buffer, entry->txn, &entry->fd,
+ &state->entries[off].segno))
+ {
+ /* successfully restored changes from disk */
+ ReorderBufferChange *next_change =
+ dlist_head_element(ReorderBufferChange, node,
+ &entry->txn->changes);
+
+ elog(DEBUG2, "restored %zu/%zu changes from disk",
+ entry->txn->nentries_mem, entry->txn->nentries);
+ Assert(entry->txn->nentries_mem);
+ /* txn stays the same */
+ state->entries[off].lsn = next_change->lsn;
+ state->entries[off].change = next_change;
+ binaryheap_replace_first(state->heap, Int32GetDatum(off));
+
+ return change;
+ }
+ }
+
+ /* ok, no changes there anymore, remove */
+ binaryheap_remove_first(state->heap);
+
+ return change;
+}
+
+/*
+ * Deallocate the iterator
+ */
+static void
+ReorderBufferIterTXNFinish(ReorderBuffer *buffer,
+ ReorderBufferIterTXNState *state)
+{
+ int32 off;
+
+ for (off = 0; off < state->nr_txns; off++)
+ {
+ if (state->entries[off].fd != -1)
+ CloseTransientFile(state->entries[off].fd);
+ }
+
+ /* free memory we might have "leaked" in the last *Next call */
+ if (!dlist_is_empty(&state->old_change))
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node,
+ dlist_pop_head_node(&state->old_change));
+ ReorderBufferReturnChange(buffer, change);
+ Assert(dlist_is_empty(&state->old_change));
+ }
+
+ binaryheap_free(state->heap);
+ pfree(state);
+}
+
+/*
+ * Clean up the contents of a transaction, usually after the transaction
+ * committed or aborted.
+ */
+static void
+ReorderBufferCleanupTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ bool found;
+ dlist_mutable_iter iter;
+
+ /* cleanup subtransactions & their changes */
+ dlist_foreach_modify(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ Assert(subtxn->is_known_as_subxact);
+
+ /*
+ * Subtransactions are always associated with the toplevel TXN, even
+ * if they originally were happening inside another subtxn, so we
+ * won't ever recurse more than one level deep here.
+ */
+ ReorderBufferCleanupTXN(buffer, subtxn);
+ }
+
+ /* cleanup changes in the toplevel txn */
+ dlist_foreach_modify(iter, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ ReorderBufferReturnChange(buffer, change);
+ }
+
+ /*
+ * Clean up the tuplecids we stored for timetravel access. They are always
+ * stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+ ReorderBufferReturnChange(buffer, change);
+ }
+
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ txn->base_snapshot = NULL;
+ }
+
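+ /*
+ * Remove the TXN from the list containing it: either the LSN ordered list
+ * of toplevel TXNs or its toplevel transaction's list of subtransactions.
+ * This is safe when called recursively from the subtransaction loop
+ * above, which uses dlist_foreach_modify().
+ */
+ dlist_delete(&txn->node);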
+
+ /* now remove reference from buffer */
+ hash_search(buffer->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (txn->nentries != txn->nentries_mem)
+ ReorderBufferRestoreCleanup(buffer, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(buffer, txn);
+}
+
+/*
+ * Build a hash with a (relfilenode, ctid) -> (cmin, cmax) mapping for use by
+ * tqual.c's HeapTupleSatisfiesMVCCDuringDecoding.
+ */
+static void
+ReorderBufferBuildTupleCidHash(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ dlist_iter iter;
+ HASHCTL hash_ctl;
+
+ if (!txn->does_timetravel || dlist_is_empty(&txn->tuplecids))
+ return;
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+ hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
+ hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = buffer->context;
+
+ /*
+ * create the hash with the exact number of to-be-stored tuplecids from
+ * the start
+ */
+ txn->tuplecid_hash =
+ hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+ dlist_foreach(iter, &txn->tuplecids)
+ {
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ bool found;
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ Assert(change->action_internal == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(ReorderBufferTupleCidKey));
+
+ key.relnode = change->tuplecid.node;
+
+ ItemPointerCopy(&change->tuplecid.tid,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(txn->tuplecid_hash,
+ (void *) &key,
+ HASH_ENTER,
+ &found);
+ if (!found)
+ {
+ ent->cmin = change->tuplecid.cmin;
+ ent->cmax = change->tuplecid.cmax;
+ ent->combocid = change->tuplecid.combocid;
+ }
+ else
+ {
+ Assert(ent->cmin == change->tuplecid.cmin);
+ Assert(ent->cmax == InvalidCommandId ||
+ ent->cmax == change->tuplecid.cmax);
+
+ /*
+ * If the tuple became valid in this transaction and has now been
+ * deleted, we already have a valid cmin stored. The stored cmax
+ * may still be InvalidCommandId, though.
+ */
+ ent->cmax = change->tuplecid.cmax;
+ }
+ }
+}
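+
+/*
+ * Example (hypothetical values): a catalog tuple inserted by command 0 and
+ * deleted by command 2 of the same transaction yields two TUPLECID changes
+ * for the same (relfilenode, ctid) key, assuming they are logged as
+ *
+ *     (cmin 0, cmax InvalidCommandId)   at insertion
+ *     (cmin 0, cmax 2)                  at deletion
+ *
+ * The first change creates the hash entry. The second one finds it and only
+ * fills in the cmax, so the entry ends up with cmin 0 and cmax 2.
+ */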
+
+/*
+ * Copy a provided snapshot so we can modify it privately. This is needed so
+ * that catalog modifying transactions can look into intermediate catalog
+ * states.
+ */
+static Snapshot
+ReorderBufferCopySnap(ReorderBuffer *buffer, Snapshot orig_snap,
+ ReorderBufferTXN *txn, CommandId cid)
+{
+ Snapshot snap;
+ dlist_iter iter;
+ int i = 0;
+ Size size;
+
+ size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * orig_snap->xcnt +
+ sizeof(TransactionId) * (txn->nsubtxns + 1);
+
+ elog(DEBUG1, "copying a non-transaction-specific snapshot into timetravel tx %u", txn->xid);
+
+ snap = MemoryContextAllocZero(buffer->context, size);
+ memcpy(snap, orig_snap, sizeof(SnapshotData));
+
+ snap->copied = true;
+ snap->active_count = 0;
+ snap->regd_count = 0;
+ snap->xip = (TransactionId *) (snap + 1);
+
+ memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+
+ /*
+ * ->subxip contains all txids that belong to our transaction which we
+ * need to check via cmin/cmax. That's why we store the toplevel
+ * transaction in there as well.
+ */
+ snap->subxip = snap->xip + snap->xcnt;
+ snap->subxip[i++] = txn->xid;
+ snap->subxcnt = txn->nsubtxns + 1;
+
+ dlist_foreach(iter, &txn->subtxns)
+ {
+ ReorderBufferTXN *sub_txn;
+
+ sub_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
+ snap->subxip[i++] = sub_txn->xid;
+ }
+
+ /* sort so we can bsearch() later */
+ qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);
+
+ /* store the specified current CommandId */
+ snap->curcid = cid;
+
+ return snap;
+}
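+
+/*
+ * Example (hypothetical xids): copying a snapshot with xip = {690, 695} for
+ * toplevel transaction 705, which has one subtransaction 700, yields a single
+ * allocation laid out as
+ *
+ *     [SnapshotData][xip: 690, 695][subxip: 700, 705]
+ *
+ * with xcnt = 2 and subxcnt = 2. subxip is sorted with xidComparator so that
+ * later lookups can use bsearch().
+ */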
+
+/*
+ * Free a previously ReorderBufferCopySnap'ed snapshot
+ */
+static void
+ReorderBufferFreeSnap(ReorderBuffer *buffer, Snapshot snap)
+{
+ if (snap->copied)
+ pfree(snap);
+ else
+ SnapBuildSnapDecRefcount(snap);
+}
+
+/*
+ * Commit a transaction and replay all actions that previously have been
+ * ReorderBufferAddChange'd in the toplevel TX or any of the subtransactions
+ * assigned via ReorderBufferCommitChild.
+ */
+void
+ReorderBufferCommit(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+ ReorderBufferIterTXNState *iterstate = NULL;
+ ReorderBufferChange *change;
+ CommandId command_id = FirstCommandId;
+ Snapshot snapshot_now;
+ Relation relation = NULL;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* empty transaction */
+ if (!txn)
+ return;
+
+ txn->last_lsn = lsn;
+
+ /* if some changes were already spilled to disk, spill the remaining ones too */
+ if (txn->nentries_mem != txn->nentries)
+ ReorderBufferSerializeTXN(buffer, txn);
+
+ /*
+ * If this transaction didn't have any real changes in our database, it's
+ * OK not to have a snapshot.
+ */
+ if (txn->base_snapshot == NULL)
+ return;
+
+ snapshot_now = txn->base_snapshot;
+
+ ReorderBufferBuildTupleCidHash(buffer, txn);
+
+ /* setup initial snapshot */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+
+ PG_TRY();
+ {
+ buffer->begin(buffer, txn);
+
+ iterstate = ReorderBufferIterTXNInit(buffer, txn);
+ while ((change = ReorderBufferIterTXNNext(buffer, iterstate)))
+ {
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ Assert(snapshot_now);
+
+ relation = LookupRelationByRelFileNode(&change->relnode);
+
+ /*
+ * A catalog tuple without data, logged while the catalog
+ * was being rewritten; there is nothing to decode.
+ */
+ if (relation == NULL &&
+ change->newtuple == NULL && change->oldtuple == NULL)
+ {
+ continue;
+ }
+ else if (relation == NULL)
+ {
+ elog(ERROR, "could not look up relation %s",
+ relpathperm(change->relnode, MAIN_FORKNUM));
+ }
+
+ if (RelationIsLogicallyLogged(relation))
+ {
+ /* user-triggered change */
+ if (relation->rd_rel->relkind == RELKIND_SEQUENCE)
+ {
+ /* changes to sequences are not decoded */
+ }
+ else if (!IsToastRelation(relation))
+ {
+ ReorderBufferToastReplace(buffer, txn, relation, change);
+ buffer->apply_change(buffer, txn, relation, change);
+ ReorderBufferToastReset(buffer, txn);
+ }
+ /* we're not interested in toast deletions */
+ else if (change->action == REORDER_BUFFER_CHANGE_INSERT)
+ {
+ /*
+ * need to reassemble change in memory, ensure it
+ * doesn't get reused till we're done.
+ */
+ dlist_delete(&change->node);
+ ReorderBufferToastAppendChunk(buffer, txn, relation,
+ change);
+ }
+
+ }
+ RelationClose(relation);
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ /* XXX: we could skip snapshots in non toplevel txns */
+
+ /* get rid of the old */
+ RevertFromDecodingSnapshots();
+
+ if (snapshot_now->copied)
+ {
+ ReorderBufferFreeSnap(buffer, snapshot_now);
+ snapshot_now =
+ ReorderBufferCopySnap(buffer, change->snapshot,
+ txn, command_id);
+ }
+
+ /*
+ * restored from disk, we need to be careful not to double
+ * free. We could introduce refcounting for that, but for
+ * now this seems infrequent enough not to care.
+ */
+ else if (change->snapshot->copied)
+ {
+ snapshot_now =
+ ReorderBufferCopySnap(buffer, change->snapshot,
+ txn, command_id);
+ }
+ else
+ {
+ snapshot_now = change->snapshot;
+ }
+
+
+ /* and start with the new one */
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ if (!snapshot_now->copied)
+ {
+ /* we don't use the global one anymore */
+ snapshot_now = ReorderBufferCopySnap(buffer, snapshot_now,
+ txn, command_id);
+ }
+
+ command_id = Max(command_id, change->command_id);
+
+ if (command_id != InvalidCommandId)
+ {
+ snapshot_now->curcid = command_id;
+
+ RevertFromDecodingSnapshots();
+ SetupDecodingSnapshots(snapshot_now, txn->tuplecid_hash);
+ }
+
+ /*
+ * Every time the CommandId is incremented, we could see
+ * new catalog contents.
+ */
+ ReorderBufferExecuteInvalidations(buffer, txn);
+
+ break;
+
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ elog(ERROR, "tuplecid value in normal queue");
+ break;
+ }
+ }
+
+ ReorderBufferIterTXNFinish(buffer, iterstate);
+
+ /* call commit callback */
+ buffer->commit(buffer, txn, lsn);
+
+
+ ResourceOwnerRelease(CurrentResourceOwner,
+ RESOURCE_RELEASE_BEFORE_LOCKS,
+ true, true);
+
+ AtEOXact_RelationCache(true);
+
+ ResourceOwnerRelease(CurrentResourceOwner,
+ RESOURCE_RELEASE_LOCKS,
+ true, true);
+
+ ResourceOwnerRelease(CurrentResourceOwner,
+ RESOURCE_RELEASE_AFTER_LOCKS,
+ true, true);
+
+ /* cleanup */
+ RevertFromDecodingSnapshots();
+
+ ReorderBufferExecuteInvalidations(buffer, txn);
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(buffer, snapshot_now);
+
+ ReorderBufferCleanupTXN(buffer, txn);
+ }
+ PG_CATCH();
+ {
+ if (iterstate)
+ ReorderBufferIterTXNFinish(buffer, iterstate);
+
+ RevertFromDecodingSnapshots();
+
+ /* XXX: more cleanup needed */
+
+ if (snapshot_now->copied)
+ ReorderBufferFreeSnap(buffer, snapshot_now);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
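+
+/*
+ * Example (hypothetical transaction): for
+ *
+ *     BEGIN;
+ *     INSERT INTO foo ...;
+ *     SAVEPOINT s; UPDATE foo ...; RELEASE s;
+ *     COMMIT;
+ *
+ * where foo is a logically logged user table without TOASTed values,
+ * ReorderBufferCommit() invokes the output callbacks as
+ *
+ *     buffer->begin(buffer, txn);
+ *     buffer->apply_change(buffer, txn, relation, change);   (INSERT)
+ *     buffer->apply_change(buffer, txn, relation, change);   (UPDATE)
+ *     buffer->commit(buffer, txn, lsn);
+ *
+ * The iterator merges the subtransaction's UPDATE into LSN order. Snapshot
+ * and command id changes in between adjust the decoding snapshot but do
+ * not reach the output callbacks.
+ */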
+
+/*
+ * Abort a transaction that possibly has previous changes. Needs to be done
+ * independently for toplevel and subtransactions.
+ */
+void
+ReorderBufferAbort(ReorderBuffer *buffer, TransactionId xid, XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, so there are no changes to clean up */
+ if (!txn)
+ return;
+
+ txn->last_lsn = lsn;
+
+ ReorderBufferCleanupTXN(buffer, txn);
+}
+
+/*
+ * Check whether a transaction is already known in this module
+ */
+bool
+ReorderBufferIsXidKnown(ReorderBuffer *buffer, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ return txn != NULL;
+}
+
+/*
+ * Add a new snapshot to this transaction that is only used after lsn 'lsn'.
+ */
+void
+ReorderBufferAddSnapshot(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+
+ change->snapshot = snap;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT;
+
+ ReorderBufferAddChange(buffer, xid, lsn, change);
+}
+
+/*
+ * Set up the base snapshot of a transaction. That is the snapshot that is used
+ * to decode all changes until either this transaction modifies the catalog or
+ * another catalog modifying transaction commits.
+ */
+void
+ReorderBufferSetBaseSnapshot(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn, Snapshot snap)
+{
+ ReorderBufferTXN *txn;
+ bool is_new;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, &is_new, lsn, true);
+ Assert(txn->base_snapshot == NULL);
+
+ txn->base_snapshot = snap;
+}
+
+/*
+ * Access the catalog with this CommandId at this point in the changestream.
+ *
+ * May only be called for command ids > 1
+ */
+void
+ReorderBufferAddNewCommandId(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn, CommandId cid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+
+ change->command_id = cid;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID;
+
+ ReorderBufferAddChange(buffer, xid, lsn, change);
+}
+
+
+/*
+ * Add new (relfilenode, tid) -> (cmin, cmax) mappings.
+ */
+void
+ReorderBufferAddNewTupleCids(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn, RelFileNode node,
+ ItemPointerData tid, CommandId cmin,
+ CommandId cmax, CommandId combocid)
+{
+ ReorderBufferChange *change = ReorderBufferGetChange(buffer);
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+ change->tuplecid.node = node;
+ change->tuplecid.tid = tid;
+ change->tuplecid.cmin = cmin;
+ change->tuplecid.cmax = cmax;
+ change->tuplecid.combocid = combocid;
+ change->lsn = lsn;
+ change->action_internal = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
+
+ dlist_push_tail(&txn->tuplecids, &change->node);
+ txn->ntuplecids++;
+}
+
+/*
+ * Set up the invalidation messages of the toplevel transaction.
+ *
+ * This needs to be done before ReorderBufferCommit is called!
+ */
+void
+ReorderBufferAddInvalidations(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn, Size nmsgs,
+ SharedInvalidationMessage *msgs)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+ if (txn->ninvalidations != 0)
+ elog(ERROR, "only one set of invalidations can be added per transaction");
+
+ txn->ninvalidations = nmsgs;
+ txn->invalidations = (SharedInvalidationMessage *)
+ MemoryContextAlloc(buffer->context,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+ memcpy(txn->invalidations, msgs, sizeof(SharedInvalidationMessage) * nmsgs);
+}
+
+/*
+ * Apply all invalidations we know. Possibly we only need parts at this point
+ * in the changestream but we don't know which those are.
+ */
+static void
+ReorderBufferExecuteInvalidations(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ int i;
+
+ for (i = 0; i < txn->ninvalidations; i++)
+ LocalExecuteInvalidationMessage(&txn->invalidations[i]);
+}
+
+/*
+ * Mark a transaction as doing timetravel.
+ */
+void
+ReorderBufferXidSetTimetravel(ReorderBuffer *buffer, TransactionId xid,
+ XLogRecPtr lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, true, NULL, lsn, true);
+
+ txn->does_timetravel = true;
+}
+
+/*
+ * Query whether a transaction is already *known* to be doing timetravel. This
+ * can be wrong until directly before the commit!
+ */
+bool
+ReorderBufferXidDoesTimetravel(ReorderBuffer *buffer, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+ if (!txn)
+ return false;
+
+ return txn->does_timetravel;
+}
+
+/*
+ * Have we already added the first snapshot?
+ */
+bool
+ReorderBufferXidHasBaseSnapshot(ReorderBuffer *buffer, TransactionId xid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(buffer, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ if (!txn)
+ return false;
+ return txn->base_snapshot != NULL;
+}
+
+static void
+ReorderBufferSerializeReserve(ReorderBuffer *buffer, Size sz)
+{
+ if (!buffer->outbufsize)
+ {
+ buffer->outbuf = MemoryContextAlloc(buffer->context, sz);
+ buffer->outbufsize = sz;
+ }
+ else if (buffer->outbufsize < sz)
+ {
+ buffer->outbuf = repalloc(buffer->outbuf, sz);
+ buffer->outbufsize = sz;
+ }
+}
+
+typedef struct ReorderBufferDiskChange
+{
+ Size size;
+ ReorderBufferChange change;
+ /* data follows */
+} ReorderBufferDiskChange;
+
+/*
+ * Persistency support
+ */
+static void
+ReorderBufferSerializeChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ int fd, ReorderBufferChange *change)
+{
+ ReorderBufferDiskChange *ondisk;
+ Size sz = sizeof(ReorderBufferDiskChange);
+
+ ReorderBufferSerializeReserve(buffer, sz);
+
+ ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+ memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ {
+ char *data;
+ Size oldlen = 0;
+ Size newlen = 0;
+
+ if (change->oldtuple)
+ oldlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->oldtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ if (change->newtuple)
+ newlen = offsetof(ReorderBufferTupleBuf, data)
+ +change->newtuple->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ sz += oldlen;
+ sz += newlen;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(buffer, sz);
+
+ data = ((char *) buffer->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+ if (oldlen)
+ {
+ memcpy(data, change->oldtuple, oldlen);
+ data += oldlen;
+ Assert(&change->oldtuple->header == change->oldtuple->tuple.t_data);
+ }
+
+ if (newlen)
+ {
+ memcpy(data, change->newtuple, newlen);
+ data += newlen;
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ char *data;
+
+ sz += sizeof(SnapshotData) +
+ sizeof(TransactionId) * change->snapshot->xcnt +
+ sizeof(TransactionId) * change->snapshot->subxcnt
+ ;
+
+ /* make sure we have enough space */
+ ReorderBufferSerializeReserve(buffer, sz);
+ data = ((char *) buffer->outbuf) + sizeof(ReorderBufferDiskChange);
+ /* might have been reallocated above */
+ ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+ memcpy(data, change->snapshot, sizeof(SnapshotData));
+ data += sizeof(SnapshotData);
+
+ if (change->snapshot->xcnt)
+ {
+ memcpy(data, change->snapshot->xip,
+ sizeof(TransactionId) * change->snapshot->xcnt);
+ data += sizeof(TransactionId) * change->snapshot->xcnt;
+ }
+
+ if (change->snapshot->subxcnt)
+ {
+ memcpy(data, change->snapshot->subxip,
+ sizeof(TransactionId) * change->snapshot->subxcnt);
+ data += sizeof(TransactionId) * change->snapshot->subxcnt;
+ }
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ /* ReorderBufferChange contains everything important */
+ break;
+ }
+
+ ondisk->size = sz;
+
+ if (write(fd, buffer->outbuf, ondisk->size) != ondisk->size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to data file for XID %u: %m",
+ txn->xid)));
+ }
+
+ Assert(ondisk->change.action_internal == change->action_internal);
+}
+
+static void
+ReorderBufferCheckSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ /* FIXME subtxn handling? */
+ if (txn->nentries_mem >= max_memtries)
+ {
+ ReorderBufferSerializeTXN(buffer, txn);
+ Assert(txn->nentries_mem == 0);
+ }
+}
+
+static void
+ReorderBufferSerializeTXN(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ dlist_iter subtxn_i;
+ dlist_mutable_iter change_i;
+ int fd = -1;
+ XLogSegNo curOpenSegNo = 0;
+ Size spilled = 0;
+ char path[MAXPGPATH];
+
+ elog(DEBUG2, "spill %zu changes in XID %u to disk",
+ txn->nentries_mem, txn->xid);
+
+ /* do the same to all child TXs */
+ dlist_foreach(subtxn_i, &txn->subtxns)
+ {
+ ReorderBufferTXN *subtxn;
+
+ subtxn = dlist_container(ReorderBufferTXN, node, subtxn_i.cur);
+ ReorderBufferSerializeTXN(buffer, subtxn);
+ }
+
+ /* serialize changestream */
+ dlist_foreach_modify(change_i, &txn->changes)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, change_i.cur);
+
+ /*
+ * Store each change in the file for the WAL segment containing its
+ * start LSN; a single change is never split across files.
+ */
+ if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))
+ {
+ XLogRecPtr recptr;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ XLByteToSeg(change->lsn, curOpenSegNo);
+ XLogSegNoOffsetToRecPtr(curOpenSegNo, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ /* open segment, create it if necessary */
+ fd = OpenTransientFile(path,
+ O_CREAT | O_WRONLY | O_APPEND | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+
+ if (fd < 0)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not open reorderbuffer file \"%s\" for writing: %m", path)));
+ }
+
+ ReorderBufferSerializeChange(buffer, txn, fd, change);
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(buffer, change);
+
+ spilled++;
+ }
+
+ Assert(spilled == txn->nentries_mem);
+ Assert(dlist_is_empty(&txn->changes));
+ txn->nentries_mem = 0;
+
+ if (fd != -1)
+ CloseTransientFile(fd);
+
+ /* issue write barrier */
+ /* serialize main transaction state */
+}
+
+static Size
+ReorderBufferRestoreChanges(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ int *fd, XLogSegNo *segno)
+{
+ Size restored = 0;
+ XLogSegNo last_segno;
+ dlist_mutable_iter cleanup_iter;
+
+ Assert(txn->lsn != InvalidXLogRecPtr);
+ Assert(txn->last_lsn != InvalidXLogRecPtr);
+
+ /* free current entries, so we have memory for more */
+ dlist_foreach_modify(cleanup_iter, &txn->changes)
+ {
+ ReorderBufferChange *cleanup =
+ dlist_container(ReorderBufferChange, node, cleanup_iter.cur);
+
+ dlist_delete(&cleanup->node);
+ ReorderBufferReturnChange(buffer, cleanup);
+ }
+ txn->nentries_mem = 0;
+ Assert(dlist_is_empty(&txn->changes));
+
+ XLByteToSeg(txn->last_lsn, last_segno);
+
+ while (restored < max_memtries && *segno <= last_segno)
+ {
+ int readBytes;
+ ReorderBufferDiskChange *ondisk;
+
+ if (*fd == -1)
+ {
+ XLogRecPtr recptr;
+ char path[MAXPGPATH];
+
+ /* first time in */
+ if (*segno == 0)
+ {
+ XLByteToSeg(txn->lsn, *segno);
+ elog(LOG, "initial restoring from " UINT64_FORMAT " to " UINT64_FORMAT,
+ *segno, last_segno);
+ }
+
+ Assert(*segno != 0 || dlist_is_empty(&txn->changes));
+ XLogSegNoOffsetToRecPtr(*segno, 0, recptr);
+
+ sprintf(path, "pg_llog/%s/xid-%u-lsn-%X-%X.snap",
+ NameStr(MyLogicalDecodingSlot->name), txn->xid,
+ (uint32) (recptr >> 32), (uint32) recptr);
+
+ elog(LOG, "opening file %s", path);
+
+ *fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+ if (*fd < 0 && errno == ENOENT)
+ {
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (*fd < 0)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not open reorderbuffer file \"%s\" for reading: %m", path)));
+
+ }
+
+ ReorderBufferSerializeReserve(buffer, sizeof(ReorderBufferDiskChange));
+
+
+ /*
+ * read the statically sized part of a change which has information
+ * about the total size. If we couldn't read a record, we're at the
+ * end of this file.
+ */
+
+ readBytes = read(*fd, buffer->outbuf, sizeof(ReorderBufferDiskChange));
+
+ /* eof */
+ if (readBytes == 0)
+ {
+ CloseTransientFile(*fd);
+ *fd = -1;
+ (*segno)++;
+ continue;
+ }
+ else if (readBytes < 0)
+ elog(ERROR, "could not read from reorderbuffer spill file: %m");
+ else if (readBytes != sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "could not read from reorderbuffer spill file: read %d instead of %zu bytes",
+ readBytes, sizeof(ReorderBufferDiskChange));
+
+ ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+ ReorderBufferSerializeReserve(buffer, sizeof(ReorderBufferDiskChange) + ondisk->size);
+ ondisk = (ReorderBufferDiskChange *) buffer->outbuf;
+
+ readBytes = read(*fd, buffer->outbuf + sizeof(ReorderBufferDiskChange),
+ ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ if (readBytes < 0)
+ elog(ERROR, "could not read from reorderbuffer spill file: %m");
+ else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
+ elog(ERROR, "could not read from reorderbuffer spill file: read %d instead of %zu bytes",
+ readBytes, ondisk->size - sizeof(ReorderBufferDiskChange));
+
+ /*
+ * ok, read a full change from disk, now restore it into proper
+ * in-memory format
+ */
+ ReorderBufferRestoreChange(buffer, txn, buffer->outbuf);
+ restored++;
+ }
+
+ return restored;
+}
+
+/*
+ * Convert change from its on-disk format to in-memory format and queue it onto
+ * the TXN's ->changes list.
+ */
+static void
+ReorderBufferRestoreChange(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ char *data)
+{
+ ReorderBufferDiskChange *ondisk;
+ ReorderBufferChange *change;
+
+ ondisk = (ReorderBufferDiskChange *) data;
+
+ change = ReorderBufferGetChange(buffer);
+
+ /* copy static part */
+ memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+
+ data += sizeof(ReorderBufferDiskChange);
+
+ /* restore individual stuff */
+ switch ((ReorderBufferChangeTypeInternal) change->action_internal)
+ {
+ case REORDER_BUFFER_CHANGE_INTERNAL_INSERT:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_UPDATE:
+ /* fall through */
+ case REORDER_BUFFER_CHANGE_INTERNAL_DELETE:
+ /* restore in the order ReorderBufferSerializeChange wrote them */
+ if (change->oldtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->oldtuple = ReorderBufferGetTupleBuf(buffer);
+ memcpy(change->oldtuple, data, len);
+ change->oldtuple->tuple.t_data = &change->oldtuple->header;
+ data += len;
+ }
+
+ if (change->newtuple)
+ {
+ Size len = offsetof(ReorderBufferTupleBuf, data)
+ +((ReorderBufferTupleBuf *) data)->tuple.t_len
+ - offsetof(HeapTupleHeaderData, t_bits);
+
+ change->newtuple = ReorderBufferGetTupleBuf(buffer);
+ memcpy(change->newtuple, data, len);
+ change->newtuple->tuple.t_data = &change->newtuple->header;
+ data += len;
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ {
+ Snapshot oldsnap = (Snapshot) data;
+ Size size = sizeof(SnapshotData) +
+ sizeof(TransactionId) * oldsnap->xcnt +
+ sizeof(TransactionId) * (oldsnap->subxcnt + 0)
+ ;
+
+ Assert(change->snapshot != NULL);
+
+ change->snapshot = MemoryContextAllocZero(buffer->context, size);
+
+ memcpy(change->snapshot, data, size);
+ change->snapshot->xip = (TransactionId *)
+ (((char *) change->snapshot) + sizeof(SnapshotData));
+ change->snapshot->subxip =
+ change->snapshot->xip + change->snapshot->xcnt + 0;
+ change->snapshot->copied = true;
+ break;
+ }
+ /* nothing needs to be done */
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ break;
+ }
+
+ dlist_push_tail(&txn->changes, &change->node);
+ txn->nentries_mem++;
+}
+
+/*
+ * Delete all data spilled to disk.
+ */
+void
+ReorderBufferStartup(void)
+{
+ DIR *logical_dir;
+ struct dirent *logical_de;
+
+ DIR *spill_dir;
+ struct dirent *spill_de;
+
+ logical_dir = AllocateDir("pg_llog");
+ while ((logical_de = ReadDir(logical_dir, "pg_llog")) != NULL)
+ {
+ char path[MAXPGPATH];
+
+ if (strcmp(logical_de->d_name, ".") == 0 ||
+ strcmp(logical_de->d_name, "..") == 0)
+ continue;
+
+ /* one of our own directories */
+ if (strcmp(logical_de->d_name, "snapshots") == 0)
+ continue;
+
+ /*
+ * ok, has to be a surviving logical slot, iterate and delete
+ * everything starting with xid-*
+ */
+ sprintf(path, "pg_llog/%s", logical_de->d_name);
+
+ spill_dir = AllocateDir(path);
+ while ((spill_de = ReadDir(spill_dir, "pg_llog")) != NULL)
+ {
+ if (strcmp(spill_de->d_name, ".") == 0 ||
+ strcmp(spill_de->d_name, "..") == 0)
+ continue;
+
+ if (strncmp(spill_de->d_name, "xid", 3) == 0)
+ {
+ sprintf(path, "pg_llog/%s/%s", logical_de->d_name,
+ spill_de->d_name);
+
+ if (unlink(path) != 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not remove xid data file \"%s\": %m",
+ path)));
+ }
+ /* XXX: WARN? */
+ }
+ FreeDir(spill_dir);
+ }
+ FreeDir(logical_dir);
+}
+
+/*
+ * toast support
+ */
+
+/*
+ * Copied from tuptoaster.c. Perhaps there should be a toast_internal.h?
+ */
+#define VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr) \
+do { \
+ varattrib_1b_e *attre = (varattrib_1b_e *) (attr); \
+ Assert(VARATT_IS_EXTERNAL(attre)); \
+ Assert(VARSIZE_EXTERNAL(attre) == sizeof(toast_pointer) + VARHDRSZ_EXTERNAL); \
+ memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
+} while (0)
+
+#define VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) \
+ ((toast_pointer).va_extsize < (toast_pointer).va_rawsize - VARHDRSZ)
+
+/*
+ * Initialize per tuple toast reconstruction support.
+ */
+static void
+ReorderBufferToastInitHash(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ HASHCTL hash_ctl;
+
+ Assert(txn->toast_hash == NULL);
+
+ memset(&hash_ctl, 0, sizeof(hash_ctl));
+ hash_ctl.keysize = sizeof(Oid);
+ hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
+ hash_ctl.hash = tag_hash;
+ hash_ctl.hcxt = buffer->context;
+ txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+}
+
+/*
+ * Per toast-chunk handling for toast reconstruction
+ *
+ * Appends a toast chunk so we can reconstruct it when the tuple "owning" the
+ * toasted Datum comes along.
+ */
+static void
+ReorderBufferToastAppendChunk(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ ReorderBufferToastEnt *ent;
+ bool found;
+ int32 chunksize;
+ bool isnull;
+ Pointer chunk;
+ TupleDesc desc = RelationGetDescr(relation);
+ Oid chunk_id;
+ int32 chunk_seq;
+
+ if (txn->toast_hash == NULL)
+ ReorderBufferToastInitHash(buffer, txn);
+
+ Assert(IsToastRelation(relation));
+
+ chunk_id = DatumGetObjectId(fastgetattr(&change->newtuple->tuple, 1, desc, &isnull));
+ Assert(!isnull);
+ chunk_seq = DatumGetInt32(fastgetattr(&change->newtuple->tuple, 2, desc, &isnull));
+ Assert(!isnull);
+
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &chunk_id,
+ HASH_ENTER,
+ &found);
+
+ if (!found)
+ {
+ Assert(ent->chunk_id == chunk_id);
+ ent->num_chunks = 0;
+ ent->last_chunk_seq = 0;
+ ent->size = 0;
+ ent->reconstructed = NULL;
+ dlist_init(&ent->chunks);
+
+ if (chunk_seq != 0)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq 0",
+ chunk_seq, chunk_id);
+ }
+ else if (found && chunk_seq != ent->last_chunk_seq + 1)
+ elog(ERROR, "got sequence entry %d for toast chunk %u instead of seq %d",
+ chunk_seq, chunk_id, ent->last_chunk_seq + 1);
+
+ chunk = DatumGetPointer(fastgetattr(&change->newtuple->tuple, 3, desc, &isnull));
+ Assert(!isnull);
+
+ /* calculate size so we can allocate the right size at once later */
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ /* could happen due to heap_form_tuple doing its thing */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ else
+ elog(ERROR, "unexpected type of toast chunk");
+
+ ent->size += chunksize;
+ ent->last_chunk_seq = chunk_seq;
+ ent->num_chunks++;
+ dlist_push_tail(&ent->chunks, &change->node);
+}
+
+/*
+ * Rejigger change->newtuple to point to in-memory toast tuples instead of
+ * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM).
+ *
+ * We cannot replace unchanged toast tuples though, so those will still point
+ * to on-disk toast data.
+ */
+static void
+ReorderBufferToastReplace(ReorderBuffer *buffer, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TupleDesc desc;
+ int natt;
+ Datum *attrs;
+ bool *isnull;
+ bool *free;
+ HeapTuple newtup;
+ Relation toast_rel;
+ TupleDesc toast_desc;
+
+ /* no toast tuples changed */
+ if (txn->toast_hash == NULL)
+ return;
+
+ /* we should only have toast tuples in an INSERT or UPDATE */
+ Assert(change->newtuple);
+
+ desc = RelationGetDescr(relation);
+
+ toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+ toast_desc = RelationGetDescr(toast_rel);
+
+ /* should we allocate from stack instead? */
+ attrs = palloc0(sizeof(Datum) * desc->natts);
+ isnull = palloc0(sizeof(bool) * desc->natts);
+ free = palloc0(sizeof(bool) * desc->natts);
+
+ heap_deform_tuple(&change->newtuple->tuple, desc,
+ attrs, isnull);
+
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ Form_pg_attribute attr = desc->attrs[natt];
+ ReorderBufferToastEnt *ent;
+ struct varlena *varlena;
+
+ /* va_rawsize is the size of the original datum -- including header */
+ struct varatt_external toast_pointer;
+ struct varatt_indirect redirect_pointer;
+ struct varlena *new_datum = NULL;
+ struct varlena *reconstructed;
+ dlist_iter it;
+ Size data_done = 0;
+
+ /* system columns aren't toasted */
+ if (attr->attnum < 0)
+ continue;
+
+ if (attr->attisdropped)
+ continue;
+
+ /* not a varlena datatype */
+ if (attr->attlen != -1)
+ continue;
+
+ /* no data */
+ if (isnull[natt])
+ continue;
+
+ /* ok, we know we have a non-null varlena datum */
+ varlena = (struct varlena *) DatumGetPointer(attrs[natt]);
+
+ /* no need to do anything if the datum isn't stored externally */
+ if (!VARATT_IS_EXTERNAL(varlena))
+ continue;
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, varlena);
+
+ /*
+ * check whether the toast tuple changed, replace if so.
+ */
+ ent = (ReorderBufferToastEnt *)
+ hash_search(txn->toast_hash,
+ (void *) &toast_pointer.va_valueid,
+ HASH_FIND,
+ NULL);
+ if (ent == NULL)
+ continue;
+
+ new_datum =
+ (struct varlena *) palloc0(INDIRECT_POINTER_SIZE);
+
+ free[natt] = true;
+
+ reconstructed = palloc0(toast_pointer.va_rawsize);
+
+ ent->reconstructed = reconstructed;
+
+ /* stitch toast tuple back together from its parts */
+ dlist_foreach(it, &ent->chunks)
+ {
+ bool isnull;
+ ReorderBufferTupleBuf *tup =
+ dlist_container(ReorderBufferChange, node, it.cur)->newtuple;
+ Pointer chunk =
+ DatumGetPointer(fastgetattr(&tup->tuple, 3, toast_desc, &isnull));
+
+ Assert(!isnull);
+ Assert(!VARATT_IS_EXTERNAL(chunk));
+ Assert(!VARATT_IS_SHORT(chunk));
+
+ memcpy(VARDATA(reconstructed) + data_done,
+ VARDATA(chunk),
+ VARSIZE(chunk) - VARHDRSZ);
+ data_done += VARSIZE(chunk) - VARHDRSZ;
+ }
+ Assert(data_done == toast_pointer.va_extsize);
+
+ /* make sure it's marked as compressed or not */
+ if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+ SET_VARSIZE_COMPRESSED(reconstructed, data_done + VARHDRSZ);
+ else
+ SET_VARSIZE(reconstructed, data_done + VARHDRSZ);
+
+ memset(&redirect_pointer, 0, sizeof(redirect_pointer));
+ redirect_pointer.pointer = reconstructed;
+
+ SET_VARTAG_EXTERNAL(new_datum, VARTAG_INDIRECT);
+ memcpy(VARDATA_EXTERNAL(new_datum), &redirect_pointer,
+ sizeof(redirect_pointer));
+
+ attrs[natt] = PointerGetDatum(new_datum);
+ }
+
+ /*
+ * Build tuple in separate memory & copy tuple back into the tuplebuf
+ * passed to the output plugin. We can't directly heap_fill_tuple() into
+ * the tuplebuf because attrs[] will point back into the current content.
+ */
+ newtup = heap_form_tuple(desc, attrs, isnull);
+ Assert(newtup->t_len <= MaxHeapTupleSize);
+ Assert(&change->newtuple->header == change->newtuple->tuple.t_data);
+
+ memcpy(change->newtuple->tuple.t_data,
+ newtup->t_data,
+ newtup->t_len);
+ change->newtuple->tuple.t_len = newtup->t_len;
+
+ /*
+ * Free resources we won't need any further; longer-lived allocations are
+ * freed in ReorderBufferToastReset().
+ */
+ RelationClose(toast_rel);
+ pfree(newtup);
+ for (natt = 0; natt < desc->natts; natt++)
+ {
+ if (free[natt])
+ pfree(DatumGetPointer(attrs[natt]));
+ }
+ pfree(attrs);
+ pfree(free);
+ pfree(isnull);
+
+}
+
+/*
+ * Free all resources allocated for toast reconstruction.
+ */
+static void
+ReorderBufferToastReset(ReorderBuffer *buffer, ReorderBufferTXN *txn)
+{
+ HASH_SEQ_STATUS hstat;
+ ReorderBufferToastEnt *ent;
+
+ if (txn->toast_hash == NULL)
+ return;
+
+ /* sequentially walk over the hash and free everything */
+ hash_seq_init(&hstat, txn->toast_hash);
+ while ((ent = (ReorderBufferToastEnt *) hash_seq_search(&hstat)) != NULL)
+ {
+ dlist_mutable_iter it;
+
+ if (ent->reconstructed != NULL)
+ pfree(ent->reconstructed);
+
+ dlist_foreach_modify(it, &ent->chunks)
+ {
+ ReorderBufferChange *change =
+ dlist_container(ReorderBufferChange, node, it.cur);
+
+ dlist_delete(&change->node);
+ ReorderBufferReturnChange(buffer, change);
+ }
+ }
+
+ hash_destroy(txn->toast_hash);
+ txn->toast_hash = NULL;
+}
+
+/*
+ * Visibility support routines
+ */
+
+/*-------------------------------------------------------------------------
+ * Look up the actual cmin/cmax values during timetravel access. We can't
+ * always rely on the stored cmin/cmax values because of two scenarios:
+ *
+ * * A tuple got changed multiple times during a single transaction and thus
+ * has been assigned a combocid. Combocids are only valid for the duration of
+ * a single transaction.
+ * * A tuple with a cmin but no cmax (and thus no combocid) got deleted or
+ * updated by a transaction other than the one that created it, which is the
+ * one we are decoding right now. As only one of cmin, cmax or combocid is
+ * actually stored in the heap, we no longer have access to the value we need.
+ *
+ * To resolve those problems we keep a per-transaction hash of (cmin, cmax)
+ * pairs keyed by (relfilenode, ctid) that contains the actual values. That
+ * also takes care of combocids by simply ignoring them: as we have the real
+ * cmin/cmax values, that's enough.
+ *
+ * As we only care about catalog tuples here the overhead of this hashtable
+ * should be acceptable.
+ * -------------------------------------------------------------------------
+ */
+extern bool
+ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
+ HeapTuple htup, Buffer buffer,
+ CommandId *cmin, CommandId *cmax)
+{
+ ReorderBufferTupleCidKey key;
+ ReorderBufferTupleCidEnt *ent;
+ ForkNumber forkno;
+ BlockNumber blockno;
+
+ /* be careful about padding */
+ memset(&key, 0, sizeof(key));
+
+ Assert(!BufferIsLocal(buffer));
+
+ /*
+ * get relfilenode from the buffer, no convenient way to access it other
+ * than that.
+ */
+ BufferGetTag(buffer, &key.relnode, &forkno, &blockno);
+
+ /* tuples can only be in the main fork */
+ Assert(forkno == MAIN_FORKNUM);
+ Assert(blockno == ItemPointerGetBlockNumber(&htup->t_self));
+
+ ItemPointerCopy(&htup->t_self,
+ &key.tid);
+
+ ent = (ReorderBufferTupleCidEnt *)
+ hash_search(tuplecid_data,
+ (void *) &key,
+ HASH_FIND,
+ NULL);
+
+ if (ent == NULL)
+ return false;
+
+ if (cmin)
+ *cmin = ent->cmin;
+ if (cmax)
+ *cmax = ent->cmax;
+ return true;
+}
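The REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT case in ReorderBufferSerializeChange writes a snapshot as its fixed-size struct followed by the xip and subxip arrays, and ReorderBufferRestoreChange copies that block back into one allocation and re-points both arrays into it. The standalone sketch below shows that round trip in isolation. It is only an illustration under stated assumptions: MiniSnapshot and the mini_snapshot_* functions are hypothetical simplifications invented for this example, not PostgreSQL's SnapshotData or any API in this patch, and plain malloc stands in for MemoryContextAllocZero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified stand-in for SnapshotData (assumption, not the real struct). */
typedef struct MiniSnapshot
{
	TransactionId xmin;
	TransactionId xmax;
	uint32_t	xcnt;			/* number of entries in xip */
	uint32_t	subxcnt;		/* number of entries in subxip */
	TransactionId *xip;
	TransactionId *subxip;
} MiniSnapshot;

/* Bytes needed for the struct plus both trailing arrays. */
static size_t
mini_snapshot_size(const MiniSnapshot *snap)
{
	return sizeof(MiniSnapshot) +
		sizeof(TransactionId) * snap->xcnt +
		sizeof(TransactionId) * snap->subxcnt;
}

/* Write the struct, then xip[], then subxip[] into buf; return bytes written. */
static size_t
mini_snapshot_serialize(const MiniSnapshot *snap, char *buf)
{
	char	   *data = buf;

	memcpy(data, snap, sizeof(MiniSnapshot));
	data += sizeof(MiniSnapshot);

	if (snap->xcnt)
	{
		memcpy(data, snap->xip, sizeof(TransactionId) * snap->xcnt);
		data += sizeof(TransactionId) * snap->xcnt;
	}
	if (snap->subxcnt)
	{
		memcpy(data, snap->subxip, sizeof(TransactionId) * snap->subxcnt);
		data += sizeof(TransactionId) * snap->subxcnt;
	}

	/* the writer and the reader must agree on the record length */
	assert((size_t) (data - buf) == mini_snapshot_size(snap));
	return (size_t) (data - buf);
}

/* Copy the serialized block into one allocation and fix up the array pointers. */
static MiniSnapshot *
mini_snapshot_restore(const char *buf)
{
	MiniSnapshot hdr;
	MiniSnapshot *snap;
	size_t		size;

	/* read the header via memcpy, since buf need not be aligned */
	memcpy(&hdr, buf, sizeof(MiniSnapshot));
	size = mini_snapshot_size(&hdr);

	snap = malloc(size);
	if (snap == NULL)
		return NULL;
	memcpy(snap, buf, size);

	/* the stored pointers are stale; point them into the copied block */
	snap->xip = (TransactionId *) ((char *) snap + sizeof(MiniSnapshot));
	snap->subxip = snap->xip + snap->xcnt;
	return snap;
}
```

Both sides compute each array's byte count as sizeof(TransactionId) * count, so the serialized record contains every xip and subxip entry and the restored pointers land exactly on them.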
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
new file mode 100644
index 0000000..9edd7ff
--- /dev/null
+++ b/src/backend/replication/logical/snapbuild.c
@@ -0,0 +1,1930 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.c
+ *
+ * Support for building timetravel snapshots based on the contents of the
+ * WAL
+ *
+ * NOTES:
+ *
+ * We build snapshots, by reading the WAL stream, that can *only* be used to
+ * read catalog contents. The aim is to provide MVCC and SnapshotNow snapshots
+ * that behave the same as their respective counterparts would have at the time
+ * the XLogRecord was generated. This provides a reliable environment for
+ * decoding those records into whatever format the author of an output plugin
+ * chooses.
+ *
+ * To build the snapshots we reuse the infrastructure built for hot
+ * standby. The snapshots we build differ from HS ones because we have
+ * different needs. To successfully decode data from the WAL we only need to
+ * access catalogs/(sys|rel|cat)cache, not the actual user tables, since the
+ * data we decode is contained in the WAL records. Also, unlike normal
+ * snapshots, ours can't fully rely on the clog for information about
+ * committed transactions: a transaction may have committed by now but still
+ * be in the future from the POV of the WAL entry we're currently decoding.
+ *
+ * As the percentage of transactions modifying the catalog normally is fairly
+ * small, we keep track of the committed catalog-modifying ones inside (xmin,
+ * xmax) instead of keeping track of all running transactions like it's done in
+ * a normal snapshot. That is, we keep a list of transactions between
+ * snapshot->(xmin, xmax) that we consider committed; everything else is
+ * considered aborted or in progress. That also allows us not to care about
+ * subtransactions before they have committed, which means we don't have to
+ * deal with suboverflowed subtransactions and similar.
+ *
+ * Classic SnapshotNow behaviour - which is mainly used for efficiency, not for
+ * correctness - is not actually required by any of the routines that we need
+ * during decoding and is hard to emulate fully. Instead we build snapshots
+ * with MVCC behaviour that are updated whenever another transaction
+ * commits. That gives behaviour consistent with a SnapshotNow behaviour
+ * happening in exactly that instant without other transactions interfering.
+ *
+ * One additional complexity of doing this is that to e.g. handle mixed DDL/DML
+ * transactions we need Snapshots that see intermediate versions of the catalog
+ * in a transaction. During normal operation this is achieved by using
+ * CommandIds/cmin/cmax. The problem with this however is that for space
+ * efficiency reasons only one value of that is stored (cf. combocid.c). Since
+ * combocids are only available in memory we log additional information which
+ * allows us to get the original (cmin, cmax) pair during visibility checks.
+ *
+ * To facilitate all this we need our own visibility routine, as the normal
+ * ones are optimized for different use cases. We also need the code to use our
+ * special snapshots automatically whenever SnapshotNow behaviour is expected
+ * (specifying our snapshot everywhere would be far too invasive).
+ *
+ * To replace the normal SnapshotNow snapshots, use the SetupDecodingSnapshots
+ * and RevertFromDecodingSnapshots functions.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/snapbuild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogreader.h"
+
+#include "catalog/catalog.h"
+#include "catalog/pg_control.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_tablespace.h"
+
+#include "miscadmin.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/snapbuild.h"
+#include "replication/logical.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relmapper.h"
+#include "utils/snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+#include "storage/block.h" /* debugging output */
+#include "storage/copydir.h" /* fsync_fname */
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/standby.h"
+#include "storage/sinval.h"
+
+typedef struct SnapBuild
+{
+ /* how far are we along building our first full snapshot */
+ SnapBuildState state;
+
+ /* private memory context used to allocate memory for this module. */
+ MemoryContext context;
+
+ /* all transactions < this have committed or aborted */
+ TransactionId xmin;
+
+ /* all transactions >= this are uncommitted */
+ TransactionId xmax;
+
+ /*
+ * Don't replay commits from an LSN <= this LSN. This can be set
+ * externally but it will also be advanced (never retreat) from within
+ * snapbuild.c.
+ */
+ XLogRecPtr transactions_after;
+
+ /*
+ * Don't start decoding WAL until the "xl_running_xacts" information
+ * indicates there are no running xids with a xid smaller than this.
+ */
+ TransactionId initial_xmin_horizon;
+
+ /*
+ * Snapshot that's valid to see the catalog modifications of all currently
+ * committed transactions.
+ */
+ Snapshot snapshot;
+
+ /*
+ * LSN of the last location we are sure a snapshot has been serialized to.
+ */
+ XLogRecPtr last_serialized_snapshot;
+
+ ReorderBuffer *reorder;
+
+ /* variable length data */
+
+ /*
+ * Information about initially running transactions
+ *
+ * When we start building a snapshot there already may be transactions in
+ * progress. Those are stored in running.xip. We don't have enough
+ * information about those to decode their contents, so until they are
+ * finished (xcnt=0) we cannot switch to a CONSISTENT state.
+ */
+ struct
+ {
+ /*
+ * As long as running.xcnt > 0, all XIDs between running.xmin and
+ * running.xmax have to be checked for whether they are still running.
+ */
+ TransactionId xmin;
+ TransactionId xmax;
+
+ size_t xcnt; /* number of used xip entries */
+ size_t xcnt_space; /* allocated size of xip */
+ TransactionId *xip; /* running xacts array, xidComparator-sorted */
+ } running;
+
+ /*
+ * Array of transactions which could have catalog changes that committed
+ * between xmin and xmax
+ */
+ struct
+ {
+ /* number of committed transactions */
+ size_t xcnt;
+
+ /* available space for committed transactions */
+ size_t xcnt_space;
+
+ /*
+ * Until we reach a CONSISTENT state, we record commits of all
+ * transactions, not just the catalog changing ones. Record when that
+ * changes so we know we cannot export a snapshot safely anymore.
+ */
+ bool includes_all_transactions;
+
+ /*
+ * Array of committed transactions that have modified the catalog.
+ *
+ * As this array is frequently modified we do *not* keep it in
+ * xidComparator order. Instead we sort the array when building &
+ * distributing a snapshot.
+ *
+ * XXX: That doesn't seem to be good reasoning anymore. Every time we
+ * add something here after becoming consistent will also require
+ * distributing a snapshot. Storing them sorted would potentially make
+ * it easier to purge as well (but more complicated wrt wraparound?).
+ */
+ TransactionId *xip;
+ } committed;
+
+} SnapBuild;
+
+/*
+ * Starting a transaction -- which we need to do while exporting a snapshot --
+ * removes knowledge about the previously used resowner, so we save it here.
+ */
+ResourceOwner SavedResourceOwnerDuringExport = NULL;
+
+/* transaction state manipulation functions */
+static void SnapBuildEndTxn(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid, int nsubxacts,
+ TransactionId *subxacts);
+
+static void SnapBuildCommitTxn(SnapBuild *builder,
+ XLogRecPtr lsn, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts);
+
+/* ->running manipulation */
+static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid);
+
+/* ->committed manipulation */
+static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
+
+/* snapshot building/manipulation/distribution functions */
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+
+static void SnapBuildFreeSnapshot(Snapshot snap);
+
+static void SnapBuildSnapIncRefcount(Snapshot snap);
+
+static void SnapBuildDistributeSnapshotNow(SnapBuild *builder, XLogRecPtr lsn);
+
+/* xlog reading helper functions for SnapBuildProcessRecord */
+static SnapBuildAction SnapBuildProcessFindSnapshot(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessHeap(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessHeap2(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessXlog(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessStandby(SnapBuild *builder, XLogRecordBuffer *buf);
+static SnapBuildAction SnapBuildProcessXact(SnapBuild *builder, XLogRecordBuffer *buf);
+
+
+/* on disk serialization & restore */
+static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
+static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
+
+/*
+ * Lookup a table via its current relfilenode.
+ *
+ * This requires that a snapshot in which that relfilenode is actually
+ * visible has been set up.
+ *
+ * The result of this function needs to be released from the syscache.
+ */
+Relation
+LookupRelationByRelFileNode(RelFileNode *relfilenode)
+{
+ HeapTuple tuple;
+ Oid heaprel = InvalidOid;
+
+ /* shared relation */
+ if (relfilenode->spcNode == GLOBALTABLESPACE_OID)
+ {
+ heaprel = RelationMapFilenodeToOid(relfilenode->relNode, true);
+ }
+ else
+ {
+ Oid lookup_tablespace;
+
+ /*
+ * relations in the default tablespace are stored with InvalidOid as
+ * pg_class."reltablespace".
+ */
+ if (relfilenode->spcNode == DEFAULTTABLESPACE_OID)
+ lookup_tablespace = InvalidOid;
+ else
+ lookup_tablespace = relfilenode->spcNode;
+
+ tuple = SearchSysCache2(RELFILENODE,
+ lookup_tablespace,
+ relfilenode->relNode);
+
+ /* ok, found it */
+ if (HeapTupleIsValid(tuple))
+ {
+ heaprel = HeapTupleHeaderGetOid(tuple->t_data);
+ ReleaseSysCache(tuple);
+ }
+ /* has to be nonexistent or a nailed table */
+ else
+ {
+ heaprel = RelationMapFilenodeToOid(relfilenode->relNode, false);
+ }
+ }
+
+ /* shared or nailed table */
+ if (heaprel != InvalidOid)
+ return RelationIdGetRelation(heaprel);
+ return NULL;
+}
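The tablespace handling in LookupRelationByRelFileNode() reduces to a small decision: which lookup path to take, and which pg_class.reltablespace value to search for. The following is a minimal standalone sketch of just that decision, not code from the patch; `choose_lookup` and `LookupPath` are hypothetical names, and only the built-in tablespace OIDs (1663 for pg_default, 1664 for pg_global) come from PostgreSQL.

```c
#include <stdint.h>

typedef uint32_t Oid;

#define InvalidOid            ((Oid) 0)
#define DEFAULTTABLESPACE_OID ((Oid) 1663)  /* pg_default */
#define GLOBALTABLESPACE_OID  ((Oid) 1664)  /* pg_global */

/* Hypothetical: which lookup path a relfilenode should take. */
typedef enum
{
    VIA_SHARED_RELMAP,  /* shared catalog: ask the relation mapper */
    VIA_SYSCACHE        /* pg_class lookup; a miss falls back to the local relmap */
} LookupPath;

/*
 * Decide the lookup path for a relfilenode's tablespace and compute the
 * pg_class.reltablespace value to search for. Relations in the default
 * tablespace store InvalidOid there, not 1663.
 */
static LookupPath
choose_lookup(Oid spcNode, Oid *reltablespace)
{
    if (spcNode == GLOBALTABLESPACE_OID)
    {
        *reltablespace = InvalidOid;    /* not used on this path */
        return VIA_SHARED_RELMAP;
    }
    *reltablespace = (spcNode == DEFAULTTABLESPACE_OID) ? InvalidOid : spcNode;
    return VIA_SYSCACHE;
}
```

In the patch the syscache path has one more step: a miss means the relation either doesn't exist or is a nailed table, so it asks RelationMapFilenodeToOid() with shared = false.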
+
+
+/*
+ * Allocate a new snapshot builder.
+ */
+SnapBuild *
+AllocateSnapshotBuilder(ReorderBuffer *reorder,
+ TransactionId xmin_horizon,
+ XLogRecPtr start_lsn)
+{
+ MemoryContext context;
+ SnapBuild *builder;
+
+ context = AllocSetContextCreate(TopMemoryContext,
+ "snapshot builder context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ builder = MemoryContextAllocZero(context, sizeof(SnapBuild));
+
+ builder->state = SNAPBUILD_START;
+ builder->context = context;
+ builder->reorder = reorder;
+ /* Other struct members initialized by zeroing, above */
+
+ /* builder->running is initialized by zeroing, above */
+
+ builder->committed.xcnt = 0;
+ builder->committed.xcnt_space = 128; /* arbitrary number */
+ builder->committed.xip = MemoryContextAlloc(context,
+ builder->committed.xcnt_space
+ * sizeof(TransactionId));
+ builder->committed.includes_all_transactions = true;
+ builder->initial_xmin_horizon = xmin_horizon;
+ builder->transactions_after = start_lsn;
+ return builder;
+}
+
+/*
+ * Free a snapshot builder.
+ */
+void
+FreeSnapshotBuilder(SnapBuild *builder)
+{
+ MemoryContext context = builder->context;
+
+ if (builder->snapshot)
+ SnapBuildFreeSnapshot(builder->snapshot);
+
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+
+ if (builder->committed.xip)
+ pfree(builder->committed.xip);
+
+ pfree(builder);
+
+ MemoryContextDelete(context);
+}
+
+/*
+ * Free an unreferenced snapshot that has previously been built by us.
+ */
+static void
+SnapBuildFreeSnapshot(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ /* slightly more likely, so it's checked even without --enable-cassert */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ if (snap->active_count)
+ elog(ERROR, "can't free an active snapshot");
+
+ pfree(snap);
+}
+
+/*
+ * In which state of snapshot building are we?
+ */
+SnapBuildState
+SnapBuildCurrentState(SnapBuild *builder)
+{
+ return builder->state;
+}
+
+/*
+ * Should the contents of a transaction ending at 'ptr' be skipped instead of
+ * decoded? True for transactions ending at or before ->transactions_after.
+ */
+bool
+SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
+{
+ return ptr <= builder->transactions_after;
+}
+
+/*
+ * Increase refcount of a snapshot.
+ *
+ * This is used when handing out a snapshot to some external resource or when
+ * adding a Snapshot as builder->snapshot.
+ */
+static void
+SnapBuildSnapIncRefcount(Snapshot snap)
+{
+ snap->active_count++;
+}
+
+/*
+ * Decrease refcount of a snapshot and free if the refcount reaches zero.
+ *
+ * Externally visible so external resources that have been handed an IncRef'ed
+ * Snapshot can free it easily.
+ */
+void
+SnapBuildSnapDecRefcount(Snapshot snap)
+{
+ /* make sure we don't get passed an external snapshot */
+ Assert(snap->satisfies == HeapTupleSatisfiesMVCCDuringDecoding);
+
+ /* make sure nobody modified our snapshot */
+ Assert(snap->curcid == FirstCommandId);
+ Assert(!snap->suboverflowed);
+ Assert(!snap->takenDuringRecovery);
+ Assert(!snap->regd_count);
+
+ Assert(snap->active_count);
+
+ /* slightly more likely, so it's checked even without --enable-cassert */
+ if (snap->copied)
+ elog(ERROR, "can't free a copied snapshot");
+
+ snap->active_count--;
+ if (!snap->active_count)
+ SnapBuildFreeSnapshot(snap);
+}
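The refcounting protocol above can be sketched in isolation. The builder holds one reference, each transaction that uses the snapshot as its base holds another, and the last decrement frees it. `DemoSnapshot`, `demo_inc` and `demo_dec` are hypothetical stand-ins for SnapshotData and the two SnapBuildSnap*Refcount functions. The real code also asserts that the snapshot was built by snapbuild.c and refuses copied snapshots; the sketch leaves that out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for SnapshotData: only the refcount is modelled. */
typedef struct DemoSnapshot
{
    unsigned int active_count;
} DemoSnapshot;

static int demo_frees = 0;      /* counts frees, for demonstration only */

/* Counterpart of SnapBuildSnapIncRefcount(). */
static void
demo_inc(DemoSnapshot *snap)
{
    snap->active_count++;
}

/* Counterpart of SnapBuildSnapDecRefcount(): returns 1 if the snapshot was freed. */
static int
demo_dec(DemoSnapshot *snap)
{
    assert(snap->active_count > 0);
    if (--snap->active_count > 0)
        return 0;
    free(snap);
    demo_frees++;
    return 1;
}
```

Whichever holder drops the count to zero frees the snapshot. The builder therefore doesn't need to know whether any transaction still references an old snapshot when it installs a new one.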
+
+/*
+ * Build a new snapshot, based on currently committed catalog-modifying
+ * transactions.
+ *
+ * In-progress transactions with catalog access are *not* allowed to modify
+ * these snapshots; they have to copy them and fill in appropriate ->curcid and
+ * ->subxip/subxcnt values.
+ */
+static Snapshot
+SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+{
+ Snapshot snapshot;
+ Size ssize;
+
+ Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
+
+ ssize = sizeof(SnapshotData)
+ + sizeof(TransactionId) * builder->committed.xcnt
+ + sizeof(TransactionId) * 1 /* toplevel xid */ ;
+
+ snapshot = MemoryContextAllocZero(builder->context, ssize);
+
+ snapshot->satisfies = HeapTupleSatisfiesMVCCDuringDecoding;
+
+ /*
+ * We misuse the original meaning of SnapshotData's xip and subxip fields
+ * to make them more fitting for our needs.
+ *
+ * In the 'xip' array we store transactions that have to be treated as
+ * committed. Since we will only ever look at tuples from transactions
+ * that have modified the catalog it's more efficient to store those few
+ * that exist between xmin and xmax (frequently there are none).
+ *
+ * Snapshots that are used in transactions that have modified the catalog
+ * also use the 'subxip' array to store their toplevel xid and all the
+ * subtransaction xids so we can recognize when we need to treat rows as
+ * visible that are not in xip but still need to be visible. Subxip only
+ * gets filled when the transaction is copied into the context of a
+ * catalog modifying transaction since we otherwise share a snapshot
+ * between transactions. As long as a txn hasn't modified the catalog it
+ * doesn't need to treat any uncommitted rows as visible, so there is no
+ * need for those xids.
+ *
+ * Both arrays are qsort'ed so that we can use bsearch() on them.
+ *
+ * XXX: Do we want extra fields instead of misusing existing ones?
+ */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ snapshot->xmin = builder->xmin;
+ snapshot->xmax = builder->xmax;
+
+ /* store all transactions to be treated as committed by this snapshot */
+ snapshot->xip =
+ (TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+ snapshot->xcnt = builder->committed.xcnt;
+ memcpy(snapshot->xip, builder->committed.xip,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* sort so we can bsearch() */
+ qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+
+ /*
+ * Initially, subxip is empty, i.e. it's a snapshot to be used by
+ * transactions that don't modify the catalog. Might be changed later.
+ * XXX how and by whom?
+ */
+ snapshot->subxcnt = 0;
+ snapshot->subxip = NULL;
+
+ snapshot->suboverflowed = false;
+ snapshot->takenDuringRecovery = false;
+ snapshot->copied = false;
+ snapshot->curcid = FirstCommandId;
+ snapshot->active_count = 0;
+ snapshot->regd_count = 0;
+
+ return snapshot;
+}
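The "inverted" snapshot layout described in SnapBuildBuildSnapshot() can be illustrated with a simplified sketch. `InvertedSnapshot`, `build_snapshot` and `visible_in_range` are hypothetical names. Plain unsigned comparisons are used, so transaction ID wraparound is ignored. Xids below xmin are reported as -1 ("ask the commit log") instead of being resolved. As in the patch, the sketch uses a single allocation with the xid array placed directly after the header, sorts the array with qsort(), and probes it with bsearch().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

typedef struct InvertedSnapshot
{
    TransactionId xmin;
    TransactionId xmax;
    size_t xcnt;
    TransactionId *xip;     /* COMMITTED xids in [xmin, xmax), sorted */
} InvertedSnapshot;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/* One allocation: header followed by the xid array. */
static InvertedSnapshot *
build_snapshot(TransactionId xmin, TransactionId xmax,
               const TransactionId *committed, size_t n)
{
    InvertedSnapshot *snap = malloc(sizeof(InvertedSnapshot) + n * sizeof(TransactionId));

    if (snap == NULL)
        return NULL;
    snap->xmin = xmin;
    snap->xmax = xmax;
    snap->xcnt = n;
    snap->xip = (TransactionId *) ((char *) snap + sizeof(InvertedSnapshot));
    if (n > 0)
        memcpy(snap->xip, committed, n * sizeof(TransactionId));
    qsort(snap->xip, n, sizeof(TransactionId), xid_cmp);   /* enables bsearch() */
    return snap;
}

/* 1 = treat as committed, 0 = treat as not committed, -1 = below xmin, ask clog. */
static int
visible_in_range(const InvertedSnapshot *snap, TransactionId xid)
{
    if (xid >= snap->xmax)
        return 0;
    if (xid < snap->xmin)
        return -1;
    return bsearch(&xid, snap->xip, snap->xcnt, sizeof(TransactionId), xid_cmp) != NULL;
}
```

Because only catalog-modifying commits between xmin and xmax are listed, the array is usually tiny. That is the efficiency argument made in the comment above.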
+
+/*
+ * Export a snapshot so it can be set in another session with SET TRANSACTION
+ * SNAPSHOT.
+ *
+ * For that we need to start a transaction in the current backend as the
+ * importing side checks whether the source transaction is still open to make
+ * sure the xmin horizon hasn't advanced since then.
+ *
+ * After that we convert a locally built snapshot into the normal variant
+ * understood by HeapTupleSatisfiesMVCC et al.
+ */
+const char *
+SnapBuildExportSnapshot(SnapBuild *builder)
+{
+ Snapshot snap;
+ char *snapname;
+ TransactionId xid;
+ TransactionId *newxip;
+ int newxcnt = 0;
+
+ elog(LOG, "building snapshot");
+
+ if (builder->state != SNAPBUILD_CONSISTENT)
+ elog(ERROR, "cannot export a snapshot before reaching a consistent state");
+
+ if (!builder->committed.includes_all_transactions)
+ elog(ERROR, "cannot export a snapshot, not all transactions are monitored anymore");
+
+ /* so we don't overwrite the existing value */
+ if (TransactionIdIsValid(MyPgXact->xmin))
+ elog(ERROR, "cannot export a snapshot when MyPgXact->xmin already is valid");
+
+ if (SavedResourceOwnerDuringExport)
+ elog(ERROR, "can only export one snapshot at a time");
+
+ SavedResourceOwnerDuringExport = CurrentResourceOwner;
+
+ StartTransactionCommand();
+
+ Assert(!FirstSnapshotSet);
+
+ /* There doesn't seem to be a nice API to set these */
+ XactIsoLevel = XACT_REPEATABLE_READ;
+ XactReadOnly = true;
+
+ snap = SnapBuildBuildSnapshot(builder,
+ GetTopTransactionId());
+
+ /*
+ * We know that snap->xmin is alive, enforced by the logical xmin
+ * mechanism. Due to that we can do this without locks, we're only
+ * changing our own value.
+ */
+ MyPgXact->xmin = snap->xmin;
+
+ /* allocate in transaction context */
+ newxip = (TransactionId *)
+ palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+
+ /*
+ * snapbuild.c builds snapshots in an "inverted" manner, which means it
+ * stores committed transactions in ->xip, not ones in progress. Build a
+ * classical snapshot by marking all non-committed transactions as
+ * in-progress.
+ */
+ for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+ {
+ void *test;
+
+ /*
+ * check whether transaction committed using the timetravel meaning of
+ * ->xip
+ */
+ test = bsearch(&xid, snap->xip, snap->xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ elog(DEBUG2, "checking xid %u.. %d (xmin %u, xmax %u)",
+ xid, test == NULL, snap->xmin, snap->xmax);
+
+ if (test == NULL)
+ {
+ if (newxcnt >= GetMaxSnapshotXidCount())
+ elog(ERROR, "snapshot too large");
+
+ newxip[newxcnt++] = xid;
+
+ elog(DEBUG2, "treat %u as in-progress", xid);
+ }
+
+ TransactionIdAdvance(xid);
+ }
+
+ snap->xcnt = newxcnt;
+ snap->xip = newxip;
+
+ snapname = ExportSnapshot(snap);
+
+ elog(LOG, "exported snapbuild snapshot: %s xcnt %u", snapname, snap->xcnt);
+
+ return snapname;
+}
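The conversion loop in SnapBuildExportSnapshot() turns the list of committed xids into the classic list of in-progress xids. Below is a minimal sketch of that inversion. It assumes a sorted committed array and ignores wraparound: the hypothetical `invert_committed` uses plain `<` instead of NormalTransactionIdPrecedes(), and it returns -1 where the patch raises "snapshot too large".

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: write every xid in [xmin, xmax) that is NOT in the
 * sorted 'committed' array to 'out'. Returns the count, or -1 if 'out'
 * does not have enough slots.
 */
static long
invert_committed(TransactionId xmin, TransactionId xmax,
                 const TransactionId *committed, size_t ncommitted,
                 TransactionId *out, size_t out_space)
{
    size_t n = 0;

    for (TransactionId xid = xmin; xid < xmax; xid++)
    {
        /* committed xids stay visible, so they are not listed */
        if (bsearch(&xid, committed, ncommitted, sizeof(TransactionId), xid_cmp) != NULL)
            continue;
        if (n >= out_space)
            return -1;
        out[n++] = xid;
    }
    return (long) n;
}
```

The cost is one binary search per xid between xmin and xmax. That is acceptable for a one-off export, but it is why the patch bounds the output by GetMaxSnapshotXidCount().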
+
+/*
+ * Reset a previously SnapBuildExportSnapshot'ed snapshot if there is
+ * any. Aborts the previously started transaction and resets the resource owner
+ * back to the previous value.
+ */
+void
+SnapBuildClearExportedSnapshot(void)
+{
+ /* nothing exported, that's the usual case */
+ if (SavedResourceOwnerDuringExport == NULL)
+ return;
+
+ /* make sure nothing could have ever happened */
+ AbortCurrentTransaction();
+
+ CurrentResourceOwner = SavedResourceOwnerDuringExport;
+ SavedResourceOwnerDuringExport = NULL;
+}
+
+/*
+ * Handle the effects of a single heap change, appropriate to the current state
+ * of the snapshot builder.
+ */
+static SnapBuildAction
+SnapBuildProcessChange(SnapBuild *builder, TransactionId xid,
+ XLogRecordBuffer *buf, RelFileNode *relfilenode)
+{
+ SnapBuildAction ret = SNAPBUILD_SKIP;
+
+ /*
+ * We can't handle data in transactions if we haven't built a snapshot
+ * yet, so don't store them.
+ */
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ ;
+
+ /*
+ * No point in keeping track of changes in transactions that we don't have
+ * enough information about to decode.
+ */
+ else if (builder->state < SNAPBUILD_CONSISTENT &&
+ SnapBuildTxnIsRunning(builder, xid))
+ ;
+ else
+ {
+ bool old_tx = ReorderBufferIsXidKnown(builder->reorder, xid);
+
+ ret = SNAPBUILD_DECODE;
+
+ if (!old_tx || !ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
+ {
+ /* only build snapshot if we don't have a prebuilt one */
+ if (builder->snapshot == NULL)
+ {
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+ /* refcount of the snapshot builder */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ }
+
+ /* refcount of the transaction */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferSetBaseSnapshot(builder->reorder,
+ xid, buf->origptr,
+ builder->snapshot);
+ }
+ }
+
+ return ret;
+}
+
+/*
+ * Process a single xlog record.
+ */
+SnapBuildAction
+SnapBuildProcessRecord(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ SnapBuildAction ret = SNAPBUILD_SKIP;
+
+ /*
+ * Only search for an initial starting point if we haven't reached a
+ * consistent state yet.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ ret = SnapBuildProcessFindSnapshot(builder, buf);
+ if (ret == SNAPBUILD_SKIP)
+ return ret;
+ }
+
+ /*
+ * Don't have a starting point to decode from, no point in collecting any
+ * information.
+ */
+ if (builder->state == SNAPBUILD_START)
+ return SNAPBUILD_SKIP;
+
+ /*
+ * Check whether individual records require changes to our snapshot and
+ * whether their content should be decoded because it contains user
+ * visible data.
+ */
+ switch (buf->record.xl_rmid)
+ {
+ case RM_XLOG_ID:
+ ret = SnapBuildProcessXlog(builder, buf);
+ break;
+ case RM_STANDBY_ID:
+ ret = SnapBuildProcessStandby(builder, buf);
+ break;
+ case RM_XACT_ID:
+ ret = SnapBuildProcessXact(builder, buf);
+ break;
+ case RM_HEAP_ID:
+ ret = SnapBuildProcessHeap(builder, buf);
+ break;
+ case RM_HEAP2_ID:
+ ret = SnapBuildProcessHeap2(builder, buf);
+ break;
+ }
+
+ return ret;
+}
+
+
+/*
+ * Check whether `xid` is currently 'running'. Running transactions in our
+ * parlance are transactions which we didn't observe from the start so we can't
+ * properly decode them. They only exist after we freshly started from an
+ * < CONSISTENT snapshot.
+ */
+static bool
+SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
+{
+ Assert(builder->state < SNAPBUILD_CONSISTENT);
+ Assert(TransactionIdIsValid(builder->running.xmin));
+ Assert(TransactionIdIsValid(builder->running.xmax));
+
+ if (builder->running.xcnt &&
+ NormalTransactionIdFollows(xid, builder->running.xmin) &&
+ NormalTransactionIdPrecedes(xid, builder->running.xmax))
+ {
+ TransactionId *search =
+ bsearch(&xid, builder->running.xip, builder->running.xcnt_space,
+ sizeof(TransactionId), xidComparator);
+
+ if (search != NULL)
+ {
+ Assert(*search == xid);
+ return true;
+ }
+ }
+
+ return false;
+}
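SnapBuildTxnIsRunning() first rejects xids outside the exclusive (running.xmin, running.xmax) range cheaply. Only then does it binary-search the sorted array. It searches all xcnt_space entries because finished xids stay in the array and only the xcnt counter drops. A standalone sketch of that check follows; `RunningXacts` and `txn_is_running` are hypothetical names, and `xid_precedes`/`xid_follows` use the signed-difference comparison that NormalTransactionIdPrecedes()/Follows() are based on.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Wraparound-aware ordering for normal xids. */
static int xid_precedes(TransactionId a, TransactionId b) { return (int32_t) (a - b) < 0; }
static int xid_follows(TransactionId a, TransactionId b) { return (int32_t) (a - b) > 0; }

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

typedef struct RunningXacts
{
    TransactionId xmin;     /* exclusive: smallest tracked xid - 1 */
    TransactionId xmax;     /* exclusive: largest tracked xid + 1 */
    size_t xcnt;            /* tracked xids that have not finished yet */
    size_t xcnt_space;      /* entries in xip; never shrinks */
    TransactionId *xip;     /* sorted with xid_cmp */
} RunningXacts;

static int
txn_is_running(const RunningXacts *r, TransactionId xid)
{
    if (r->xcnt == 0)
        return 0;
    if (!xid_follows(xid, r->xmin) || !xid_precedes(xid, r->xmax))
        return 0;           /* outside the tracked range: cheap rejection */
    return bsearch(&xid, r->xip, r->xcnt_space, sizeof(TransactionId), xid_cmp) != NULL;
}
```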
+
+/*
+ * Add a new SnapshotNow to all transactions we're decoding that currently are
+ * in-progress so they can see new catalog contents made by the transaction
+ * that just committed.
+ */
+static void
+SnapBuildDistributeSnapshotNow(SnapBuild *builder, XLogRecPtr lsn)
+{
+ dlist_iter txn_i;
+ ReorderBufferTXN *txn;
+
+ dlist_foreach(txn_i, &builder->reorder->toplevel_by_lsn)
+ {
+ txn = dlist_container(ReorderBufferTXN, node, txn_i.cur);
+
+ /*
+ * XXX: we can ignore transactions that are known as subxacts here if
+ * we make sure their parent transaction has a base snapshot if this
+ * one has one.
+ */
+
+ /*
+ * If we don't have a base snapshot yet, there are no changes yet
+ * which in turn implies we don't yet need a new snapshot.
+ */
+ if (ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
+ {
+ elog(DEBUG2, "adding a new snapshot to %u at %X/%X",
+ txn->xid, (uint32) (lsn >> 32), (uint32) lsn);
+ SnapBuildSnapIncRefcount(builder->snapshot);
+ ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn,
+ builder->snapshot);
+ }
+ }
+}
+
+/*
+ * Keep track of a new catalog changing transaction that has committed.
+ */
+static void
+SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+{
+ Assert(TransactionIdIsValid(xid));
+
+ if (builder->committed.xcnt == builder->committed.xcnt_space)
+ {
+ builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
+ /* XXX: put in a limit here as a defense against bugs? */
+
+ elog(WARNING, "increasing space for committed transactions to %zu",
+ builder->committed.xcnt_space);
+
+ builder->committed.xip = repalloc(builder->committed.xip,
+ builder->committed.xcnt_space * sizeof(TransactionId));
+ }
+
+ /*
+ * XXX: It might make sense to keep the array sorted here instead of doing
+ * it every time we build a new snapshot. On the other hand this gets called
+ * repeatedly when a transaction with subtransactions commits.
+ */
+ builder->committed.xip[builder->committed.xcnt++] = xid;
+}
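SnapBuildAddCommittedTxn() grows the committed array to 2n+1 slots whenever it is full. Appends therefore cost amortized constant time, and the +1 lets growth start from an empty array. A minimal sketch using realloc() in place of repalloc(); `CommittedXids` and `add_committed` are hypothetical names. Unlike repalloc(), realloc() can return NULL, so the sketch reports failure instead of erroring out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

typedef struct CommittedXids
{
    size_t xcnt;
    size_t xcnt_space;
    TransactionId *xip;
} CommittedXids;

/* Append xid, growing to 2n+1 slots when full. Returns 0 on success, -1 on OOM. */
static int
add_committed(CommittedXids *c, TransactionId xid)
{
    if (c->xcnt == c->xcnt_space)
    {
        size_t newspace = c->xcnt_space * 2 + 1;
        TransactionId *p = realloc(c->xip, newspace * sizeof(TransactionId));

        if (p == NULL)
            return -1;      /* old array is still valid and still owned by c */
        c->xip = p;
        c->xcnt_space = newspace;
    }
    c->xip[c->xcnt++] = xid;
    return 0;
}
```

Starting from an empty array, capacity goes 1, 3, 7, 15, and so on. The patch starts at 128 slots, so the WARNING fires only after an unusually large number of catalog-modifying commits between purges.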
+
+/*
+ * Remove all transactions we treat as committed that are smaller than
+ * ->xmin. Those won't ever get checked via the ->committed array but via the
+ * clog machinery, so we don't need to waste memory on them.
+ */
+static void
+SnapBuildPurgeCommittedTxn(SnapBuild *builder)
+{
+ int off;
+ TransactionId *workspace;
+ int surviving_xids = 0;
+
+ /* not ready yet */
+ if (!TransactionIdIsNormal(builder->xmin))
+ return;
+
+ /* XXX: Neater algorithm? */
+ workspace =
+ MemoryContextAlloc(builder->context,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* copy xids that still are interesting to workspace */
+ for (off = 0; off < builder->committed.xcnt; off++)
+ {
+ if (NormalTransactionIdPrecedes(builder->committed.xip[off],
+ builder->xmin))
+ ; /* remove */
+ else
+ workspace[surviving_xids++] = builder->committed.xip[off];
+ }
+
+ /* copy workspace back to persistent state */
+ memcpy(builder->committed.xip, workspace,
+ surviving_xids * sizeof(TransactionId));
+
+ elog(DEBUG1, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+ (uint32) builder->committed.xcnt, (uint32) surviving_xids,
+ builder->xmin, builder->xmax);
+ builder->committed.xcnt = surviving_xids;
+
+ pfree(workspace);
+}
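The XXX in SnapBuildPurgeCommittedTxn() asks for a neater algorithm. One possible option, which the patch does not use, is to compact the array in place. That needs no workspace allocation and still preserves the relative order of the surviving xids. `purge_committed_in_place` is a hypothetical helper, and `xid_precedes` uses the same signed-difference comparison as NormalTransactionIdPrecedes().

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TransactionId;

static int xid_precedes(TransactionId a, TransactionId b) { return (int32_t) (a - b) < 0; }

/*
 * Drop every xid older than xmin by shifting survivors to the front.
 * Returns the new count; entries past it are garbage.
 */
static size_t
purge_committed_in_place(TransactionId *xip, size_t xcnt, TransactionId xmin)
{
    size_t keep = 0;

    for (size_t i = 0; i < xcnt; i++)
    {
        if (xid_precedes(xip[i], xmin))
            continue;       /* below xmin: the commit log answers for it */
        xip[keep++] = xip[i];
    }
    return keep;
}
```

Since the write index never overtakes the read index, the single pass is safe without a second buffer.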
+
+/*
+ * Common logic for SnapBuildAbortTxn and SnapBuildCommitTxn dealing with
+ * keeping track of the number of running transactions.
+ */
+static void
+SnapBuildEndTxn(SnapBuild *builder, TransactionId xid)
+{
+ if (builder->state == SNAPBUILD_CONSISTENT)
+ return;
+
+ if (SnapBuildTxnIsRunning(builder, xid))
+ {
+ if (!--builder->running.xcnt)
+ {
+ /*
+ * None of the originally running transactions is running anymore,
+ * so our incrementally built snapshot is now complete.
+ */
+ elog(LOG, "found consistent point due to SnapBuildEndTxn + running: %u", xid);
+ builder->state = SNAPBUILD_CONSISTENT;
+ }
+ }
+}
+
+/*
+ * Abort a transaction, throwing away all state we kept.
+ */
+static void
+SnapBuildAbortTxn(SnapBuild *builder, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int i;
+
+ for (i = 0; i < nsubxacts; i++)
+ {
+ TransactionId subxid = subxacts[i];
+
+ SnapBuildEndTxn(builder, subxid);
+ }
+
+ SnapBuildEndTxn(builder, xid);
+}
+
+/*
+ * Handle everything that needs to be done when a transaction commits
+ */
+static void
+SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
+ int nsubxacts, TransactionId *subxacts)
+{
+ int nxact;
+
+ bool forced_timetravel = false;
+ bool sub_does_timetravel = false;
+ bool top_does_timetravel = false;
+
+ TransactionId xmax = xid;
+
+ /*
+ * If we couldn't observe every change of a transaction because it was
+ * already running at the point we started to observe we have to assume it
+ * made catalog changes.
+ *
+ * This has the positive benefit that we afterwards have enough
+ * information to build an exportable snapshot that's usable by pg_dump et
+ * al.
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /* ensure that only commits after this are getting replayed */
+ if (builder->transactions_after < lsn)
+ builder->transactions_after = lsn;
+
+ /*
+ * we could avoid treating !SnapBuildTxnIsRunning transactions as
+ * timetravel ones, but we want to be able to export a snapshot when
+ * we reached consistency.
+ */
+ forced_timetravel = true;
+ elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+ }
+
+ for (nxact = 0; nxact < nsubxacts; nxact++)
+ {
+ TransactionId subxid = subxacts[nxact];
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, subxid);
+
+ /*
+ * If we're forcing timetravel we also need accurate subtransaction
+ * status.
+ */
+ if (forced_timetravel)
+ {
+ SnapBuildAddCommittedTxn(builder, subxid);
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+
+ /*
+ * add subtransaction to base snapshot; we don't distinguish it from
+ * toplevel transactions there.
+ */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, subxid))
+ {
+ sub_does_timetravel = true;
+
+ elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
+ xid, subxid);
+
+ SnapBuildAddCommittedTxn(builder, subxid);
+
+ if (NormalTransactionIdFollows(subxid, xmax))
+ xmax = subxid;
+ }
+ }
+
+ /*
+ * make sure txn is not tracked in running txn's anymore, switch state
+ */
+ SnapBuildEndTxn(builder, xid);
+
+ if (forced_timetravel)
+ {
+ elog(DEBUG1, "forced transaction %u to do timetravel.", xid);
+
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ /* add toplevel transaction to base snapshot */
+ else if (ReorderBufferXidDoesTimetravel(builder->reorder, xid))
+ {
+ elog(DEBUG1, "found top level transaction %u, with catalog changes!",
+ xid);
+
+ top_does_timetravel = true;
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+ else if (sub_does_timetravel)
+ {
+ /* mark toplevel txn as timetravel as well */
+ SnapBuildAddCommittedTxn(builder, xid);
+ }
+
+ if (forced_timetravel || top_does_timetravel || sub_does_timetravel)
+ {
+ if (!TransactionIdIsValid(builder->xmax) ||
+ TransactionIdFollowsOrEquals(xmax, builder->xmax))
+ {
+ builder->xmax = xmax;
+ TransactionIdAdvance(builder->xmax);
+ }
+
+ if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
+ return;
+
+ /* release the snapshot builder's reference to the old snapshot */
+ if (builder->snapshot)
+ SnapBuildSnapDecRefcount(builder->snapshot);
+
+ builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+ /* refcount of the snapshot builder */
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ /* add a new SnapshotNow to all currently running transactions */
+ SnapBuildDistributeSnapshotNow(builder, lsn);
+ }
+ else
+ {
+ /* record that we cannot export a general snapshot anymore */
+ builder->committed.includes_all_transactions = false;
+ }
+}
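When a catalog-relevant transaction commits, SnapBuildCommitTxn() takes the newest of the top-level xid and the tracked subxids. If that is at or beyond the builder's current xmax, it moves xmax to one past it. A standalone sketch of that computation follows; `next_builder_xmax` is a hypothetical name, and `xid_advance` mirrors TransactionIdAdvance(), which skips the special xids 0 to 2 on wraparound.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FIRST_NORMAL_XID     ((TransactionId) 3)

static int xid_follows(TransactionId a, TransactionId b) { return (int32_t) (a - b) > 0; }
static int xid_follows_or_equals(TransactionId a, TransactionId b) { return (int32_t) (a - b) >= 0; }

static TransactionId
xid_advance(TransactionId x)
{
    x++;
    if (x < FIRST_NORMAL_XID)
        x = FIRST_NORMAL_XID;
    return x;
}

/* New exclusive xmax after committing 'xid' with the given tracked subxids. */
static TransactionId
next_builder_xmax(TransactionId builder_xmax, TransactionId xid,
                  const TransactionId *subxids, size_t nsubxids)
{
    TransactionId xmax = xid;

    for (size_t i = 0; i < nsubxids; i++)
        if (xid_follows(subxids[i], xmax))
            xmax = subxids[i];

    if (builder_xmax == InvalidTransactionId ||
        xid_follows_or_equals(xmax, builder_xmax))
        return xid_advance(xmax);
    return builder_xmax;    /* an older commit never moves xmax backwards */
}
```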
+
+
+/* -----------------------------------
+ * Snapshot building functions dealing with xlog records
+ * -----------------------------------
+ */
+
+/*
+ * Build the start of a snapshot that's capable of decoding the catalog.
+ */
+static SnapBuildAction
+SnapBuildProcessFindSnapshot(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+ xl_running_xacts *running;
+
+ /* we need a RUNNING_XACTS record */
+ if (buf->record.xl_rmid != RM_STANDBY_ID || info != XLOG_RUNNING_XACTS)
+ return SNAPBUILD_DECODE;
+
+ /* ---
+ * Build catalog decoding snapshot incrementally using information about
+ * the currently running transactions. There are several ways to achieve that:
+ * a) there were no running transactions at all
+ * b) all transactions that were known to be running at a previous
+ * xl_running_xacts record have now finished (cf. SnapBuildEndTxn).
+ * c) This (in a previous run) or another decoding slot serialized a
+ * snapshot to disk that we can use to start us up.
+ * ---
+ */
+ running = (xl_running_xacts *) buf->record_data;
+
+ /*
+ * xl_running_xact record is older than what we can use, we might not have
+ * all necessary catalog rows anymore.
+ */
+ if (TransactionIdIsNormal(builder->initial_xmin_horizon) &&
+ NormalTransactionIdPrecedes(running->oldestRunningXid,
+ builder->initial_xmin_horizon))
+ {
+ elog(LOG, "skipping snapshot at %X/%X due to initial xmin horizon of %u vs the snapshot's %u",
+ (uint32) (buf->origptr >> 32), (uint32) buf->origptr,
+ builder->initial_xmin_horizon, running->oldestRunningXid);
+ }
+
+ /*
+ * a) No transactions were running, we can jump to consistent.
+ *
+ * NB: We might have already started to incrementally assemble a snapshot,
+ * so we need to be careful to deal with that.
+ */
+ else if (running->xcnt == 0)
+ {
+ if (builder->transactions_after == InvalidXLogRecPtr ||
+ builder->transactions_after < buf->origptr)
+ builder->transactions_after = buf->origptr;
+
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ /* no transactions running now */
+ builder->running.xcnt = 0;
+ builder->running.xmin = InvalidTransactionId;
+ builder->running.xmax = InvalidTransactionId;
+
+ /*
+ * FIXME: abort everything we have stored about running transactions,
+ * relevant e.g. after a crash.
+ */
+ builder->state = SNAPBUILD_CONSISTENT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts with xcnt == 0",
+ builder->xmin);
+ return SNAPBUILD_SKIP;
+ }
+ /* c) valid on disk state */
+ else if (SnapBuildRestore(builder, buf->origptr))
+ {
+ Assert(builder->state == SNAPBUILD_CONSISTENT);
+ elog(LOG, "recovered initial snapshot (xmin %u) from disk",
+ builder->xmin);
+ return SNAPBUILD_SKIP;
+ }
+
+ /*
+ * b) first encounter of a usable xl_running_xacts record. If we had found
+ * one earlier we would either track running transactions or be
+ * consistent.
+ */
+ else if (!builder->running.xcnt)
+ {
+ /*
+ * We only care about toplevel xids as those are the ones we
+ * definitely see in the wal stream. As snapbuild.c tracks committed
+ * instead of running transactions we don't need to know anything
+ * about uncommitted subtransactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+ builder->xmax = running->latestCompletedXid;
+ TransactionIdAdvance(builder->xmax);
+
+ /* so we can safely use the faster comparisons */
+ Assert(TransactionIdIsNormal(builder->xmin));
+ Assert(TransactionIdIsNormal(builder->xmax));
+
+ builder->running.xcnt = running->xcnt;
+ builder->running.xcnt_space = running->xcnt;
+ builder->running.xip =
+ MemoryContextAlloc(builder->context,
+ builder->running.xcnt * sizeof(TransactionId));
+ memcpy(builder->running.xip, running->xids,
+ builder->running.xcnt * sizeof(TransactionId));
+
+ /* sort so we can do a binary search */
+ qsort(builder->running.xip, builder->running.xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ builder->running.xmin = builder->running.xip[0];
+ builder->running.xmax = builder->running.xip[running->xcnt - 1];
+
+ /* makes comparisons cheaper later */
+ TransactionIdRetreat(builder->running.xmin);
+ TransactionIdAdvance(builder->running.xmax);
+
+ builder->state = SNAPBUILD_FULL_SNAPSHOT;
+
+ elog(LOG, "found initial snapshot (xmin %u) due to running xacts, %u xacts need to finish",
+ builder->xmin, (uint32) builder->running.xcnt);
+
+ return SNAPBUILD_SKIP;
+ }
+
+ /*
+ * We already started to track running xacts and need to wait for all
+ * in-progress ones to finish. We fall through to the normal processing of
+ * records so incremental cleanup can be performed.
+ */
+ return SNAPBUILD_DECODE;
+}
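In case b), the running xids are copied and sorted. The bounds are then widened by one on each side ("makes comparisons cheaper later"), so that later checks can use strict comparisons against running.xmin and running.xmax. A standalone sketch of that step follows; `track_running` is a hypothetical name, and `xid_advance`/`xid_retreat` mirror TransactionIdAdvance()/TransactionIdRetreat(), which skip the special xids 0 to 2.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

#define FIRST_NORMAL_XID ((TransactionId) 3)

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

static TransactionId
xid_advance(TransactionId x)
{
    x++;
    if (x < FIRST_NORMAL_XID)
        x = FIRST_NORMAL_XID;
    return x;
}

static TransactionId
xid_retreat(TransactionId x)
{
    do
        x--;
    while (x < FIRST_NORMAL_XID);
    return x;
}

/* Sorted copy of the running xids plus exclusive bounds; NULL if n == 0 or OOM. */
static TransactionId *
track_running(const TransactionId *xids, size_t n,
              TransactionId *xmin, TransactionId *xmax)
{
    TransactionId *xip;

    if (n == 0)
        return NULL;
    xip = malloc(n * sizeof(TransactionId));
    if (xip == NULL)
        return NULL;
    memcpy(xip, xids, n * sizeof(TransactionId));
    qsort(xip, n, sizeof(TransactionId), xid_cmp);

    /* widen by one on each side so callers can use strict comparisons */
    *xmin = xid_retreat(xip[0]);
    *xmax = xid_advance(xip[n - 1]);
    return xip;
}
```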
+
+/*
+ * Process RM_HEAP_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessHeap(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+ SnapBuildAction ret = SNAPBUILD_SKIP;
+ TransactionId xid = buf->record.xl_xid;
+
+ switch (info & XLOG_HEAP_OPMASK)
+ {
+ case XLOG_HEAP_INPLACE:
+ {
+ xl_heap_inplace *xlrec;
+
+ xlrec = (xl_heap_inplace *) buf->record_data;
+
+ ret = SnapBuildProcessChange(builder, xid, buf,
+ &xlrec->target.node);
+
+ /* heap_inplace is only done in catalog modifying txns */
+ ReorderBufferXidSetTimetravel(builder->reorder, xid, buf->origptr);
+
+ break;
+ }
+
+ case XLOG_HEAP_LOCK:
+
+ /*
+ * We only ever read changes, so row level locks aren't
+ * interesting.
+ */
+ break;
+
+ case XLOG_HEAP_INSERT:
+ {
+ xl_heap_insert *xlrec = (xl_heap_insert *) buf->record_data;
+
+ ret = SnapBuildProcessChange(builder, xid, buf,
+ &xlrec->target.node);
+ break;
+ }
+ /* HEAP(_HOT)?_UPDATE use the same data layout */
+ case XLOG_HEAP_UPDATE:
+ case XLOG_HEAP_HOT_UPDATE:
+ {
+ xl_heap_update *xlrec = (xl_heap_update *) buf->record_data;
+
+ ret = SnapBuildProcessChange(builder, xid, buf,
+ &xlrec->target.node);
+ break;
+ }
+ case XLOG_HEAP_DELETE:
+ {
+ xl_heap_delete *xlrec = (xl_heap_delete *) buf->record_data;
+
+ ret = SnapBuildProcessChange(builder, xid, buf,
+ &xlrec->target.node);
+ break;
+ }
+ default:
+ break;
+ }
+ return ret;
+}
+
+/*
+ * Process RM_HEAP2_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessHeap2(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+ SnapBuildAction ret = SNAPBUILD_SKIP;
+ TransactionId xid = buf->record.xl_xid;
+
+ switch (info)
+ {
+ case XLOG_HEAP2_MULTI_INSERT:
+ {
+ xl_heap_multi_insert *xlrec;
+
+ xlrec = (xl_heap_multi_insert *) buf->record_data;
+
+ ret = SnapBuildProcessChange(builder, xid, buf,
+ &xlrec->node);
+ break;
+ }
+ case XLOG_HEAP2_NEW_CID:
+ {
+ xl_heap_new_cid *xlrec;
+ CommandId cid;
+
+ xlrec = (xl_heap_new_cid *) buf->record_data;
+
+ /*
+ * we only log new_cid's if a catalog tuple was modified, so
+ * set transaction to timetravelling.
+ */
+ ReorderBufferXidSetTimetravel(builder->reorder, xid,
+ buf->origptr);
+
+ ReorderBufferAddNewTupleCids(builder->reorder,
+ xlrec->top_xid,
+ buf->origptr,
+ xlrec->target.node,
+ xlrec->target.tid,
+ xlrec->cmin, xlrec->cmax,
+ xlrec->combocid);
+
+ /* figure out new command id */
+ if (xlrec->cmin != InvalidCommandId &&
+ xlrec->cmax != InvalidCommandId)
+ cid = Max(xlrec->cmin, xlrec->cmax);
+ else if (xlrec->cmax != InvalidCommandId)
+ cid = xlrec->cmax;
+ else if (xlrec->cmin != InvalidCommandId)
+ cid = xlrec->cmin;
+ else
+ {
+ cid = InvalidCommandId; /* silence compiler */
+ elog(ERROR, "broken arrow, no cid?");
+ }
+
+ /*
+ * FIXME: potential race condition here: if multiple snapshots
+ * were running & generating changes in the same transaction
+ * on the source side this could be problematic. But this
+ * cannot happen for system catalogs, right?
+ */
+ ReorderBufferAddNewCommandId(builder->reorder, xid,
+ buf->origptr, cid + 1);
+ break;
+ }
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+/*
+ * Process RM_XLOG_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessXlog(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+
+ switch (info)
+ {
+ case XLOG_CHECKPOINT_SHUTDOWN:
+
+ /*
+ * FIXME: abort everything but prepared xacts, we don't track
+ * prepared xacts though so far. It might also be necessary to do
+ * this to handle subtxn ids that haven't been assigned to a
+ * toplevel xid after a crash.
+ */
+ SnapBuildSerialize(builder, buf->origptr);
+ break;
+ case XLOG_CHECKPOINT_ONLINE:
+
+ /*
+ * a RUNNING_XACTS record will have been logged around this, we
+ * can restart from there.
+ */
+ break;
+ default:
+ break;
+ }
+ return SNAPBUILD_SKIP;
+}
+
+/*
+ * Process RM_STANDBY_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessStandby(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+
+ switch (info)
+ {
+ case XLOG_RUNNING_XACTS:
+ {
+ xl_running_xacts *running;
+ ReorderBufferTXN *txn;
+
+ running = (xl_running_xacts *) buf->record_data;
+
+ SnapBuildSerialize(builder, buf->origptr);
+
+ /*
+ * update range of interesting xids. We don't increase ->xmax
+ * because once we are in a consistent state we can do that
+ * ourselves, and do so much more efficiently, because we only need
+ * to do it for catalog transactions.
+ */
+ builder->xmin = running->oldestRunningXid;
+
+
+ /*
+ * xmax can be lower than xmin here because we only increase
+ * xmax when we hit a transaction with catalog changes. While
+ * odd looking, it's correct and actually more efficient this
+ * way since we hit fast paths in tqual.c.
+ */
+
+ /*
+ * Remove transactions we don't need to keep track of
+ * anymore.
+ */
+ SnapBuildPurgeCommittedTxn(builder);
+
+ elog(DEBUG1, "xmin: %u, xmax: %u, oldestrunning: %u",
+ builder->xmin, builder->xmax,
+ running->oldestRunningXid);
+
+ /*
+ * increase the slot's xmin in shared memory, so vacuum can process
+ * tuples we previously prevented from being purged.
+ */
+ IncreaseLogicalXminForSlot(buf->origptr,
+ running->oldestRunningXid);
+
+ /*
+ * Also tell the slot where we can restart decoding from. We
+ * don't want to do that after every commit because changing
+ * that implies an fsync...
+ */
+ txn = ReorderBufferGetOldestTXN(builder->reorder);
+
+ /*
+ * oldest ongoing txn might have started when we didn't yet
+ * serialize anything because we haven't reached a consistent
+ * state yet.
+ */
+ if (txn != NULL &&
+ txn->restart_decoding_lsn != InvalidXLogRecPtr)
+ {
+ IncreaseRestartDecodingForSlot(buf->origptr,
+ txn->restart_decoding_lsn);
+ }
+
+ /*
+ * no ongoing transaction, can reuse the last serialized
+ * snapshot if we have one.
+ */
+ else if (txn == NULL &&
+ builder->reorder->current_restart_decoding_lsn != InvalidXLogRecPtr &&
+ builder->last_serialized_snapshot != InvalidXLogRecPtr)
+ {
+ IncreaseRestartDecodingForSlot(buf->origptr,
+ builder->last_serialized_snapshot);
+ }
+
+ break;
+ }
+ case XLOG_STANDBY_LOCK:
+ default:
+ break;
+ }
+ return SNAPBUILD_SKIP;
+}
+
+/*
+ * Process RM_XACT_ID records for SnapBuildProcessRecord()
+ */
+static SnapBuildAction
+SnapBuildProcessXact(SnapBuild *builder, XLogRecordBuffer *buf)
+{
+ uint8 info = buf->record.xl_info & ~XLR_INFO_MASK;
+ SnapBuildAction ret = SNAPBUILD_SKIP;
+ TransactionId xid = buf->record.xl_xid;
+
+
+ switch (info)
+ {
+ case XLOG_XACT_COMMIT:
+ {
+ xl_xact_commit *xlrec = (xl_xact_commit *) buf->record_data;
+
+ /*
+ * Queue cache invalidation messages.
+ */
+ if (xlrec->nmsgs)
+ {
+ TransactionId *subxacts;
+ SharedInvalidationMessage *inval_msgs;
+
+ /* subxid array follows relfilenodes */
+ subxacts = (TransactionId *)
+ &(xlrec->xnodes[xlrec->nrels]);
+ /* invalidation messages follow subxids */
+ inval_msgs = (SharedInvalidationMessage *)
+ &(subxacts[xlrec->nsubxacts]);
+
+ /*
+ * no need to check XactCompletionRelcacheInitFileInval,
+ * we will process the sinval messages that the relmapper
+ * change has generated.
+ */
+ ReorderBufferAddInvalidations(builder->reorder, xid,
+ buf->origptr,
+ xlrec->nmsgs, inval_msgs);
+
+ /*
+ * Let everyone know that this transaction modified the
+ * catalog. We need this at commit time.
+ */
+ ReorderBufferXidSetTimetravel(builder->reorder, xid,
+ buf->origptr);
+
+ }
+
+ SnapBuildCommitTxn(builder, buf->origptr, xid,
+ xlrec->nsubxacts,
+ (TransactionId *) &xlrec->xnodes);
+ ret = SNAPBUILD_DECODE;
+ break;
+ }
+ case XLOG_XACT_COMMIT_COMPACT:
+ {
+ xl_xact_commit_compact *xlrec;
+
+ xlrec = (xl_xact_commit_compact *) buf->record_data;
+
+ SnapBuildCommitTxn(builder, buf->origptr, xid,
+ xlrec->nsubxacts, xlrec->subxacts);
+
+ ret = SNAPBUILD_DECODE;
+ break;
+ }
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit_prepared *xlrec;
+ TransactionId *subxacts;
+
+ xlrec = (xl_xact_commit_prepared *) buf->record_data;
+ subxacts = (TransactionId *) &xlrec->crec.xnodes;
+ /* FIXME: check for invalidation messages! */
+
+ SnapBuildCommitTxn(builder, buf->origptr,
+ xlrec->xid,
+ xlrec->crec.nsubxacts, subxacts);
+
+ ret = SNAPBUILD_DECODE;
+ break;
+ }
+ case XLOG_XACT_ABORT:
+ {
+ xl_xact_abort *xlrec;
+ TransactionId *subxacts;
+
+ xlrec = (xl_xact_abort *) buf->record_data;
+ subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
+
+ SnapBuildAbortTxn(builder, xid, xlrec->nsubxacts, subxacts);
+
+ ret = SNAPBUILD_DECODE;
+ break;
+ }
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort_prepared *xlrec;
+ xl_xact_abort *arec;
+ TransactionId *subxacts;
+
+ xlrec = (xl_xact_abort_prepared *) buf->record_data;
+ arec = &xlrec->arec;
+ subxacts = (TransactionId *) &(arec->xnodes[arec->nrels]);
+
+ SnapBuildAbortTxn(builder, xlrec->xid, arec->nsubxacts,
+ subxacts);
+
+ ret = SNAPBUILD_DECODE;
+ break;
+ }
+ case XLOG_XACT_ASSIGNMENT:
+ break;
+ case XLOG_XACT_PREPARE:
+
+ /*
+ * XXX: We could take note of all in-progress prepared xacts so we
+ * can use shutdown checkpoints to abort in-progress
+ * transactions...
+ */
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+/* -----------------------------------
+ * Snapshot serialization support
+ * -----------------------------------
+ */
+
+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuild;
+ * TransactionId * running.xcnt_space;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ *
+ */
+typedef struct SnapBuildOnDisk
+{
+ uint32 magic;
+ /* how large is the SnapBuildOnDisk including all data in state */
+ Size size;
+ SnapBuild builder;
+ /* variable number of TransactionIds */
+} SnapBuildOnDisk;
+
+#define SNAPBUILD_MAGIC 0x51A1E001
+
+/*
+ * Serialize the snapshot 'builder' at the location 'lsn' if it hasn't already
+ * been done by another decoding process.
+ */
+static void
+SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
+{
+ Size needed_size;
+ SnapBuildOnDisk *ondisk;
+ char *ondisk_c;
+ int fd;
+ char tmppath[MAXPGPATH];
+ char path[MAXPGPATH];
+ int ret;
+ struct stat stat_buf;
+
+ needed_size = sizeof(SnapBuildOnDisk) +
+ sizeof(TransactionId) * builder->running.xcnt_space +
+ sizeof(TransactionId) * builder->committed.xcnt;
+
+ Assert(lsn != InvalidXLogRecPtr);
+ Assert(builder->last_serialized_snapshot == InvalidXLogRecPtr ||
+ builder->last_serialized_snapshot <= lsn);
+
+ /*
+ * no point in serializing if we cannot continue to work immediately after
+ * restoring the snapshot
+ */
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ return;
+
+ /*
+ * FIXME: Timeline handling
+ */
+
+ /*
+ * first check whether some other backend already has written the snapshot
+ * for this LSN
+ */
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ ret = stat(path, &stat_buf);
+
+ if (ret != 0 && errno != ENOENT)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat snapbuild state file \"%s\": %m", path)));
+ else if (ret == 0)
+ {
+ /*
+ * somebody else has already serialized to this point, don't overwrite
+ * but remember location, so we don't need to read old data again.
+ */
+ builder->last_serialized_snapshot = lsn;
+ goto out;
+ }
+
+ /*
+ * there is an obvious race condition here between the time we stat(2) the
+ * file and us writing the file. But we rename the file into place
+ * atomically and all files created need to contain the same data anyway,
+ * so this is perfectly fine, although a bit of a resource waste. Locking
+ * seems like pointless complication.
+ */
+ elog(LOG, "serializing snapshot to %s", path);
+
+ /* to make sure only we will write to this tempfile, include pid */
+ sprintf(tmppath, "pg_llog/snapshots/%X-%X.snap.%u.tmp",
+ (uint32) (lsn >> 32), (uint32) lsn, getpid());
+
+ /*
+ * unlink if the file already exists; it must be left over from a crash/error
+ */
+ if (unlink(tmppath) != 0 && errno != ENOENT)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not unlink old file \"%s\": %m", tmppath)));
+
+ ondisk = MemoryContextAllocZero(builder->context, needed_size);
+ ondisk_c = ((char *) ondisk) + sizeof(SnapBuildOnDisk);
+ ondisk->magic = SNAPBUILD_MAGIC;
+ ondisk->size = needed_size;
+
+ /* copy state via struct assignment */
+ ondisk->builder = *builder;
+
+ /* NULL-ify memory-only data */
+ ondisk->builder.context = NULL;
+ ondisk->builder.snapshot = NULL;
+ ondisk->builder.reorder = NULL;
+
+ /* copy running xacts */
+ memcpy(ondisk_c, builder->running.xip,
+ sizeof(TransactionId) * builder->running.xcnt_space);
+ ondisk_c += sizeof(TransactionId) * builder->running.xcnt_space;
+
+ /* copy committed xacts */
+ memcpy(ondisk_c, builder->committed.xip,
+ sizeof(TransactionId) * builder->committed.xcnt);
+ ondisk_c += sizeof(TransactionId) * builder->committed.xcnt;
+
+ /* we have valid data now, open tempfile and write it there */
+ fd = OpenTransientFile(tmppath,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ ereport(ERROR, (errcode_for_file_access(), errmsg("could not open snapbuild state file \"%s\" for writing: %m", tmppath)));
+
+ if ((write(fd, ondisk, needed_size)) != needed_size)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ /*
+ * fsync the file before renaming so that even if we crash after this we
+ * have either a fully valid file or nothing.
+ *
+ * XXX: Do the fsync() via checkpoints/restartpoints, doing it here has
+ * some noticeable overhead?
+ */
+ if (pg_fsync(fd) != 0)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync snapbuild state file \"%s\": %m",
+ tmppath)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * We may overwrite the work from some other backend, but that's ok, our
+ * snapshot is valid as well.
+ */
+ if (rename(tmppath, path) != 0)
+ {
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not rename snapbuild state file from \"%s\" to \"%s\": %m",
+ tmppath, path)));
+ }
+
+ /* make sure we persist */
+ fsync_fname(path, false);
+ fsync_fname("pg_llog/snapshots", true);
+
+ /* remember serialization point */
+ builder->last_serialized_snapshot = lsn;
+
+out:
+ ReorderBufferSetRestartPoint(builder->reorder,
+ builder->last_serialized_snapshot);
+}
+
+/*
+ * Restore a snapshot into 'builder' if one has previously been stored at the
+ * location indicated by 'lsn'. Returns true if successful, false otherwise.
+ */
+static bool
+SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
+{
+ SnapBuildOnDisk ondisk;
+ int fd;
+ char path[MAXPGPATH];
+ Size sz;
+
+ sprintf(path, "pg_llog/snapshots/%X-%X.snap",
+ (uint32) (lsn >> 32), (uint32) lsn);
+
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
+
+ elog(LOG, "restoring snapbuild state from %s", path);
+
+ if (fd < 0 && errno == ENOENT)
+ return false;
+ else if (fd < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open snapbuild state file \"%s\": %m", path)));
+
+ elog(LOG, "really restoring from %s", path);
+
+ /* read statically sized portion of snapshot */
+ if (read(fd, &ondisk, sizeof(ondisk)) != sizeof(ondisk))
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ if (ondisk.magic != SNAPBUILD_MAGIC)
+ ereport(ERROR, (errmsg("snapbuild state file has wrong magic %u instead of %u",
+ ondisk.magic, SNAPBUILD_MAGIC)));
+
+ /* restore running xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
+ ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.running.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read running xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ /* restore committed xact information */
+ sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
+ ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+ if (read(fd, ondisk.builder.committed.xip, sz) != sz)
+ {
+ CloseTransientFile(fd);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read committed xacts from snapbuild file \"%s\": %m",
+ path)));
+ }
+
+ CloseTransientFile(fd);
+
+ /*
+ * ok, we now have a sensible snapshot here, figure out if it has more
+ * information than we have.
+ */
+
+ /*
+ * We are only interested in consistent snapshots for now, comparing
+ * whether one incomplete snapshot is more "advanced" seems to be
+ * unnecessarily complex.
+ */
+ if (ondisk.builder.state < SNAPBUILD_CONSISTENT)
+ goto snapshot_not_interesting;
+
+ /*
+ * Don't use a snapshot that requires an xmin that we cannot guarantee to
+ * be available.
+ */
+ if (TransactionIdPrecedes(ondisk.builder.xmin, builder->initial_xmin_horizon))
+ goto snapshot_not_interesting;
+
+ /*
+ * XXX: transactions_after needs to be updated differently, to be checked
+ * here
+ */
+
+ /* ok, we think the snapshot is sensible, copy over everything important */
+ builder->xmin = ondisk.builder.xmin;
+ builder->xmax = ondisk.builder.xmax;
+ builder->state = ondisk.builder.state;
+
+ builder->committed.xcnt = ondisk.builder.committed.xcnt;
+ /* We only allocated/stored xcnt, not xcnt_space xids ! */
+ /* don't overwrite preallocated xip, if we don't have anything here */
+ if (builder->committed.xcnt > 0)
+ {
+ pfree(builder->committed.xip);
+ builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
+ builder->committed.xip = ondisk.builder.committed.xip;
+ }
+ ondisk.builder.committed.xip = NULL;
+
+ builder->running.xcnt = ondisk.builder.running.xcnt;
+ if (builder->running.xip)
+ pfree(builder->running.xip);
+ builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
+ builder->running.xip = ondisk.builder.running.xip;
+
+ /* our snapshot is not interesting anymore, build a new one */
+ if (builder->snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(builder->snapshot);
+ }
+ builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+ SnapBuildSnapIncRefcount(builder->snapshot);
+
+ ReorderBufferSetRestartPoint(builder->reorder, lsn);
+
+ return true;
+
+snapshot_not_interesting:
+ if (ondisk.builder.running.xip != NULL)
+ pfree(ondisk.builder.running.xip);
+ if (ondisk.builder.committed.xip != NULL)
+ pfree(ondisk.builder.committed.xip);
+ return false;
+}
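SnapBuildSerialize() and SnapBuildRestore() above rely on a write-to-temp-file, fsync, rename sequence, plus a magic-number check when reading the file back. A minimal standalone sketch of that pattern follows, assuming plain POSIX calls in place of OpenTransientFile()/pg_fsync() and a hypothetical, much reduced header (DemoSnapOnDisk, demo_snap_write() and demo_snap_read() are illustrative names, not part of the patch). The directory fsync the patch performs via fsync_fname() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, much reduced header; SnapBuildOnDisk embeds a whole SnapBuild */
typedef struct DemoSnapOnDisk
{
    uint32_t    magic;
    uint32_t    nxids;          /* number of 32-bit xids following the header */
} DemoSnapOnDisk;

#define DEMO_SNAP_MAGIC 0x51A1E001u

/*
 * Write header + xids to "<path>.<pid>.tmp", fsync it and rename it into
 * place, so a crash leaves either no file or a complete one at 'path'.
 */
static int
demo_snap_write(const char *path, const uint32_t *xids, uint32_t nxids)
{
    DemoSnapOnDisk hdr = {DEMO_SNAP_MAGIC, nxids};
    size_t      body = sizeof(uint32_t) * nxids;
    char        tmppath[1024];
    int         fd;

    assert(xids != NULL || nxids == 0);
    snprintf(tmppath, sizeof(tmppath), "%s.%d.tmp", path, (int) getpid());

    /* a leftover temp file can only stem from an earlier crash/error */
    if (unlink(tmppath) != 0 && errno != ENOENT)
        return -1;

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, &hdr, sizeof(hdr)) != (ssize_t) sizeof(hdr) ||
        (body > 0 && write(fd, xids, body) != (ssize_t) body) ||
        fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    close(fd);

    /* rename(2) atomically replaces any file another process wrote there */
    return rename(tmppath, path) == 0 ? 0 : -1;
}

/* Returns the number of xids read, -1 on I/O error, -2 on bad magic/size. */
static int
demo_snap_read(const char *path, uint32_t *xids, uint32_t maxxids)
{
    DemoSnapOnDisk hdr;
    size_t      body;
    int         result;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (read(fd, &hdr, sizeof(hdr)) != (ssize_t) sizeof(hdr))
        result = -1;
    else if (hdr.magic != DEMO_SNAP_MAGIC || hdr.nxids > maxxids)
        result = -2;
    else
    {
        body = sizeof(uint32_t) * hdr.nxids;
        result = (body == 0 || read(fd, xids, body) == (ssize_t) body)
            ? (int) hdr.nxids : -1;
    }
    close(fd);
    return result;
}
```

Because every process serializing at a given LSN is expected to produce the same contents, letting the last rename() win is harmless; that is the argument the comment in SnapBuildSerialize() makes for not taking a lock.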
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index bce18b8..2de01f1 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -65,7 +65,7 @@ Node *replication_parse_result;
}
/* Non-keyword tokens */
-%token <str> SCONST
+%token <str> SCONST IDENT
%token <intval> ICONST
%token <recptr> RECPTR
@@ -73,6 +73,9 @@ Node *replication_parse_result;
%token K_BASE_BACKUP
%token K_IDENTIFY_SYSTEM
%token K_START_REPLICATION
+%token K_INIT_LOGICAL_REPLICATION
+%token K_START_LOGICAL_REPLICATION
+%token K_FREE_LOGICAL_REPLICATION
%token K_TIMELINE_HISTORY
%token K_LABEL
%token K_PROGRESS
@@ -82,10 +85,13 @@ Node *replication_parse_result;
%token K_TIMELINE
%type <node> command
-%type <node> base_backup start_replication identify_system timeline_history
+%type <node> base_backup start_replication start_logical_replication init_logical_replication free_logical_replication identify_system timeline_history
%type <list> base_backup_opt_list
%type <defelt> base_backup_opt
%type <intval> opt_timeline
+%type <list> plugin_options plugin_opt_list
+%type <defelt> plugin_opt_elem
+%type <node> plugin_opt_arg
%%
firstcmd: command opt_semicolon
@@ -102,6 +108,9 @@ command:
identify_system
| base_backup
| start_replication
+ | init_logical_replication
+ | start_logical_replication
+ | free_logical_replication
| timeline_history
;
@@ -186,6 +195,67 @@ opt_timeline:
| /* nothing */ { $$ = 0; }
;
+init_logical_replication:
+ K_INIT_LOGICAL_REPLICATION IDENT IDENT
+ {
+ InitLogicalReplicationCmd *cmd;
+ cmd = makeNode(InitLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->plugin = $3;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+start_logical_replication:
+ K_START_LOGICAL_REPLICATION IDENT RECPTR plugin_options
+ {
+ StartLogicalReplicationCmd *cmd;
+ cmd = makeNode(StartLogicalReplicationCmd);
+ cmd->name = $2;
+ cmd->startpoint = $3;
+ cmd->options = $4;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+plugin_options:
+ '(' plugin_opt_list ')' { $$ = $2; }
+ | /* EMPTY */ { $$ = NIL; }
+ ;
+
+plugin_opt_list:
+ plugin_opt_elem
+ {
+ $$ = list_make1($1);
+ }
+ | plugin_opt_list ',' plugin_opt_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+plugin_opt_elem:
+ IDENT plugin_opt_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+plugin_opt_arg:
+ SCONST { $$ = (Node *) makeString($1); }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
+free_logical_replication:
+ K_FREE_LOGICAL_REPLICATION IDENT
+ {
+ FreeLogicalReplicationCmd *cmd;
+ cmd = makeNode(FreeLogicalReplicationCmd);
+ cmd->name = $2;
+ $$ = (Node *) cmd;
+ }
+ ;
+
/*
* TIMELINE_HISTORY %d
*/
@@ -205,6 +275,7 @@ timeline_history:
$$ = (Node *) cmd;
}
;
+
%%
#include "repl_scanner.c"
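The new plugin_options production accepts either nothing or a parenthesized, comma-separated list of IDENT elements, each optionally followed by a single-quoted SCONST. A hand-written C sketch of the same grammar follows, assuming ASCII-only unquoted identifiers and leaving out double-quoted ones. DemoOpt and the helper functions are hypothetical; the real parser builds DefElem nodes via makeDefElem().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one DefElem built by plugin_opt_elem */
typedef struct DemoOpt
{
    char        name[64];
    char        value[128];
    int         has_value;      /* 0: the EMPTY plugin_opt_arg alternative */
} DemoOpt;

static const char *
skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return p;
}

/* unquoted IDENT: [A-Za-z_][A-Za-z0-9_$]*, downcased (ASCII only) */
static const char *
parse_ident(const char *p, char *out, size_t outsz)
{
    size_t      n = 0;

    if (!isalpha((unsigned char) *p) && *p != '_')
        return NULL;
    while (isalnum((unsigned char) *p) || *p == '_' || *p == '$')
    {
        if (n + 1 >= outsz)
            return NULL;
        out[n++] = (char) tolower((unsigned char) *p++);
    }
    out[n] = '\0';
    return p;
}

/* SCONST: '...' where '' stands for one embedded quote */
static const char *
parse_sconst(const char *p, char *out, size_t outsz)
{
    size_t      n = 0;

    assert(*p == '\'');
    for (p++; *p != '\0'; p++)
    {
        if (*p == '\'')
        {
            if (p[1] != '\'')
            {
                out[n] = '\0';
                return p + 1;   /* just past the closing quote */
            }
            p++;                /* '' -> one literal quote */
        }
        if (n + 1 >= outsz)
            return NULL;
        out[n++] = *p;
    }
    return NULL;                /* unterminated quoted string */
}

/* Returns the number of options parsed, or -1 on a syntax error. */
static int
demo_parse_plugin_options(const char *p, DemoOpt *opts, int maxopts)
{
    int         n = 0;

    p = skip_ws(p);
    if (*p == '\0')
        return 0;               /* EMPTY: the options list is NIL */
    if (*p != '(')
        return -1;
    do
    {
        /* p points at the '(' or ',' preceding the next element */
        if (n >= maxopts)
            return -1;
        p = parse_ident(skip_ws(p + 1), opts[n].name, sizeof(opts[n].name));
        if (p == NULL)
            return -1;
        p = skip_ws(p);
        opts[n].has_value = (*p == '\'');
        opts[n].value[0] = '\0';
        if (opts[n].has_value)
        {
            p = parse_sconst(p, opts[n].value, sizeof(opts[n].value));
            if (p == NULL)
                return -1;
            p = skip_ws(p);
        }
        n++;
    } while (*p == ',');
    return (*p == ')' && *skip_ws(p + 1) == '\0') ? n : -1;
}
```

Since an unquoted IDENT cannot contain '-', an option name like hide-xids would need to be written as a double-quoted identifier when passed over the walsender protocol; the sketch rejects the unquoted form.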
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index b4743e6..1044bd0 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "utils/builtins.h"
+#include "parser/scansup.h"
/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
#undef fprintf
@@ -48,7 +49,7 @@ static void addlitchar(unsigned char ychar);
%option warn
%option prefix="replication_yy"
-%x xq
+%x xq xd
/* Extended quote
* xqdouble implements embedded quote, ''''
@@ -57,12 +58,26 @@ xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
digit [0-9]+
hexdigit [0-9A-Za-z]+
quote '
quotestop {quote}
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
%%
BASE_BACKUP { return K_BASE_BACKUP; }
@@ -74,9 +89,14 @@ PROGRESS { return K_PROGRESS; }
WAL { return K_WAL; }
TIMELINE { return K_TIMELINE; }
START_REPLICATION { return K_START_REPLICATION; }
+INIT_LOGICAL_REPLICATION { return K_INIT_LOGICAL_REPLICATION; }
+START_LOGICAL_REPLICATION { return K_START_LOGICAL_REPLICATION; }
+FREE_LOGICAL_REPLICATION { return K_FREE_LOGICAL_REPLICATION; }
TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
"," { return ','; }
";" { return ';'; }
+"(" { return '('; }
+")" { return ')'; }
[\n] ;
[\t] ;
@@ -100,20 +120,49 @@ TIMELINE_HISTORY { return K_TIMELINE_HISTORY; }
BEGIN(xq);
startlit();
}
+
<xq>{quotestop} {
yyless(1);
BEGIN(INITIAL);
yylval.str = litbufdup();
return SCONST;
}
-<xq>{xqdouble} {
+
+<xq>{xqdouble} {
addlitchar('\'');
}
+
<xq>{xqinside} {
addlit(yytext, yyleng);
}
-<xq><<EOF>> { yyerror("unterminated quoted string"); }
+{xdstart} {
+ BEGIN(xd);
+ startlit();
+ }
+
+<xd>{xdstop} {
+ int len;
+ yyless(1);
+ BEGIN(INITIAL);
+ yylval.str = litbufdup();
+ len = strlen(yylval.str);
+ truncate_identifier(yylval.str, len, true);
+ return IDENT;
+ }
+
+<xd>{xdinside} {
+ addlit(yytext, yyleng);
+ }
+
+{identifier} {
+ int len = strlen(yytext);
+
+ yylval.str = downcase_truncate_identifier(yytext, len, true);
+ return IDENT;
+ }
+
+<xq,xd><<EOF>> { yyerror("unterminated quoted string"); }
<<EOF>> {
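The two new identifier rules differ only in how the token is normalized: {identifier} goes through downcase_truncate_identifier(), while a double-quoted name keeps its case and only goes through truncate_identifier(). A simplified sketch of that behaviour follows, assuming the default NAMEDATALEN of 64 and ignoring the multibyte-aware clipping and the NOTICE the real functions can emit (demo_normalize_ident() and DEMO_NAMEDATALEN are hypothetical names).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAMEDATALEN 64     /* assumption: PostgreSQL's default NAMEDATALEN */

/*
 * Normalize an identifier token: unquoted identifiers are downcased (ASCII
 * only here), quoted ones keep their case, and both are cut to
 * DEMO_NAMEDATALEN - 1 bytes.  'out' must hold DEMO_NAMEDATALEN bytes.
 * Returns 1 if the identifier was truncated, 0 otherwise.
 */
static int
demo_normalize_ident(const char *tok, int quoted, char *out)
{
    size_t      len;
    size_t      i;
    int         truncated;

    assert(tok != NULL && out != NULL);
    len = strlen(tok);
    truncated = len >= DEMO_NAMEDATALEN;
    if (truncated)
        len = DEMO_NAMEDATALEN - 1;

    for (i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) tok[i];

        out[i] = (char) ((!quoted && c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
    }
    out[len] = '\0';
    return truncated;
}
```

With the rules above, a slot name given unquoted as Regression_Slot would presumably end up as regression_slot, while the double-quoted form keeps its capitals.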
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 4c74d1b..3cbad64 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
* everything else has been checked.
*/
if (hot_standby_feedback)
- xmin = GetOldestXmin(true, false, false);
+ xmin = GetOldestXmin(true, true, false, false);
else
xmin = InvalidTransactionId;
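replay_read_page(), added to walsender.c in the next part of the patch, first waits via WalSndWaitForWal() and then decides how much of the requested WAL page is usable. The arithmetic alone, as a standalone sketch assuming the default 8kB XLOG_BLCKSZ and a plain uint64_t in place of XLogRecPtr (demo_page_bytes_available() and the DEMO_ names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_XLOG_BLCKSZ 8192   /* assumption: default XLOG_BLCKSZ */

typedef uint64_t DemoRecPtr;    /* stand-in for XLogRecPtr */

/*
 * Given the start of a WAL page, the number of bytes the reader needs from
 * it and the current flush pointer, return how many bytes of the page are
 * valid, or -1 if the requested bytes have not been flushed yet.
 */
static int
demo_page_bytes_available(DemoRecPtr page, int req_len, DemoRecPtr flushptr)
{
    assert(req_len > 0 && req_len <= DEMO_XLOG_BLCKSZ);

    if (page + DEMO_XLOG_BLCKSZ <= flushptr)
        return DEMO_XLOG_BLCKSZ;            /* the whole page is flushed */
    if (page + (DemoRecPtr) req_len > flushptr)
        return -1;                          /* requested bytes not flushed */
    return (int) (flushptr - page);         /* only a prefix is flushed */
}
```

As the FIXME in replay_read_page() notes, the patch currently reads a full XLOG_BLCKSZ even when only a prefix of the page has been flushed; the returned count is what tells the xlogreader how much of that buffer is valid.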
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a421ec5..723d5f8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -53,6 +53,10 @@
#include "miscadmin.h"
#include "nodes/replnodes.h"
#include "replication/basebackup.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "replication/snapbuild.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -153,6 +157,9 @@ static bool ping_sent = false;
static bool streamingDoneSending;
static bool streamingDoneReceiving;
+/* Has the walsender caught up to the end of available WAL? */
+static bool WalSndCaughtUp = false;
+
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t walsender_ready_to_stop = false;
@@ -165,24 +172,42 @@ static volatile sig_atomic_t walsender_ready_to_stop = false;
*/
static volatile sig_atomic_t replication_active = false;
+/* XXX reader */
+static MemoryContext decoding_ctx = NULL;
+static MemoryContext old_decoding_ctx = NULL;
+
+static LogicalDecodingContext *logical_decoding_ctx = NULL;
+static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
+
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
-static void WalSndLoop(void);
+typedef void (*WalSndSendData)(void);
+static void WalSndLoop(WalSndSendData send_data);
static void InitWalSenderSlot(void);
static void WalSndKill(int code, Datum arg);
-static void XLogSend(bool *caughtup);
+static void XLogSendPhysical(void);
+static void XLogSendLogical(void);
+static void WalSndDone(WalSndSendData send_data);
static XLogRecPtr GetStandbyFlushRecPtr(void);
static void IdentifySystem(void);
static void StartReplication(StartReplicationCmd *cmd);
+static void InitLogicalReplication(InitLogicalReplicationCmd *cmd);
+static void StartLogicalReplication(StartLogicalReplicationCmd *cmd);
+static void FreeLogicalReplication(FreeLogicalReplicationCmd *cmd);
static void ProcessStandbyMessage(void);
static void ProcessStandbyReplyMessage(void);
static void ProcessStandbyHSFeedbackMessage(void);
static void ProcessRepliesIfAny(void);
static void WalSndKeepalive(bool requestReply);
+static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
+static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+
+
/* Initialize walsender process before entering the main command loop */
@@ -269,8 +294,6 @@ IdentifySystem(void)
if (MyDatabaseId != InvalidOid)
dbname = get_database_name(MyDatabaseId);
- else
- dbname = "(none)";
/* Send a RowDescription message */
pq_beginmessage(&buf, 'T');
@@ -295,22 +318,22 @@ IdentifySystem(void)
pq_sendint(&buf, 0, 2); /* format code */
/* third field */
- pq_sendstring(&buf, "xlogpos");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "xlogpos"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
/* fourth field */
- pq_sendstring(&buf, "dbname");
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
- pq_sendint(&buf, TEXTOID, 4);
- pq_sendint(&buf, -1, 2);
- pq_sendint(&buf, 0, 4);
- pq_sendint(&buf, 0, 2);
+ pq_sendstring(&buf, "dbname"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
pq_endmessage(&buf);
/* Send a DataRow message */
@@ -322,9 +345,16 @@ IdentifySystem(void)
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
- pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
- pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
-
+ /* send NULL if not connected to a database */
+ if (dbname)
+ {
+ pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
+ pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
+ }
+ else
+ {
+ pq_sendint(&buf, -1, 4); /* col4 len */
+ }
pq_endmessage(&buf);
}
@@ -573,7 +603,7 @@ StartReplication(StartReplicationCmd *cmd)
/* Main loop of walsender */
replication_active = true;
- WalSndLoop();
+ WalSndLoop(XLogSendPhysical);
replication_active = false;
if (walsender_ready_to_stop)
@@ -640,6 +670,498 @@ StartReplication(StartReplicationCmd *cmd)
pq_puttextmessage('C', "START_STREAMING");
}
+static int
+replay_read_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
+{
+ XLogRecPtr flushptr;
+ int count;
+
+ flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
+
+ /* the whole page is available */
+ if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
+ count = XLOG_BLCKSZ;
+ /* not enough data there */
+ else if (targetPagePtr + reqLen > flushptr)
+ return -1;
+ /* part of the page available */
+ else
+ count = flushptr - targetPagePtr;
+
+ /* FIXME: more sensible/efficient implementation */
+ XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+
+ return count;
+}
+
+/*
+ * Initialize logical replication and wait for an initial consistent point to
+ * start sending changes from.
+ */
+static void
+InitLogicalReplication(InitLogicalReplicationCmd *cmd)
+{
+ const char *slot_name;
+ StringInfoData buf;
+ char xpos[MAXFNAMELEN];
+ const char *snapshot_name = NULL;
+ LogicalDecodingContext *ctx;
+ XLogRecPtr startptr;
+
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ /* XXX apply sanity checking to slot name? */
+ LogicalDecodingAcquireFreeSlot(cmd->name, cmd->plugin);
+
+ Assert(MyLogicalDecodingSlot);
+
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+ /* XXX pointless? */
+ TopTransactionContext = decoding_ctx;
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ initStringInfo(&output_message);
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+ NIL, replay_read_page,
+ WalSndPrepareWrite, WalSndWriteData);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+ TopTransactionContext = NULL;
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(WARNING, "initiating logical replication from %X/%X",
+ (uint32)(startptr >> 32), (uint32)startptr);
+
+ for (;;)
+ {
+ XLogRecord *record;
+ XLogRecordBuffer buf;
+ char *err = NULL;
+
+ /* the read_page callback waits for new WAL */
+ record = XLogReadRecord(ctx->reader, startptr, &err);
+ /* xlog record was invalid */
+ if (err)
+ elog(ERROR, "%s", err);
+
+ /* read up from last position next time round */
+ startptr = InvalidXLogRecPtr;
+
+ Assert(record);
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+
+ /* only continue until we have found a consistent point */
+ if (LogicalDecodingContextReady(ctx))
+ {
+ /* export plain, importable, snapshot to the user */
+ snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder);
+ break;
+ }
+ }
+
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+ slot_name = NameStr(MyLogicalDecodingSlot->name);
+ snprintf(xpos, sizeof(xpos), "%X/%X",
+ (uint32) (MyLogicalDecodingSlot->confirmed_flush >> 32),
+ (uint32) MyLogicalDecodingSlot->confirmed_flush);
+
+ pq_beginmessage(&buf, 'T');
+ pq_sendint(&buf, 4, 2); /* 4 fields */
+
+ /* first field */
+ pq_sendstring(&buf, "replication_id"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "consistent_point"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "snapshot_name"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_sendstring(&buf, "plugin"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_endmessage(&buf);
+
+ /* Send a DataRow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint(&buf, 4, 2); /* # of columns */
+
+ /* replication_id */
+ pq_sendint(&buf, strlen(slot_name), 4); /* col1 len */
+ pq_sendbytes(&buf, slot_name, strlen(slot_name));
+
+ /* consistent wal location */
+ pq_sendint(&buf, strlen(xpos), 4); /* col2 len */
+ pq_sendbytes(&buf, xpos, strlen(xpos));
+
+ /* snapshot name */
+ pq_sendint(&buf, strlen(snapshot_name), 4); /* col3 len */
+ pq_sendbytes(&buf, snapshot_name, strlen(snapshot_name));
+
+ /* plugin */
+ pq_sendint(&buf, strlen(cmd->plugin), 4); /* col4 len */
+ pq_sendbytes(&buf, cmd->plugin, strlen(cmd->plugin));
+
+ pq_endmessage(&buf);
+
+ /*
+ * release active status again, START_LOGICAL_REPLICATION will reacquire it
+ */
+ LogicalDecodingReleaseSlot();
+}
+
+/*
+ * Load previously initiated logical slot and prepare for sending data (via
+ * WalSndLoop).
+ */
+static void
+StartLogicalReplication(StartLogicalReplicationCmd *cmd)
+{
+ StringInfoData buf;
+ XLogRecPtr confirmed_flush;
+
+ elog(WARNING, "starting logical replication");
+
+ /* make sure that our requirements are still fulfilled */
+ CheckLogicalReplicationRequirements();
+
+ Assert(!MyLogicalDecodingSlot);
+
+ LogicalDecodingReAcquireSlot(cmd->name);
+
+ if (am_cascading_walsender && !RecoveryInProgress())
+ {
+ ereport(LOG,
+ (errmsg("terminating walsender process to force cascaded standby to update timeline and reconnect")));
+ walsender_ready_to_stop = true;
+ }
+
+ WalSndSetState(WALSNDSTATE_CATCHUP);
+
+ /* Send a CopyBothResponse message, and start streaming */
+ pq_beginmessage(&buf, 'W');
+ pq_sendbyte(&buf, 0);
+ pq_sendint(&buf, 0, 2);
+ pq_endmessage(&buf);
+ pq_flush();
+
+ /* setup state for XLogReadPage */
+ sendTimeLineIsHistoric = false;
+ sendTimeLine = ThisTimeLineID;
+
+ confirmed_flush = MyLogicalDecodingSlot->confirmed_flush;
+
+ Assert(confirmed_flush != InvalidXLogRecPtr);
+
+ /* continue from last position */
+ if (cmd->startpoint == InvalidXLogRecPtr)
+ cmd->startpoint = MyLogicalDecodingSlot->confirmed_flush;
+ else if (cmd->startpoint < MyLogicalDecodingSlot->confirmed_flush)
+ elog(ERROR, "cannot stream from %X/%X, minimum is %X/%X",
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint,
+ (uint32)(confirmed_flush >> 32), (uint32)confirmed_flush);
+
+ /*
+ * Initialize position to the last ack'ed one, then the xlog records begin
+ * to be shipped from that position.
+ */
+ logical_decoding_ctx = CreateLogicalDecodingContext(
+ MyLogicalDecodingSlot, false, cmd->startpoint, cmd->options,
+ replay_read_page, WalSndPrepareWrite, WalSndWriteData);
+
+ /*
+ * XXX: For feedback purposes it would be nicer to set sentPtr to
+ * cmd->startpoint, but we use it to know where to read xlog in the main
+ * loop...
+ */
+ sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ logical_startptr = sentPtr;
+
+ /* Also update the start position status in shared memory */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = MyLogicalDecodingSlot->restart_decoding;
+ SpinLockRelease(&walsnd->mutex);
+ }
+
+ elog(LOG, "starting to decode from %X/%X, replay %X/%X",
+ (uint32)(MyWalSnd->sentPtr >> 32), (uint32)MyWalSnd->sentPtr,
+ (uint32)(cmd->startpoint >> 32), (uint32)cmd->startpoint);
+
+ replication_active = true;
+
+ SyncRepInitConfig();
+
+ /* Main loop of walsender */
+ WalSndLoop(XLogSendLogical);
+
+ LogicalDecodingReleaseSlot();
+
+ replication_active = false;
+ if (walsender_ready_to_stop)
+ proc_exit(0);
+ WalSndSetState(WALSNDSTATE_STARTUP);
+
+ /* Get out of COPY mode (CommandComplete). */
+ EndCommand("COPY 0", DestRemote);
+}
+
+/*
+ * Free the permanent state of a now inactive, but still defined, logical slot.
+ */
+static void
+FreeLogicalReplication(FreeLogicalReplicationCmd *cmd)
+{
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingFreeSlot(cmd->name);
+ EndCommand("FREE_LOGICAL_REPLICATION", DestRemote);
+}
+
+/*
+ * LogicalDecodingContext 'prepare_write' callback.
+ *
+ * Prepare a write into a StringInfo.
+ *
+ * Don't do anything lasting in here; it's quite possible that nothing will be
+ * done with the data.
+ */
+static void
+WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndPrepareWrite, LogicalOutputPluginWriterPrepareWrite);
+
+ resetStringInfo(ctx->out);
+
+ pq_sendbyte(ctx->out, 'w');
+ pq_sendint64(ctx->out, lsn); /* dataStart */
+ /* XXX: overwrite when data is assembled */
+ pq_sendint64(ctx->out, lsn); /* walEnd */
+ /* XXX: gather that value later just as it's done in XLogSendPhysical */
+ pq_sendint64(ctx->out, 0 /*GetCurrentIntegerTimestamp() */);/* sendtime */
+}
+
+/*
+ * LogicalDecodingContext 'write' callback.
+ *
+ * Actually write out the data previously prepared by WalSndPrepareWrite to
+ * the network, taking as long as needed, but process replies from the other
+ * side while doing so.
+ */
+static void
+WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ AssertVariableIsOfType(&WalSndWriteData, LogicalOutputPluginWriterWrite);
+
+ /* output previously gathered data in a CopyData packet */
+ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
+
+ /* fast path */
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ return;
+
+ if (!pq_is_send_pending())
+ return;
+
+ for (;;)
+ {
+ int wakeEvents;
+ long sleeptime = 10000; /* 10s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+
+ /* If we finished clearing the buffered data, we're done here. */
+ if (!pq_is_send_pending())
+ break;
+
+ /*
+ * Sleep until the socket is writable or readable, the latch is set, or
+ * sleeptime expires; if the socket is not writable there's not much we
+ * can do elsewhere anyway.
+ */
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+}
+
+/*
+ * Wait till WAL < loc is flushed to disk so it can be safely read.
+ */
+XLogRecPtr
+WalSndWaitForWal(XLogRecPtr loc)
+{
+ int wakeEvents;
+ XLogRecPtr flushptr;
+
+ /* fast path if everything is there already */
+ /*
+ * XXX: introduce RecentFlushPtr to avoid acquiring the spinlock in the
+ * fast path case where we already know we have enough WAL available.
+ */
+ flushptr = GetFlushRecPtr();
+ if (loc <= flushptr)
+ return flushptr;
+
+ for (;;)
+ {
+ long sleeptime = 10000; /* 10 s */
+
+ /*
+ * Emergency bailout if postmaster has died. This is to avoid the
+ * necessity for manual cleanup of all postmaster children.
+ */
+ if (!PostmasterIsAlive())
+ exit(1);
+
+ /* Process any requests or signals received recently */
+ if (got_SIGHUP)
+ {
+ got_SIGHUP = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ SyncRepInitConfig();
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Check for input from the client */
+ ProcessRepliesIfAny();
+
+ /* Clear any already-pending wakeups */
+ ResetLatch(&MyWalSnd->latch);
+
+ /* Update our idea of flushed position. */
+ flushptr = GetFlushRecPtr();
+
+ /* If postmaster asked us to stop, don't wait here anymore */
+ if (walsender_ready_to_stop)
+ break;
+
+ /* check whether we're done */
+ if (loc <= flushptr)
+ break;
+
+ /* Determine time until replication timeout */
+ if (wal_sender_timeout > 0)
+ {
+ if (!ping_sent)
+ {
+ TimestampTz timeout;
+
+ /*
+ * If half of wal_sender_timeout has elapsed without receiving
+ * any reply from standby, send a keep-alive message to standby
+ * requesting an immediate reply.
+ */
+ timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout / 2);
+ if (GetCurrentTimestamp() >= timeout)
+ {
+ WalSndKeepalive(true);
+ ping_sent = true;
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ break;
+ }
+ }
+
+ sleeptime = 1 + (wal_sender_timeout / 10);
+ }
+
+ wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+ WL_SOCKET_READABLE | WL_TIMEOUT;
+
+ ImmediateInterruptOK = true;
+ CHECK_FOR_INTERRUPTS();
+ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+ MyProcPort->sock, sleeptime);
+ ImmediateInterruptOK = false;
+
+ /*
+ * The equivalent code in WalSndLoop checks here that replication
+ * timeout hasn't been exceeded. We don't do that here. XXX explain
+ * why.
+ */
+ }
+
+ /* reactivate latch so WalSndLoop knows to continue */
+ SetLatch(&MyWalSnd->latch);
+ return flushptr;
+}
+
/*
* Execute an incoming replication command.
*/
@@ -651,6 +1173,12 @@ exec_replication_command(const char *cmd_string)
MemoryContext cmd_context;
MemoryContext old_context;
+ /*
+ * INIT_LOGICAL_REPLICATION exports a snapshot that lives until the next
+ * command arrives. Clean up any previously exported snapshot.
+ */
+ SnapBuildClearExportedSnapshot();
+
elog(DEBUG1, "received replication command: %s", cmd_string);
CHECK_FOR_INTERRUPTS();
@@ -682,6 +1210,18 @@ exec_replication_command(const char *cmd_string)
StartReplication((StartReplicationCmd *) cmd_node);
break;
+ case T_InitLogicalReplicationCmd:
+ InitLogicalReplication((InitLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_StartLogicalReplicationCmd:
+ StartLogicalReplication((StartLogicalReplicationCmd *) cmd_node);
+ break;
+
+ case T_FreeLogicalReplicationCmd:
+ FreeLogicalReplication((FreeLogicalReplicationCmd *) cmd_node);
+ break;
+
case T_BaseBackupCmd:
SendBaseBackup((BaseBackupCmd *) cmd_node);
break;
@@ -891,6 +1431,12 @@ ProcessStandbyReplyMessage(void)
SpinLockRelease(&walsnd->mutex);
}
+ /*
+ * Advance our local xmin horizon when the client has confirmed a flush.
+ */
+ if (MyLogicalDecodingSlot && flushPtr != InvalidXLogRecPtr)
+ LogicalConfirmReceivedLocation(flushPtr);
+
if (!am_cascading_walsender)
SyncRepReleaseWaiters();
}
@@ -975,10 +1521,8 @@ ProcessStandbyHSFeedbackMessage(void)
/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
-WalSndLoop(void)
+WalSndLoop(WalSndSendData send_data)
{
- bool caughtup = false;
-
/*
* Allocate buffers that will be used for each outgoing and incoming
* message. We do this just once to reduce palloc overhead.
@@ -1030,21 +1574,21 @@ WalSndLoop(void)
/*
* If we don't have any pending data in the output buffer, try to send
- * some more. If there is some, we don't bother to call XLogSend
+ * some more. If there is some, we don't bother to call send_data
* again until we've flushed it ... but we'd better assume we are not
* caught up.
*/
if (!pq_is_send_pending())
- XLogSend(&caughtup);
+ send_data();
else
- caughtup = false;
+ WalSndCaughtUp = false;
/* Try to flush pending output to the client */
if (pq_flush_if_writable() != 0)
goto send_failure;
/* If nothing remains to be sent right now ... */
- if (caughtup && !pq_is_send_pending())
+ if (WalSndCaughtUp && !pq_is_send_pending())
{
/*
* If we're in catchup state, move to streaming. This is an
@@ -1069,28 +1613,17 @@ WalSndLoop(void)
* the walsender is not sure which.
*/
if (walsender_ready_to_stop)
- {
- /* ... let's just be real sure we're caught up ... */
- XLogSend(&caughtup);
- if (caughtup && !pq_is_send_pending())
- {
- /* Inform the standby that XLOG streaming is done */
- EndCommand("COPY 0", DestRemote);
- pq_flush();
-
- proc_exit(0);
- }
- }
+ WalSndDone(send_data);
}
/*
* We don't block if not caught up, unless there is unsent data
* pending in which case we'd better block until the socket is
- * write-ready. This test is only needed for the case where XLogSend
+ * write-ready. This test is only needed for the case where send_data
* loaded a subset of the available data but then pq_flush_if_writable
* flushed it all --- we should immediately try to send more.
*/
- if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
+ if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
{
TimestampTz timeout = 0;
long sleeptime = 10000; /* 10 s */
@@ -1419,15 +1952,17 @@ retry:
}
/*
+ * Send out the WAL in its normal physical/stored form.
+ *
* Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
* but not yet sent to the client, and buffer it in the libpq output
* buffer.
*
- * If there is no unsent WAL remaining, *caughtup is set to true, otherwise
- * *caughtup is set to false.
+ * If there is no unsent WAL remaining, WalSndCaughtUp is set to true,
+ * otherwise WalSndCaughtUp is set to false.
*/
static void
-XLogSend(bool *caughtup)
+XLogSendPhysical(void)
{
XLogRecPtr SendRqstPtr;
XLogRecPtr startptr;
@@ -1436,7 +1971,7 @@ XLogSend(bool *caughtup)
if (streamingDoneSending)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1553,7 +2088,7 @@ XLogSend(bool *caughtup)
pq_putmessage_noblock('c', NULL, 0);
streamingDoneSending = true;
- *caughtup = true;
+ WalSndCaughtUp = true;
elog(DEBUG1, "walsender reached end of timeline at %X/%X (sent up to %X/%X)",
(uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
@@ -1565,7 +2100,7 @@ XLogSend(bool *caughtup)
Assert(sentPtr <= SendRqstPtr);
if (SendRqstPtr <= sentPtr)
{
- *caughtup = true;
+ WalSndCaughtUp = true;
return;
}
@@ -1589,15 +2124,15 @@ XLogSend(bool *caughtup)
{
endptr = SendRqstPtr;
if (sendTimeLineIsHistoric)
- *caughtup = false;
+ WalSndCaughtUp = false;
else
- *caughtup = true;
+ WalSndCaughtUp = true;
}
else
{
/* round down to page boundary. */
endptr -= (endptr % XLOG_BLCKSZ);
- *caughtup = false;
+ WalSndCaughtUp = false;
}
nbytes = endptr - startptr;
@@ -1658,6 +2193,96 @@ XLogSend(bool *caughtup)
}
/*
+ * Send out the WAL after it has been decoded into a logical format by the
+ * output plugin specified in INIT_LOGICAL_REPLICATION.
+ */
+static void
+XLogSendLogical(void)
+{
+ XLogRecord *record;
+ char *errm;
+
+ if (decoding_ctx == NULL)
+ {
+ decoding_ctx = AllocSetContextCreate(TopMemoryContext,
+ "decoding context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ }
+
+ record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
+ logical_startptr = InvalidXLogRecPtr;
+
+ /* xlog record was invalid */
+ if (errm != NULL)
+ elog(ERROR, "%s", errm);
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = logical_decoding_ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ old_decoding_ctx = MemoryContextSwitchTo(decoding_ctx);
+ TopTransactionContext = decoding_ctx;
+
+ DecodeRecordIntoReorderBuffer(logical_decoding_ctx, &buf);
+
+ MemoryContextSwitchTo(old_decoding_ctx);
+ TopTransactionContext = NULL;
+
+ /*
+ * If the record we just read is at or beyond the flushed point, then
+ * we're caught up.
+ */
+ WalSndCaughtUp =
+ logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr();
+ }
+ else
+ /*
+ * xlogreader returned no record but reported no error; we must be caught up.
+ */
+ WalSndCaughtUp = true;
+
+ /* Update shared memory status */
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = MyWalSnd;
+
+ SpinLockAcquire(&walsnd->mutex);
+ walsnd->sentPtr = logical_decoding_ctx->reader->ReadRecPtr;
+ SpinLockRelease(&walsnd->mutex);
+ }
+}
+
+/*
+ * The sender is caught up, so we can go away for shutdown processing
+ * to finish normally. (This should only be called when the shutdown
+ * signal has been received from postmaster.)
+ *
+ * Note that if while doing this we determine that there's still more
+ * data to send, this function will return control to the caller.
+ */
+static void
+WalSndDone(WalSndSendData send_data)
+{
+ /* ... let's just be real sure we're caught up ... */
+ send_data();
+
+ if (WalSndCaughtUp && !pq_is_send_pending())
+ {
+ /* Inform the standby that XLOG streaming is done */
+ EndCommand("COPY 0", DestRemote);
+ pq_flush();
+
+ proc_exit(0);
+ }
+}
+
+/*
* Returns the latest point in WAL that has been safely flushed to disk, and
* can be sent to the standby. This should only be called when in recovery,
* ie. we're streaming to a cascaded standby.
@@ -2124,7 +2749,8 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
int i;
bool done;
- do {
+ do
+ {
done = true;
for (i = 0; i < max_wal_senders; i++)
@@ -2135,7 +2761,9 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
if (walsnd->pid != 0 && (pid == 0 || pid == walsnd->pid))
{
- XLogRecPtr rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
+ XLogRecPtr rptr;
+
+ rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
if (rptr < ptr)
done = false;
}
@@ -2147,7 +2775,7 @@ wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
}
if (!done)
- pg_usleep(10*1000);
+ pg_usleep(10 * 1000);
}
while (!done);
}
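WalSndPrepareWrite() reserves the same header that the physical XLogData ('w') message uses: a type byte followed by dataStart, walEnd and sendTime as 64-bit integers in network byte order. The plugin's output is appended after that header, and WalSndWriteData() ships the whole buffer in a CopyData ('d') message. The standalone sketch below makes the byte offsets concrete. put_be64(), get_be64() and build_xlogdata_header() are hypothetical helpers for illustration, not backend functions, and walEnd/sendTime are filled in the same way as the current XXX-marked code does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 type byte ('w') + dataStart + walEnd + sendTime, each an int64. */
#define XLOGDATA_HDR_LEN (1 + 3 * 8)

/* Write v in network (big-endian) byte order, as pq_sendint64 does. */
static void put_be64(unsigned char *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char) (v >> (56 - 8 * i));
}

/* Read back a big-endian 64-bit value. */
static uint64_t get_be64(const unsigned char *src)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | src[i];
    return v;
}

/*
 * Hypothetical mirror of WalSndPrepareWrite: walEnd simply repeats the LSN
 * and sendTime is whatever the caller passes (the patch currently sends 0).
 */
static size_t build_xlogdata_header(unsigned char *buf, size_t buflen,
                                    uint64_t lsn, int64_t sendtime)
{
    assert(buflen >= XLOGDATA_HDR_LEN);
    buf[0] = 'w';
    put_be64(buf + 1, lsn);                  /* dataStart */
    put_be64(buf + 9, lsn);                  /* walEnd */
    put_be64(buf + 17, (uint64_t) sendtime); /* sendTime */
    return XLOGDATA_HDR_LEN;
}
```

With the position from the demonstration at the top of this mail, 0/17D5908, the header is 25 bytes long and bytes 1 to 8 hold 00 00 00 00 01 7D 59 08.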
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b34ba44..4fcbd4a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -26,6 +26,7 @@
#include "postmaster/autovacuum.h"
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
+#include "replication/logical.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
@@ -122,6 +123,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, ProcSignalShmemSize());
size = add_size(size, CheckpointerShmemSize());
size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, LogicalDecodingShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
size = add_size(size, BTreeShmemSize());
@@ -227,6 +229,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
ProcSignalShmemInit();
CheckpointerShmemInit();
AutoVacuumShmemInit();
+ LogicalDecodingShmemInit();
WalSndShmemInit();
WalRcvShmemInit();
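The ipci.c hunks register logical decoding as one more shared memory consumer: its size is folded into the total with add_size(), and LogicalDecodingShmemInit() runs alongside the other *ShmemInit() calls. add_size() exists to catch size_t overflow while the per-module sizes are summed. The sketch below shows that check in isolation, assuming plain size_t arithmetic. checked_add_size() and total_shmem_size() are hypothetical names, and unlike the backend function the sketch reports overflow through a flag instead of raising an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Overflow-checked addition in the spirit of add_size(). Unsigned
 * wraparound is well defined, and a wrapped sum is smaller than an operand.
 */
static size_t checked_add_size(size_t s1, size_t s2, bool *ok)
{
    size_t result = s1 + s2;

    *ok = !(result < s1 || result < s2);
    return result;
}

/* Sum per-module shared memory requests, stopping at the first overflow. */
static bool total_shmem_size(const size_t *parts, size_t nparts, size_t *total)
{
    size_t size = 0;
    bool ok = true;

    for (size_t i = 0; i < nparts && ok; i++)
        size = checked_add_size(size, parts[i], &ok);
    *total = size;
    return ok;
}
```

Forgetting either half of the registration (size or init) is the usual way such a module goes wrong, which is why both hunks touch the same function.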
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 993efac..4c6c1ed 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -51,6 +51,9 @@
#include "access/xact.h"
#include "access/twophase.h"
#include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/walsender.h"
+#include "replication/walsender_private.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/spin.h"
@@ -1100,11 +1103,12 @@ TransactionIdIsActive(TransactionId xid)
* GetOldestXmin() move backwards, with no consequences for data integrity.
*/
TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked)
{
ProcArrayStruct *arrayP = procArray;
TransactionId result;
int index;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
/* Cannot look for individual databases during recovery */
Assert(allDbs || !RecoveryInProgress());
@@ -1157,6 +1161,10 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
}
}
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (RecoveryInProgress())
{
/*
@@ -1196,6 +1204,15 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
result = FirstNormalTransactionId;
}
+ /*
+ * after locks are released and defer_cleanup_age has been applied, check
+ * whether we need to back up further to make logical decoding possible.
+ */
+ if (systable &&
+ TransactionIdIsValid(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, result))
+ result = logical_xmin;
+
return result;
}
@@ -1250,6 +1267,8 @@ GetMaxSnapshotSubxidCount(void)
* RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
* running transactions, except those running LAZY VACUUM). This is
* the same computation done by GetOldestXmin(true, true, ...).
+ * RecentGlobalDataXmin: the global xmin for non-catalog tables
+ * >= RecentGlobalXmin
*
* Note: this function should probably not be called with an argument that's
* not statically allocated (see xip allocation below).
@@ -1265,6 +1284,7 @@ GetSnapshotData(Snapshot snapshot)
int count = 0;
int subcount = 0;
bool suboverflowed = false;
+ volatile TransactionId logical_xmin = InvalidTransactionId;
Assert(snapshot != NULL);
@@ -1442,8 +1462,14 @@ GetSnapshotData(Snapshot snapshot)
suboverflowed = true;
}
+
+ /* fetch into volatile var while ProcArrayLock is held */
+ if (max_logical_slots > 0)
+ logical_xmin = LogicalDecodingCtl->xmin;
+
if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
+
LWLockRelease(ProcArrayLock);
/*
@@ -1458,6 +1484,17 @@ GetSnapshotData(Snapshot snapshot)
RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
if (!TransactionIdIsNormal(RecentGlobalXmin))
RecentGlobalXmin = FirstNormalTransactionId;
+
+ /* Non-catalog tables can be vacuumed if older than this xid */
+ RecentGlobalDataXmin = RecentGlobalXmin;
+
+ /*
+ * hold the global xmin back to the one logical decoding requires, if older
+ */
+ if (TransactionIdIsNormal(logical_xmin) &&
+ NormalTransactionIdPrecedes(logical_xmin, RecentGlobalXmin))
+ RecentGlobalXmin = logical_xmin;
+
RecentXmin = xmin;
snapshot->xmin = xmin;
@@ -1558,9 +1595,11 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
* Similar to GetSnapshotData but returns more information. We include
* all PGXACTs with an assigned TransactionId, even VACUUM processes.
*
- * We acquire XidGenLock, but the caller is responsible for releasing it.
- * This ensures that no new XIDs enter the proc array until the caller has
- * WAL-logged this snapshot, and releases the lock.
+ * We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
+ * releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
+ * array until the caller has WAL-logged this snapshot, and releases the
+ * lock. Acquiring ProcArrayLock ensures that no transactions commit until the
+ * lock is released.
*
* The returned data structure is statically allocated; caller should not
* modify it, and must not assume it is valid past the next call.
@@ -1695,6 +1734,12 @@ GetRunningTransactionData(void)
}
}
+ /*
+ * It's important *not* to track decoding tasks here because snapbuild.c
+ * uses ->oldestRunningXid to manage its xmin. If it were to be included
+ * here the initial value could never increase.
+ */
+
CurrentRunningXacts->xcnt = count - subcount;
CurrentRunningXacts->subxcnt = subcount;
CurrentRunningXacts->subxid_overflow = suboverflowed;
@@ -1702,13 +1747,12 @@ GetRunningTransactionData(void)
CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
- /* We don't release XidGenLock here, the caller is responsible for that */
- LWLockRelease(ProcArrayLock);
-
Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
+ /* We don't release the locks here, the caller is responsible for that */
+
return CurrentRunningXacts;
}
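The procarray.c changes hold back the xmin horizon for catalog tables only: GetOldestXmin() with systable = true, and RecentGlobalXmin in GetSnapshotData(), are moved back to LogicalDecodingCtl->xmin when that is older, while RecentGlobalDataXmin keeps the unpegged value for ordinary tables. Because xids wrap around, "older" has to be the modulo-2^32 comparison that NormalTransactionIdPrecedes() performs. A minimal standalone sketch of that decision, assuming 32-bit xids and the usual reserved values (0 invalid, 3 the first normal xid); normal_xid_precedes() and oldest_xmin() are hypothetical helpers, not backend functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(x) ((x) >= FirstNormalTransactionId)

/* Wraparound-aware "a is older than b" for two normal xids (modulo 2^32). */
static bool normal_xid_precedes(TransactionId a, TransactionId b)
{
    assert(TransactionIdIsNormal(a) && TransactionIdIsNormal(b));
    return (int32_t) (a - b) < 0;
}

/*
 * The logical slot's xmin only holds the horizon back when the caller is
 * asking on behalf of catalog tables (systable).
 */
static TransactionId oldest_xmin(TransactionId result,
                                 TransactionId logical_xmin, bool systable)
{
    if (systable && TransactionIdIsNormal(logical_xmin) &&
        normal_xid_precedes(logical_xmin, result))
        return logical_xmin;
    return result;
}
```

Keeping the pegging out of the non-catalog horizon is what lets VACUUM continue to clean up user tables while a logical slot exists.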
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index e85733b..93ed9dd 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -879,7 +879,22 @@ LogStandbySnapshot(void)
* record we write, because standby will open up when it sees this.
*/
running = GetRunningTransactionData();
- LogCurrentRunningXacts(running);
+
+ /*
+ * GetRunningTransactionData() acquired ProcArrayLock, we must release
+ * it. We can do that before inserting the WAL record because
+ * ProcArrayApplyRecoveryInfo can recheck the commit status using the
+ * clog. If we're doing logical replication we can't do that though, so
+ * hold the lock for a moment longer.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
+
+ recptr = LogCurrentRunningXacts(running);
+
+ /* Release lock if we kept it longer ... */
+ if (wal_level >= WAL_LEVEL_LOGICAL)
+ LWLockRelease(ProcArrayLock);
/* GetRunningTransactionData() acquired XidGenLock, we must release it */
LWLockRelease(XidGenLock);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index e0dc126..9c93cb4 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -475,7 +475,7 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
* Only the local caches are flushed; this does not transmit the message
* to other backends.
*/
-static void
+void
LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
if (msg->id >= 0)
@@ -547,7 +547,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
* since that tells us we've lost some shared-inval messages and hence
* don't know what needs to be invalidated.
*/
-static void
+void
InvalidateSystemCaches(void)
{
int i;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3f7386e..5425d32 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1602,6 +1602,10 @@ RelationIdGetRelation(Oid relationId)
return rd;
}
+ /* use up-to-date system relations, even during timetravel */
+ if (IsSystemRelationId(relationId))
+ SuspendDecodingSnapshots();
+
/*
* no reldesc in the cache, so have RelationBuildDesc() build one and add
* it.
@@ -1609,6 +1613,10 @@ RelationIdGetRelation(Oid relationId)
rd = RelationBuildDesc(relationId, true);
if (RelationIsValid(rd))
RelationIncrementReferenceCount(rd);
+
+ if (IsSystemRelationId(relationId))
+ UnSuspendDecodingSnapshots();
+
return rd;
}
@@ -1730,6 +1738,10 @@ RelationReloadIndexInfo(Relation relation)
return;
}
+ /* use up-to-date system relations, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/*
* Read the pg_class row
*
@@ -1797,6 +1809,9 @@ RelationReloadIndexInfo(Relation relation)
/* Okay, now it's valid again */
relation->rd_isvalid = true;
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
/*
@@ -1978,6 +1993,10 @@ RelationClearRelation(Relation relation, bool rebuild)
bool keep_tupdesc;
bool keep_rules;
+ /* use up-to-date system relations, even during timetravel */
+ if (IsSystemRelation(relation))
+ SuspendDecodingSnapshots();
+
/* Build temporary entry, but don't link it into hashtable */
newrel = RelationBuildDesc(save_relid, false);
if (newrel == NULL)
@@ -2047,6 +2066,9 @@ RelationClearRelation(Relation relation, bool rebuild)
/* And now we can throw away the temporary entry */
RelationDestroyRelation(newrel);
+
+ if (IsSystemRelation(relation))
+ UnSuspendDecodingSnapshots();
}
}
@@ -3552,7 +3574,10 @@ RelationGetIndexList(Relation relation)
Form_pg_attribute attr;
/* internal column, like oid */
if (attno <= 0)
- continue;
+ {
+ found = false;
+ break;
+ }
attr = relation->rd_att->attrs[attno - 1];
if (!attr->attnotnull)
@@ -3840,17 +3865,26 @@ RelationGetIndexPredicate(Relation relation)
* be bms_free'd when not needed anymore.
*/
Bitmapset *
-RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
+RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
{
Bitmapset *indexattrs;
- Bitmapset *uindexattrs;
+ Bitmapset *uindexattrs; /* unique keys */
+ Bitmapset *cindexattrs; /* best candidate key */
List *indexoidlist;
ListCell *l;
MemoryContext oldcxt;
/* Quick exit if we already computed the result. */
if (relation->rd_indexattr != NULL)
- return bms_copy(keyAttrs ? relation->rd_keyattr : relation->rd_indexattr);
+ switch (attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return bms_copy(relation->rd_ckeyattr);
+ case INDEX_ATTR_BITMAP_KEY:
+ return bms_copy(relation->rd_keyattr);
+ case INDEX_ATTR_BITMAP_ALL:
+ return bms_copy(relation->rd_indexattr);
+ }
/* Fast path if definitely no indexes */
if (!RelationGetForm(relation)->relhasindex)
@@ -3877,13 +3911,16 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
*/
indexattrs = NULL;
uindexattrs = NULL;
+ cindexattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
Relation indexDesc;
IndexInfo *indexInfo;
int i;
- bool isKey;
+ bool isCKey; /* candidate or primary key */
+ bool isKey; /* key member */
+
indexDesc = index_open(indexOid, AccessShareLock);
@@ -3895,6 +3932,8 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
indexInfo->ii_Expressions == NIL &&
indexInfo->ii_Predicate == NIL;
+ isCKey = indexOid == relation->rd_primary;
+
/* Collect simple attribute references */
for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
{
@@ -3904,6 +3943,11 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
{
indexattrs = bms_add_member(indexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
+
+ if (isCKey)
+ cindexattrs = bms_add_member(cindexattrs,
+ attrnum - FirstLowInvalidHeapAttributeNumber);
+
if (isKey)
uindexattrs = bms_add_member(uindexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
@@ -3925,10 +3969,21 @@ RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs)
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
relation->rd_indexattr = bms_copy(indexattrs);
relation->rd_keyattr = bms_copy(uindexattrs);
+ relation->rd_ckeyattr = bms_copy(cindexattrs);
MemoryContextSwitchTo(oldcxt);
/* We return our original working copy for caller to play with */
- return keyAttrs ? uindexattrs : indexattrs;
+ switch (attrKind)
+ {
+ case INDEX_ATTR_BITMAP_CANDIDATE_KEY:
+ return cindexattrs;
+ case INDEX_ATTR_BITMAP_KEY:
+ return uindexattrs;
+ case INDEX_ATTR_BITMAP_ALL:
+ return indexattrs;
+ default:
+ elog(ERROR, "unknown attrKind %u", attrKind);
+ }
}
/*
@@ -4903,3 +4958,49 @@ unlink_initfile(const char *initfilename)
elog(LOG, "could not remove cache file \"%s\": %m", initfilename);
}
}
+
+bool
+RelationIsDoingTimetravelInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+
+ if (!RelationNeedsWAL(relation))
+ return false;
+
+ /*
+ * XXX: Doing this test instead of using IsSystemNamespace has the
+ * advantage of classifying a catalog relation's toast tables as a
+ * timetravel relation as well. This is safe since even an oid wraparound
+ * will preserve this property (c.f. GetNewObjectId()).
+ */
+ if (IsSystemRelation(relation))
+ return true;
+
+ /*
+ * Also log relevant data if we want the table to behave as a catalog
+ * table, although it's not a system-provided one.
+ * XXX: we need to make sure both the relation and its toast relation have
+ * the flag set!
+ */
+ if (RelationIsTreatedAsCatalogTable(relation))
+ return true;
+
+ return false;
+}
+
+bool
+RelationIsLogicallyLoggedInternal(Relation relation)
+{
+ Assert(wal_level >= WAL_LEVEL_LOGICAL);
+ if (!RelationNeedsWAL(relation))
+ return false;
+ /*
+ * XXX: In addition to the above comment, we could decide to always log
+ * data even for real system catalogs, although the benefits of that seem
+ * unclear.
+ */
+ if (IsSystemRelation(relation))
+ return false;
+
+ return true;
+}
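RelationGetIndexAttrBitmap() now also collects a candidate-key set (rd_ckeyattr) holding the attributes of the index whose OID matches rd_primary. System columns have negative attribute numbers, so every member is stored shifted by FirstLowInvalidHeapAttributeNumber. The sketch below uses a 64-bit mask instead of a Bitmapset to show that offset. The value -8 for the constant is an assumption taken from 9.3-era headers, and attr_bitmap_add(), attr_bitmap_is_member() and collect_ckey_attrs() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; the real constant lives in access/htup_details.h. */
#define FIRST_LOW_INVALID_ATTNO (-8)

/* Stand-in for Bitmapset: bits 1..63 cover attnums -7..55. */
typedef uint64_t AttrBitmap;

/* Shift attnum so that negative (system) attribute numbers map to bits > 0. */
static AttrBitmap attr_bitmap_add(AttrBitmap set, int attnum)
{
    int bit = attnum - FIRST_LOW_INVALID_ATTNO;

    assert(bit > 0 && bit < 64);
    return set | ((AttrBitmap) 1 << bit);
}

static bool attr_bitmap_is_member(AttrBitmap set, int attnum)
{
    int bit = attnum - FIRST_LOW_INVALID_ATTNO;

    return bit > 0 && bit < 64 && (set & ((AttrBitmap) 1 << bit)) != 0;
}

/*
 * Hypothetical mirror of the candidate-key collection: only the columns of
 * the index whose OID equals the relation's primary key go into the set.
 */
static AttrBitmap collect_ckey_attrs(const int *attnums, int natts,
                                     unsigned index_oid, unsigned primary_oid)
{
    AttrBitmap ckey = 0;

    if (index_oid != primary_oid)
        return ckey;
    for (int i = 0; i < natts; i++)
        if (attnums[i] != 0)   /* 0 marks an expression column */
            ckey = attr_bitmap_add(ckey, attnums[i]);
    return ckey;
}
```

Callers test membership with the same shifted number, e.g. attnum - FirstLowInvalidHeapAttributeNumber, so the offset never leaks into the rest of the code.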
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea16c64..896df78 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -57,6 +57,7 @@
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
#include "postmaster/walwriter.h"
+#include "replication/logical.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -2047,6 +2048,17 @@ static struct config_int ConfigureNamesInt[] =
},
{
+ /* see max_connections */
+ {"max_logical_slots", PGC_POSTMASTER, REPLICATION_SENDING,
+ gettext_noop("Sets the maximum number of simultaneously defined WAL decoding slots."),
+ NULL
+ },
+ &max_logical_slots,
+ 0, 0, MAX_BACKENDS /*?*/,
+ NULL, NULL, NULL
+ },
+
+ {
{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
gettext_noop("Sets the maximum time to wait for WAL replication."),
NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0303ac7..92f276d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -160,7 +160,7 @@
# - Settings -
-#wal_level = minimal # minimal, archive, or hot_standby
+#wal_level = minimal # minimal, archive, hot_standby or logical
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level;
@@ -207,11 +207,18 @@
# Set these on the master and on any standby that will send replication data.
-#max_wal_senders = 0 # max number of walsender processes
+#max_wal_senders = 0 # max number of walsender processes, including
+ # both physical and logical replication senders.
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
+#max_logical_slots = 0 # max number of logical replication sender
+ # and receiver processes. Logical senders
+ # (but not receivers) also consume a
+ # max_wal_senders slot.
+ # (change requires restart)
+
# - Master Server -
# These settings are ignored on a standby server.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e739d2d..4162f92 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -75,13 +75,14 @@ static Snapshot SecondarySnapshot = NULL;
* for the convenience of TransactionIdIsInProgress: even in bootstrap
* mode, we don't want it to say that BootstrapTransactionId is in progress.
*
- * RecentGlobalXmin is initialized to InvalidTransactionId, to ensure that no
+ * RecentGlobalXmin and RecentGlobalDataXmin are initialized to InvalidTransactionId, to ensure that no
* one tries to use a stale value. Readers should ensure that it has been set
* to something else before using it.
*/
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
+TransactionId RecentGlobalDataXmin = InvalidTransactionId;
/*
* Elements of the active snapshot stack.
@@ -731,7 +732,7 @@ AtEOXact_Snapshot(bool isCommit)
* Returns the token (the file name) that can be used to import this
* snapshot.
*/
-static char *
+char *
ExportSnapshot(Snapshot snapshot)
{
TransactionId topXid;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 3254a2d..24f0949 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -64,6 +64,8 @@
#include "access/xact.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/combocid.h"
#include "utils/tqual.h"
@@ -73,9 +75,17 @@ SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
+static Snapshot SnapshotNowDecoding;
+/* (table, ctid) => (cmin, cmax) mapping during timetravel */
+static HTAB *tuplecid_data = NULL;
+static int timetravel_suspended = 0;
+
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-
+static bool FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer);
+static bool RedirectSatisfiesNow(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer);
/*
* SetHintBits()
@@ -1700,3 +1710,242 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple)
*/
return true;
}
+
+/*
+ * Check whether the transaction id 'xid' is in the pre-sorted array 'xip'.
+ */
+static bool
+TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
+{
+ return bsearch(&xid, xip, num,
+ sizeof(TransactionId), xidComparator) != NULL;
+}
+
+/*
+ * See the comments for HeapTupleSatisfiesMVCC for the semantics this function
+ * obeys.
+ *
+ * Only usable on tuples from catalog tables!
+ *
+ * We don't need to support HEAP_MOVED_(IN|OFF) for now because we only support
+ * reading catalog pages which couldn't have been created in an older version.
+ *
+ * We don't set any hint bits here. Normal access should already have set
+ * them, so doing it here is unlikely to be beneficial, and it seems too
+ * dangerous because the semantics of hint bits during timetravel are more
+ * complicated than when dealing "only" with the present.
+ */
+bool
+HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer)
+{
+ HeapTupleHeader tuple = htup->t_data;
+ TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
+ TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
+
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
+ /* transaction aborted */
+ if (tuple->t_infomask & HEAP_XMIN_INVALID)
+ {
+ Assert(!TransactionIdDidCommit(xmin));
+ return false;
+ }
+ /* check if it's one of our txids, toplevel is also in there */
+ else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple);
+ CommandId cmax = InvalidCommandId;
+
+ /*
+ * if another transaction deleted this tuple or if cmin/cmax is stored
+ * in a combocid, we need to look up the actual values externally.
+ */
+ if ((!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
+ !TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt)) ||
+ tuple->t_infomask & HEAP_COMBOCID
+ )
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve cmin/cmax of catalog tuple");
+ }
+
+ if (cmin >= snapshot->curcid)
+ return false; /* inserted after scan started */
+ }
+ /* normal transaction state counts */
+ else if (TransactionIdPrecedes(xmin, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMIN_COMMITTED &&
+ !TransactionIdDidCommit(xmin)));
+
+ if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
+ !TransactionIdDidCommit(xmin))
+ return false;
+ }
+ /* beyond our xmax horizon, i.e. invisible */
+ else if (TransactionIdFollowsOrEquals(xmin, snapshot->xmax))
+ {
+ return false;
+ }
+ /* check if we know the transaction has committed */
+ else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+ {
+ /* xmin is known to have committed, go on to check xmax */
+ }
+ else
+ {
+ return false;
+ }
+
+ /* at this point we know xmin is visible, check xmax */
+
+ /* why should those be in catalog tables? */
+ Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
+
+ /* xid invalid or aborted */
+ if (tuple->t_infomask & HEAP_XMAX_INVALID)
+ return true;
+ /* locked tuples are always visible */
+ else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+ return true;
+ /* check if it's one of our txids, toplevel is also in there */
+ else if (TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt))
+ {
+ CommandId cmin;
+ CommandId cmax = HeapTupleHeaderGetRawCommandId(tuple);
+
+ /* Lookup actual cmin/cmax values */
+ if (tuple->t_infomask & HEAP_COMBOCID)
+ {
+ bool resolved;
+
+ resolved = ResolveCminCmaxDuringDecoding(tuplecid_data, htup,
+ buffer, &cmin, &cmax);
+
+ if (!resolved)
+ elog(ERROR, "could not resolve combocid to cmax");
+ }
+
+ if (cmax >= snapshot->curcid)
+ return true; /* deleted after scan started */
+ else
+ return false; /* deleted before scan started */
+ }
+ /* normal transaction state is valid */
+ else if (TransactionIdPrecedes(xmax, snapshot->xmin))
+ {
+ Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED &&
+ !TransactionIdDidCommit(xmax)));
+
+ if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
+ return false;
+
+ return !TransactionIdDidCommit(xmax);
+ }
+ /* we cannot possibly see the deleting transaction */
+ else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
+ return true;
+ /* do we know that the deleting txn has committed? */
+ else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+ return false;
+ else
+ return true;
+}
+
+/*
+ * Set up a replacement SnapshotNow that allows catalog access to behave just
+ * like it did at a certain point in the past.
+ *
+ * Needed for after-the-fact WAL decoding.
+ */
+void
+SetupDecodingSnapshots(Snapshot snapshot_now, HTAB *tuplecids)
+{
+ /* prevent recursively setting up decoding snapshots */
+ Assert(SnapshotNowData.satisfies != RedirectSatisfiesNow);
+
+ SnapshotNowData.satisfies = RedirectSatisfiesNow;
+ /* make sure normal snapshots aren't used */
+ SnapshotSelfData.satisfies = FailsSatisfies;
+ SnapshotAnyData.satisfies = FailsSatisfies;
+ /* don't overwrite SnapshotToastData, we want that to behave normally */
+
+ /* setup the timetravel snapshot */
+ SnapshotNowDecoding = snapshot_now;
+
+ /* setup (cmin, cmax) lookup hash */
+ tuplecid_data = tuplecids;
+
+ timetravel_suspended = 0;
+}
+
+
+/*
+ * Make SnapshotNow behave normally again.
+ */
+void
+RevertFromDecodingSnapshots(void)
+{
+ SnapshotNowDecoding = NULL;
+ tuplecid_data = NULL;
+
+ /* rally to restore sanity and/or boredom */
+ SnapshotNowData.satisfies = HeapTupleSatisfiesNow;
+ SnapshotSelfData.satisfies = HeapTupleSatisfiesSelf;
+ SnapshotAnyData.satisfies = HeapTupleSatisfiesAny;
+ timetravel_suspended = 0;
+}
+
+/*
+ * Disable timetravel SnapshotNow emulation and perform old-fashioned
+ * SnapshotNow access, but make re-enabling cheap. This is useful for
+ * accessing catalog entries which must stay up to date, like the pg_class
+ * entries of system relations.
+ *
+ * Can be called several times in a nested fashion, since several of its
+ * callers suspend timetravel access on several code levels.
+ */
+void
+SuspendDecodingSnapshots(void)
+{
+ timetravel_suspended++;
+}
+
+/*
+ * Enable timetravel SnapshotNow emulation again.
+ */
+void
+UnSuspendDecodingSnapshots(void)
+{
+ timetravel_suspended--;
+}
+
+/*
+ * Error out if a normal snapshot is used. That is neither legal nor expected
+ * during timetravel, so this is just extra assurance.
+ */
+static bool
+FailsSatisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ elog(ERROR, "normal snapshots cannot be used during timetravel access");
+ return false;
+}
+
+/*
+ * Call the replacement SatisfiesNow with the required SnapshotNow data.
+ */
+static bool
+RedirectSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
+{
+ Assert(SnapshotNowDecoding != NULL);
+ if (timetravel_suspended > 0)
+ return HeapTupleSatisfiesNow(htup, snapshot, buffer);
+ return HeapTupleSatisfiesMVCCDuringDecoding(htup, SnapshotNowDecoding,
+ buffer);
+}
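HeapTupleSatisfiesMVCCDuringDecoding uses the decoding snapshot differently from a normal MVCC snapshot: for xids between xmin and xmax, membership in xip means "known to have committed", and membership is tested by binary search over a pre-sorted array (TransactionIdInArray). A minimal standalone sketch of just the xmin half of that decision follows. DecodingSnapSketch, xmin_visible and the other helpers are hypothetical names; the sketch assumes plain uint32_t xids, skips the subxip (own transaction) branch, and replaces the hint-bit/clog lookup with a boolean parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "a < b" for normal xids, like TransactionIdPrecedes. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* bsearch comparator using plain numeric order, like xidComparator. */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

/* Binary search in a pre-sorted xid array (cf. TransactionIdInArray). */
static bool
xid_in_array(TransactionId xid, const TransactionId *xip, size_t num)
{
    return bsearch(&xid, xip, num, sizeof(TransactionId), xid_cmp) != NULL;
}

/*
 * Hypothetical, simplified decoding snapshot. Unlike a normal MVCC
 * snapshot, xip holds the xids in [xmin, xmax) known to have COMMITTED.
 */
typedef struct DecodingSnapSketch
{
    TransactionId xmin;
    TransactionId xmax;
    const TransactionId *xip;   /* must be sorted */
    size_t xcnt;
} DecodingSnapSketch;

/*
 * Is a tuple's xmin visible? 'committed_per_clog' stands in for the
 * HEAP_XMIN_COMMITTED / TransactionIdDidCommit() check on old xids.
 */
static bool
xmin_visible(TransactionId xmin, const DecodingSnapSketch *snap,
             bool committed_per_clog)
{
    if (xid_precedes(xmin, snap->xmin))
        return committed_per_clog;      /* older than the snapshot */
    if (!xid_precedes(xmin, snap->xmax))
        return false;                   /* beyond the xmax horizon */
    return xid_in_array(xmin, snap->xip, snap->xcnt);
}
```

The binary search is only valid because xip is kept sorted, which is exactly the precondition stated in the comment on TransactionIdInArray.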
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9ff96c6..18b8ca0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -193,7 +193,9 @@ const char *subdirs[] = {
"base/1",
"pg_tblspc",
"pg_stat",
- "pg_stat_tmp"
+ "pg_stat_tmp",
+ "pg_llog",
+ "pg_llog/snapshots"
};
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index a790f99..8d86de0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -77,6 +77,8 @@ wal_level_str(WalLevel wal_level)
return "archive";
case WAL_LEVEL_HOT_STANDBY:
return "hot_standby";
+ case WAL_LEVEL_LOGICAL:
+ return "logical";
}
return _("unrecognized wal_level");
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 4381778..42f3e6b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -55,6 +55,18 @@
#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
+#define XLOG_HEAP2_NEW_CID 0x70
+
+/*
+ * Values for the flags field of the xl_heap_* records
+ */
+/* PD_ALL_VISIBLE was cleared */
+#define XLOG_HEAP_ALL_VISIBLE_CLEARED (1<<0)
+/* PD_ALL_VISIBLE was cleared in the 2nd page */
+#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED (1<<1)
+#define XLOG_HEAP_CONTAINS_OLD_TUPLE (1<<2)
+#define XLOG_HEAP_CONTAINS_OLD_KEY (1<<3)
+#define XLOG_HEAP_CONTAINS_NEW_TUPLE (1<<4)
/*
* All what we need to find changed tuple
@@ -78,10 +90,10 @@ typedef struct xl_heap_delete
xl_heaptid target; /* deleted tuple id */
TransactionId xmax; /* xmax of the deleted tuple */
uint8 infobits_set; /* infomask bits */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
} xl_heap_delete;
-#define SizeOfHeapDelete (offsetof(xl_heap_delete, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapDelete (offsetof(xl_heap_delete, flags) + sizeof(uint8))
/*
* We don't store the whole fixed part (HeapTupleHeaderData) of an inserted
@@ -100,15 +112,23 @@ typedef struct xl_heap_header
#define SizeOfHeapHeader (offsetof(xl_heap_header, t_hoff) + sizeof(uint8))
+typedef struct xl_heap_header_len
+{
+ uint16 t_len;
+ xl_heap_header header;
+} xl_heap_header_len;
+
+#define SizeOfHeapHeaderLen (offsetof(xl_heap_header_len, header) + SizeOfHeapHeader)
+
/* This is what we need to know about insert */
typedef struct xl_heap_insert
{
xl_heaptid target; /* inserted tuple id */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
+ uint8 flags;
/* xl_heap_header & TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_insert;
-#define SizeOfHeapInsert (offsetof(xl_heap_insert, all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8))
/*
* This is what we need to know about a multi-insert. The record consists of
@@ -120,7 +140,7 @@ typedef struct xl_heap_multi_insert
{
RelFileNode node;
BlockNumber blkno;
- bool all_visible_cleared;
+ uint8 flags;
uint16 ntuples;
OffsetNumber offsets[1];
@@ -147,13 +167,12 @@ typedef struct xl_heap_update
TransactionId old_xmax; /* xmax of the old tuple */
TransactionId new_xmax; /* xmax of the new tuple */
ItemPointerData newtid; /* new inserted tuple id */
- uint8 old_infobits_set; /* infomask bits to set on old tuple */
- bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
- bool new_all_visible_cleared; /* same for the page of newtid */
+ uint8 old_infobits_set; /* infomask bits to set on old tuple */
+ uint8 flags;
/* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_update;
-#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_all_visible_cleared) + sizeof(bool))
+#define SizeOfHeapUpdate (offsetof(xl_heap_update, flags) + sizeof(uint8))
/*
* This is what we need to know about vacuum page cleanup/redirect
@@ -261,6 +280,28 @@ typedef struct xl_heap_visible
#define SizeOfHeapVisible (offsetof(xl_heap_visible, cutoff_xid) + sizeof(TransactionId))
+typedef struct xl_heap_new_cid
+{
+ /*
+ * store toplevel xid so we don't have to merge cids from different
+ * transactions
+ */
+ TransactionId top_xid;
+ CommandId cmin;
+ CommandId cmax;
+ /*
+ * don't really need the combocid but the padding makes it free and it's
+ * useful for debugging.
+ */
+ CommandId combocid;
+ /*
+ * Store the relfilenode/ctid pair to facilitate lookups.
+ */
+ xl_heaptid target;
+} xl_heap_new_cid;
+
+#define SizeOfHeapNewCid (offsetof(xl_heap_new_cid, target) + SizeOfHeapTid)
+
extern void HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
TransactionId *latestRemovedXid);
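xl_heap_delete, xl_heap_insert, xl_heap_multi_insert and xl_heap_update now carry a single uint8 flags byte instead of one bool per condition, which leaves room for the new CONTAINS_* bits without growing the records. The sketch below shows how such a byte might be built and tested; encode_update_flags and page_all_visible_was_cleared are hypothetical helpers, not functions from this patch. Only the flag values are copied from the header above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as defined in heapam_xlog.h by this patch */
#define XLOG_HEAP_ALL_VISIBLE_CLEARED       (1<<0)
#define XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED   (1<<1)
#define XLOG_HEAP_CONTAINS_OLD_TUPLE        (1<<2)
#define XLOG_HEAP_CONTAINS_OLD_KEY          (1<<3)
#define XLOG_HEAP_CONTAINS_NEW_TUPLE        (1<<4)

/* Hypothetical: encode what used to be separate bools into one byte. */
static uint8_t
encode_update_flags(bool all_visible_cleared, bool new_all_visible_cleared,
                    bool contains_old_key, bool contains_new_tuple)
{
    uint8_t flags = 0;

    if (all_visible_cleared)
        flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
    if (new_all_visible_cleared)
        flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
    if (contains_old_key)
        flags |= XLOG_HEAP_CONTAINS_OLD_KEY;
    if (contains_new_tuple)
        flags |= XLOG_HEAP_CONTAINS_NEW_TUPLE;
    return flags;
}

/* Hypothetical redo-side test that replaces reading the old bool field. */
static bool
page_all_visible_was_cleared(uint8_t flags)
{
    return (flags & XLOG_HEAP_ALL_VISIBLE_CLEARED) != 0;
}
```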
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 23a41fd..8452ec5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -63,6 +63,11 @@
(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
(int32) ((id1) - (id2)) < 0)
+/* compare two XIDs already known to be normal; this is a macro for speed */
+#define NormalTransactionIdFollows(id1, id2) \
+ (AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
+ (int32) ((id1) - (id2)) > 0)
+
/* ----------
* Object ID (OID) zero is InvalidOid.
*
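NormalTransactionIdFollows compares xids on a circle: id1 follows id2 when the modulo-2^32 difference, read as a signed 32-bit value, is positive, so an xid allocated just after wraparound still follows one allocated just before it. A minimal function-form sketch follows; normal_xid_follows is a hypothetical name, and like the macro it relies on the usual two's-complement conversion of the unsigned difference to int32.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Function form of NormalTransactionIdFollows. Callers must pass normal
 * xids (>= FirstNormalTransactionId); the real macro asserts that.
 */
static bool
normal_xid_follows(TransactionId id1, TransactionId id2)
{
    /* modulo-2^32 difference, interpreted as signed */
    return (int32_t) (id1 - id2) > 0;
}
```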
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index b4a75ce..80f9ab6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -196,7 +196,8 @@ typedef enum WalLevel
{
WAL_LEVEL_MINIMAL = 0,
WAL_LEVEL_ARCHIVE,
- WAL_LEVEL_HOT_STANDBY
+ WAL_LEVEL_HOT_STANDBY,
+ WAL_LEVEL_LOGICAL
} WalLevel;
extern int wal_level;
@@ -209,9 +210,12 @@ extern int wal_level;
*/
#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
-/* Do we need to WAL-log information required only for Hot Standby? */
+/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
+/* Do we need to WAL-log information required only for logical replication? */
+#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
+
#ifdef WAL_DEBUG
extern bool XLOG_DEBUG;
#endif
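Because WAL_LEVEL_LOGICAL is appended at the end of the WalLevel enum, every existing ">=" test keeps working: logical implies that hot standby information is logged, which in turn implies archive. The sketch below restates the macros as functions that take the level as a parameter instead of reading the wal_level global; the function names are hypothetical, and wal_level_name mirrors the string mapping in pg_controldata's wal_level_str().

```c
#include <stdbool.h>

typedef enum WalLevel
{
    WAL_LEVEL_MINIMAL = 0,
    WAL_LEVEL_ARCHIVE,
    WAL_LEVEL_HOT_STANDBY,
    WAL_LEVEL_LOGICAL
} WalLevel;

/* Function forms of XLogIsNeeded, XLogStandbyInfoActive, XLogLogicalInfoActive */
static bool xlog_is_needed(WalLevel level) { return level >= WAL_LEVEL_ARCHIVE; }
static bool xlog_standby_info_active(WalLevel level) { return level >= WAL_LEVEL_HOT_STANDBY; }
static bool xlog_logical_info_active(WalLevel level) { return level >= WAL_LEVEL_LOGICAL; }

/* Mirrors the mapping in pg_controldata's wal_level_str() */
static const char *
wal_level_name(WalLevel level)
{
    switch (level)
    {
        case WAL_LEVEL_MINIMAL:
            return "minimal";
        case WAL_LEVEL_ARCHIVE:
            return "archive";
        case WAL_LEVEL_HOT_STANDBY:
            return "hot_standby";
        case WAL_LEVEL_LOGICAL:
            return "logical";
    }
    return "unrecognized wal_level";
}
```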
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3829ce2..72179ab 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -19,6 +19,7 @@
#ifndef XLOGREADER_H
#define XLOGREADER_H
+#include "access/xlog.h"
#include "access/xlog_internal.h"
typedef struct XLogReaderState XLogReaderState;
@@ -108,10 +109,19 @@ struct XLogReaderState
char *errormsg_buf;
};
-/* Get a new XLogReader */
+
extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
void *private_data);
+
+typedef struct XLogRecordBuffer
+{
+ XLogRecPtr origptr;
+ XLogRecord record;
+ char *record_data;
+} XLogRecordBuffer;
+
+
/* Free an XLogReader */
extern void XLogReaderFree(XLogReaderState *state);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 44b6f38..a96ed69 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -23,6 +23,7 @@ extern ForkNumber forkname_to_number(char *forkName);
extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
+extern bool IsSystemRelationId(Oid relid);
extern bool IsSystemRelation(Relation relation);
extern bool IsToastRelation(Relation relation);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8d268dd..9b38477 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2619,6 +2619,8 @@ DATA(insert OID = 2022 ( pg_stat_get_activity PGNSP PGUID 12 1 100 0 0 f f f
DESCR("statistics: information about currently active backends");
DATA(insert OID = 3099 ( pg_stat_get_wal_senders PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{23,25,25,25,25,25,23,25}" "{o,o,o,o,o,o,o,o}" "{pid,state,sent_location,write_location,flush_location,replay_location,sync_priority,sync_state}" _null_ pg_stat_get_wal_senders _null_ _null_ _null_ ));
DESCR("statistics: information about currently active replication");
+DATA(insert OID = 3457 ( pg_stat_get_logical_decoding_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{25,25,26,16,28,25}" "{o,o,o,o,o,o}" "{slot_name,plugin,database,active,xmin,restart_decoding_lsn}" _null_ pg_stat_get_logical_decoding_slots _null_ _null_ _null_ ));
+DESCR("statistics: information about logical replication slots currently in use");
DATA(insert OID = 2026 ( pg_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 23 "" _null_ _null_ _null_ _null_ pg_backend_pid _null_ _null_ _null_ ));
DESCR("statistics: current backend PID");
DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 23 "23" _null_ _null_ _null_ _null_ pg_stat_get_backend_pid _null_ _null_ _null_ ));
@@ -4723,6 +4725,10 @@ DESCR("SP-GiST support for quad tree over range");
DATA(insert OID = 3473 ( spg_range_quad_leaf_consistent PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "2281 2281" _null_ _null_ _null_ _null_ spg_range_quad_leaf_consistent _null_ _null_ _null_ ));
DESCR("SP-GiST support for quad tree over range");
+DATA(insert OID = 3779 ( init_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,25}" "{i,i,o,o}" "{slotname,plugin,slotname,xlog_position}" _null_ init_logical_replication _null_ _null_ _null_ ));
+DESCR("set up a logical replication slot");
+DATA(insert OID = 3780 ( stop_logical_replication PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 23 "19" _null_ _null_ _null_ _null_ stop_logical_replication _null_ _null_ _null_ ));
+DESCR("stop logical replication");
DATA(insert OID = 3781 ( pg_xlog_wait_remote_apply PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_apply _null_ _null_ _null_ ));
DESCR("wait for an lsn to be applied by a remote node");
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8dd8b0..2616ac1 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -156,7 +156,7 @@ extern void vac_update_relstats(Relation relation,
TransactionId frozenxid,
MultiXactId minmulti);
extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age,
- bool sharedRel,
+ bool sharedRel, bool catalogRel,
TransactionId *oldestXmin,
TransactionId *freezeLimit,
TransactionId *freezeTableLimit,
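The new catalogRel argument lets vacuum_set_xid_limits treat catalog tables differently, in line with the goal stated at the top of this mail: the xmin held back by logical slots only needs to protect catalog rows, not user tables. How the horizon is actually chosen is not visible in this hunk, so the following is only one plausible reading. It assumes a data horizon (cf. RecentGlobalDataXmin) and a slot xmin (cf. LogicalDecodingCtl->xmin); vacuum_horizon is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Wraparound-aware "a < b" for normal xids, like TransactionIdPrecedes. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical: oldest xmin vacuum must respect for one relation.
 * User tables only honour the data horizon; catalog tables additionally
 * honour the xmin pegged by logical decoding slots, if there is one.
 */
static TransactionId
vacuum_horizon(bool catalog_rel, TransactionId data_xmin,
               TransactionId logical_slot_xmin)
{
    if (catalog_rel &&
        logical_slot_xmin != InvalidTransactionId &&
        xid_precedes(logical_slot_xmin, data_xmin))
        return logical_slot_xmin;
    return data_xmin;
}
```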
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 0d5c007..0b17182 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -408,6 +408,9 @@ typedef enum NodeTag
T_IdentifySystemCmd,
T_BaseBackupCmd,
T_StartReplicationCmd,
+ T_InitLogicalReplicationCmd,
+ T_StartLogicalReplicationCmd,
+ T_FreeLogicalReplicationCmd,
T_TimeLineHistoryCmd,
/*
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 85b4544..3da8d40 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -52,6 +52,41 @@ typedef struct StartReplicationCmd
/* ----------------------
+ * INIT_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct InitLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ char *plugin;
+} InitLogicalReplicationCmd;
+
+
+/* ----------------------
+ * START_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct StartLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+ XLogRecPtr startpoint;
+ List *options;
+} StartLogicalReplicationCmd;
+
+/* ----------------------
+ * FREE_LOGICAL_REPLICATION command
+ * ----------------------
+ */
+typedef struct FreeLogicalReplicationCmd
+{
+ NodeTag type;
+ char *name;
+} FreeLogicalReplicationCmd;
+
+
+/* ----------------------
* TIMELINE_HISTORY command
* ----------------------
*/
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
new file mode 100644
index 0000000..dd3f2ca
--- /dev/null
+++ b/src/include/replication/decode.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ * decode.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DECODE_H
+#define DECODE_H
+
+#include "access/xlogreader.h"
+#include "replication/reorderbuffer.h"
+#include "replication/logical.h"
+
+extern void DecodeRecordIntoReorderBuffer(LogicalDecodingContext *ctx,
+ XLogRecordBuffer *buf);
+
+#endif
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
new file mode 100644
index 0000000..971180b
--- /dev/null
+++ b/src/include/replication/logical.h
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ * logical.h
+ * PostgreSQL WAL to logical transformation
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICAL_H
+#define LOGICAL_H
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "replication/output_plugin.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+
+/*
+ * Shared memory state of a single logical decoding slot
+ */
+typedef struct LogicalDecodingSlot
+{
+ /* lock, on same cacheline as effective_xmin */
+ slock_t mutex;
+
+ /* on-disk xmin, updated first */
+ TransactionId xmin;
+
+ /* in-memory xmin, updated after syncing to disk */
+ TransactionId effective_xmin;
+
+ /* is this slot defined */
+ bool in_use;
+
+ /* is somebody streaming out changes for this slot */
+ bool active;
+
+ /* have we been aborted while ->active */
+ bool aborted;
+
+ /* ----
+ * If we shut down, crash, or otherwise stop, this is where decoding has
+ * to restart from in order to
+ * a) find a valid & ready snapshot
+ * b) have the complete content of all in-progress xacts
+ * ----
+ */
+ XLogRecPtr restart_decoding;
+
+ /*
+ * Last location we know the client has confirmed to have safely received
+ * data to. No earlier data can be decoded after a restart/crash.
+ */
+ XLogRecPtr confirmed_flush;
+
+ /* ----
+ * When the client has confirmed flushes >= candidate_lsn we can
+ * a) advance the pegged xmin
+ * b) advance restart_decoding so we have to read/keep less WAL
+ * ----
+ */
+ XLogRecPtr candidate_lsn;
+ TransactionId candidate_xmin;
+ XLogRecPtr candidate_restart_decoding;
+
+ /* database the slot is active on */
+ Oid database;
+
+ /* slot identifier */
+ NameData name;
+
+ /* plugin name */
+ NameData plugin;
+} LogicalDecodingSlot;
+
+/*
+ * Shared memory control area for all of logical decoding
+ */
+typedef struct LogicalDecodingCtlData
+{
+ /*
+ * Xmin across all logical slots.
+ *
+ * Protected by ProcArrayLock.
+ */
+ TransactionId xmin;
+
+ LogicalDecodingSlot logical_slots[FLEXIBLE_ARRAY_MEMBER];
+} LogicalDecodingCtlData;
+
+/*
+ * Pointers to shared memory
+ */
+extern LogicalDecodingCtlData *LogicalDecodingCtl;
+extern LogicalDecodingSlot *MyLogicalDecodingSlot;
+
+struct LogicalDecodingContext;
+
+typedef void (*LogicalOutputPluginWriterWrite) (
+ struct LogicalDecodingContext *lr,
+ XLogRecPtr Ptr,
+ TransactionId xid
+);
+
+typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
+
+/*
+ * Output plugin callbacks
+ */
+typedef struct OutputPluginCallbacks
+{
+ LogicalDecodeInitCB init_cb;
+ LogicalDecodeBeginCB begin_cb;
+ LogicalDecodeChangeCB change_cb;
+ LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeCleanupCB cleanup_cb;
+} OutputPluginCallbacks;
+
+typedef struct LogicalDecodingContext
+{
+ struct XLogReaderState *reader;
+ struct LogicalDecodingSlot *slot;
+ struct ReorderBuffer *reorder;
+ struct SnapBuild *snapshot_builder;
+
+ struct OutputPluginCallbacks callbacks;
+
+ bool stop_after_consistent;
+
+ /*
+ * User specified options
+ */
+ List *output_plugin_options;
+
+ /*
+ * User-Provided callback for writing/streaming out data.
+ */
+ LogicalOutputPluginWriterPrepareWrite prepare_write;
+ LogicalOutputPluginWriterWrite write;
+
+ /*
+ * Output buffer.
+ */
+ StringInfo out;
+
+ /*
+ * Private data pointer for the creator of the logical decoding context.
+ */
+ void *owner_private;
+
+ /*
+ * Private data pointer of the output plugin.
+ */
+ void *output_plugin_private;
+
+ /*
+ * Private data pointer for the data writer.
+ */
+ void *output_writer_private;
+} LogicalDecodingContext;
+
+/* GUCs */
+extern PGDLLIMPORT int max_logical_slots;
+
+extern Size LogicalDecodingShmemSize(void);
+extern void LogicalDecodingShmemInit(void);
+
+extern void LogicalDecodingAcquireFreeSlot(const char *name, const char *plugin);
+extern void LogicalDecodingReleaseSlot(void);
+extern void LogicalDecodingReAcquireSlot(const char *name);
+extern void LogicalDecodingFreeSlot(const char *name);
+
+extern void ComputeLogicalXmin(void);
+
+/* change logical xmin */
+extern void IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin);
+
+/* change recovery restart location */
+extern void IncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart_lsn);
+
+extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
+
+extern void CheckLogicalReplicationRequirements(void);
+
+extern void StartupLogicalReplication(XLogRecPtr checkPointRedo);
+
+extern LogicalDecodingContext *CreateLogicalDecodingContext(
+ LogicalDecodingSlot *slot,
+ bool is_init,
+ XLogRecPtr start_lsn,
+ List *output_plugin_options,
+ XLogPageReadCB read_page,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write);
+extern bool LogicalDecodingContextReady(LogicalDecodingContext *ctx);
+extern void FreeLogicalDecodingContext(LogicalDecodingContext *ctx);
+
+#endif
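Each slot keeps two xmins: xmin is written to disk first, and effective_xmin, the in-memory value, is only advanced after that. Together with confirmed_flush and the candidate_* fields, the comments suggest the flow sketched below. This is a sketch under assumptions: SlotSketch, confirm_received and compute_logical_xmin are hypothetical simplifications of LogicalDecodingSlot, LogicalConfirmReceivedLocation and ComputeLogicalXmin, with the spinlock, ProcArrayLock, restart_decoding handling and the actual disk write all omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;
#define InvalidTransactionId ((TransactionId) 0)
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical subset of LogicalDecodingSlot; no locking, no disk I/O. */
typedef struct SlotSketch
{
    bool          in_use;
    TransactionId xmin;             /* "on-disk" xmin, updated first */
    TransactionId effective_xmin;   /* in-memory xmin, updated after sync */
    XLogRecPtr    confirmed_flush;
    XLogRecPtr    candidate_lsn;
    TransactionId candidate_xmin;
} SlotSketch;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Client confirmed it flushed up to 'lsn': maybe advance the slot's xmin. */
static void
confirm_received(SlotSketch *slot, XLogRecPtr lsn)
{
    slot->confirmed_flush = lsn;
    if (slot->candidate_lsn != InvalidXLogRecPtr && lsn >= slot->candidate_lsn)
    {
        slot->xmin = slot->candidate_xmin;
        /* the real code would make the slot state durable here */
        slot->effective_xmin = slot->xmin;
        slot->candidate_lsn = InvalidXLogRecPtr;
        slot->candidate_xmin = InvalidTransactionId;
    }
}

/* Oldest effective_xmin over all slots in use (cf. ComputeLogicalXmin). */
static TransactionId
compute_logical_xmin(const SlotSketch *slots, size_t nslots)
{
    TransactionId result = InvalidTransactionId;

    for (size_t i = 0; i < nslots; i++)
    {
        TransactionId x = slots[i].effective_xmin;

        if (!slots[i].in_use || x == InvalidTransactionId)
            continue;
        if (result == InvalidTransactionId || xid_precedes(x, result))
            result = x;
    }
    return result;
}
```

Advancing effective_xmin only after the on-disk value is updated means a crash can at worst leave the slot holding back more catalog rows than necessary, never fewer.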
diff --git a/src/include/replication/logicalfuncs.h b/src/include/replication/logicalfuncs.h
new file mode 100644
index 0000000..37f36a5
--- /dev/null
+++ b/src/include/replication/logicalfuncs.h
@@ -0,0 +1,19 @@
+/*-------------------------------------------------------------------------
+ * logicalfuncs.h
+ * PostgreSQL WAL to logical transformation support functions
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICALFUNCS_H
+#define LOGICALFUNCS_H
+
+extern int logical_read_local_xlog_page(XLogReaderState *state,
+ XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr,
+ char *cur_page, TimeLineID *pageTLI);
+
+extern Datum pg_stat_get_logical_decoding_slots(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
new file mode 100644
index 0000000..66b4fd9
--- /dev/null
+++ b/src/include/replication/output_plugin.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ * output_plugin.h
+ * PostgreSQL Logical Decode Plugin Interface
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OUTPUT_PLUGIN_H
+#define OUTPUT_PLUGIN_H
+
+#include "replication/reorderbuffer.h"
+
+struct LogicalDecodingContext;
+
+/*
+ * Initialization callback of a user-defined plugin.
+ * ctx->output_plugin_private can be set to some private data.
+ *
+ * Gets looked up via the library symbol pg_decode_init.
+ */
+typedef void (*LogicalDecodeInitCB) (
+ struct LogicalDecodingContext *ctx,
+ bool is_init
+);
+
+/*
+ * Gets called for every BEGIN of a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_begin_txn.
+ */
+typedef bool (*LogicalDecodeBeginCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn);
+
+/*
+ * Gets called for every change in a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_change.
+ */
+typedef bool (*LogicalDecodeChangeCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change
+);
+
+/*
+ * Gets called for every COMMIT of a successful transaction.
+ *
+ * Return "true" if the message in "out" should get sent, false otherwise.
+ *
+ * Gets looked up via the library symbol pg_decode_commit_txn.
+ */
+typedef bool (*LogicalDecodeCommitCB) (
+ struct LogicalDecodingContext *,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Gets called to clean up the state of an output plugin.
+ *
+ * Gets looked up via the library symbol pg_decode_cleanup.
+ */
+typedef void (*LogicalDecodeCleanupCB) (
+ struct LogicalDecodingContext *
+);
+
+#endif /* OUTPUT_PLUGIN_H */
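The begin, change and commit callbacks above return true when the plugin wants the contents of "out" sent. The following self-contained mock illustrates that contract. Every type and function in it (SketchCtx, replay_txn, demo_*) is hypothetical and much simpler than the real LogicalDecodingContext/ReorderBufferTXN API, which also hands the change callback a Relation and a ReorderBufferChange.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the output side of LogicalDecodingContext. */
typedef struct SketchCtx
{
    char out[128];   /* plays the role of the StringInfo 'out' buffer */
    char last[128];  /* last message the writer "sent" */
    int  nsent;
} SketchCtx;

typedef bool (*SketchBeginCB) (SketchCtx *ctx, unsigned xid);
typedef bool (*SketchChangeCB) (SketchCtx *ctx, unsigned xid, const char *change);
typedef bool (*SketchCommitCB) (SketchCtx *ctx, unsigned xid);

typedef struct SketchCallbacks
{
    SketchBeginCB  begin_cb;
    SketchChangeCB change_cb;
    SketchCommitCB commit_cb;
} SketchCallbacks;

/* Writer side: send 'out' only if the callback returned true. */
static void
maybe_send(SketchCtx *ctx, bool send)
{
    if (send)
    {
        memcpy(ctx->last, ctx->out, sizeof(ctx->last));
        ctx->nsent++;
    }
    ctx->out[0] = '\0';
}

/* Replay one committed transaction through the plugin callbacks. */
static void
replay_txn(SketchCtx *ctx, const SketchCallbacks *cb, unsigned xid,
           const char *const *changes, int nchanges)
{
    maybe_send(ctx, cb->begin_cb(ctx, xid));
    for (int i = 0; i < nchanges; i++)
        maybe_send(ctx, cb->change_cb(ctx, xid, changes[i]));
    maybe_send(ctx, cb->commit_cb(ctx, xid));
}

/* Example plugin: suppress BEGIN, emit one line per change and per COMMIT. */
static bool
demo_begin(SketchCtx *ctx, unsigned xid)
{
    (void) ctx;
    (void) xid;
    return false;
}

static bool
demo_change(SketchCtx *ctx, unsigned xid, const char *change)
{
    snprintf(ctx->out, sizeof(ctx->out), "xid %u: %s", xid, change);
    return true;
}

static bool
demo_commit(SketchCtx *ctx, unsigned xid)
{
    snprintf(ctx->out, sizeof(ctx->out), "COMMIT %u", xid);
    return true;
}
```

Returning false from a callback lets a plugin filter output (here, BEGIN records) without the writer needing to know anything about the plugin's format.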
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
new file mode 100644
index 0000000..b34b6fd
--- /dev/null
+++ b/src/include/replication/reorderbuffer.h
@@ -0,0 +1,320 @@
+/*
+ * reorderbuffer.h
+ *
+ * PostgreSQL logical replay "cache" management
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/reorderbuffer.h
+ */
+#ifndef REORDERBUFFER_H
+#define REORDERBUFFER_H
+
+#include "access/htup_details.h"
+#include "utils/hsearch.h"
+#include "utils/rel.h"
+
+#include "lib/ilist.h"
+
+#include "storage/sinval.h"
+
+#include "utils/snapshot.h"
+
+
+typedef struct ReorderBuffer ReorderBuffer;
+
+/* types of the change passed to a 'change' callback */
+enum ReorderBufferChangeType
+{
+ REORDER_BUFFER_CHANGE_INSERT,
+ REORDER_BUFFER_CHANGE_UPDATE,
+ REORDER_BUFFER_CHANGE_DELETE
+};
+
+/* an individual tuple, stored in one chunk of memory */
+typedef struct ReorderBufferTupleBuf
+{
+ /* position in preallocated list */
+ slist_node node;
+
+ /* tuple, stored sequentially */
+ HeapTupleData tuple;
+ HeapTupleHeaderData header;
+ char data[MaxHeapTupleSize];
+} ReorderBufferTupleBuf;
+
+/*
+ * a single 'change', can be an insert (with one tuple), an update (old, new),
+ * or a delete (old).
+ *
+ * The same struct is also used internally for other purposes but that should
+ * never be visible outside reorderbuffer.c.
+ */
+typedef struct ReorderBufferChange
+{
+ XLogRecPtr lsn;
+
+ /* type of change */
+ union
+ {
+ enum ReorderBufferChangeType action;
+ /* do not leak internal enum values to the outside */
+ int action_internal;
+ };
+
+ /*
+ * Context data for the change, which part of the union is valid depends
+ * on action/action_internal.
+ */
+ union
+ {
+ /* old, new tuples when action == *_INSERT|UPDATE|DELETE */
+ struct
+ {
+ /* relation that has been changed */
+ RelFileNode relnode;
+ /* valid for DELETE || UPDATE */
+ ReorderBufferTupleBuf *oldtuple;
+ /* valid for INSERT || UPDATE */
+ ReorderBufferTupleBuf *newtuple;
+ };
+
+ /* new snapshot */
+ Snapshot snapshot;
+
+ /* new command id for existing snapshot in a catalog changing tx */
+ CommandId command_id;
+
+ /* new cid mapping for catalog changing transaction */
+ struct
+ {
+ RelFileNode node;
+ ItemPointerData tid;
+ CommandId cmin;
+ CommandId cmax;
+ CommandId combocid;
+ } tuplecid;
+ };
+
+ /*
+ * While in use, this is how a change is linked into a transaction;
+ * otherwise it links the change into the preallocated list.
+ */
+ dlist_node node;
+} ReorderBufferChange;
+
+typedef struct ReorderBufferTXN
+{
+ /*
+ * The transaction's xid; can be a toplevel or sub xid.
+ */
+ TransactionId xid;
+
+ /*
+ * LSN of the first wal record with knowledge about this xid.
+ */
+ XLogRecPtr lsn;
+ XLogRecPtr last_lsn;
+
+ /*
+ * The last LSN at which snapshot information resides, so we can
+ * restart decoding from there and fully recover this transaction from
+ * WAL.
+ */
+ XLogRecPtr restart_decoding_lsn;
+
+ /* did the TX have catalog changes */
+ bool does_timetravel;
+
+ /*
+ * Base snapshot or NULL.
+ */
+ Snapshot base_snapshot;
+
+ /*
+ * Do we know this is a subxact?
+ */
+ bool is_known_as_subxact;
+
+ /*
+ * Number of ReorderBufferChanges in this txn.
+ *
+ * Changes in subtransactions are *not* included but tracked separately.
+ */
+ Size nentries;
+
+ /*
+ * How many of the above entries are stored in memory in contrast to being
+ * spilled to disk.
+ */
+ Size nentries_mem;
+
+ /*
+ * List of ReorderBufferChange structs, including new Snapshots and new
+ * CommandIds
+ */
+ dlist_head changes;
+
+ /*
+ * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
+ * Those are always assigned to the toplevel transaction. (Keep track of
+ * #entries to create a hash of the right size)
+ */
+ dlist_head tuplecids;
+ size_t ntuplecids;
+
+ /*
+ * On-demand built hash for looking up the above values.
+ */
+ HTAB *tuplecid_hash;
+
+ /*
+ * Hash containing (potentially partial) toast entries. NULL if no toast
+ * tuples have been found for the current change.
+ */
+ HTAB *toast_hash;
+
+ /*
+ * non-hierarchical list of subtransactions that are *not* aborted. Only
+ * used in toplevel transactions.
+ */
+ dlist_head subtxns;
+ size_t nsubtxns;
+
+ /*
+ * Position in one of three lists: (a) the list of subtransactions if we
+ * are *known* to be a subxact, (b) the list of toplevel xacts (can be an
+ * as-yet unknown subxact), (c) the list of preallocated ReorderBufferTXNs.
+ */
+ dlist_node node;
+
+ /*
+ * Stored cache invalidations. This is not a linked list because we get
+ * all the invalidations at once.
+ */
+ SharedInvalidationMessage *invalidations;
+ size_t ninvalidations;
+
+} ReorderBufferTXN;
+
+
+/* change callback signature */
+typedef void (*ReorderBufferApplyChangeCB) (
+ ReorderBuffer *cache,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change);
+
+/* begin callback signature */
+typedef void (*ReorderBufferBeginCB) (
+ ReorderBuffer *cache,
+ ReorderBufferTXN *txn);
+
+/* commit callback signature */
+typedef void (*ReorderBufferCommitCB) (
+ ReorderBuffer *cache,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+struct ReorderBuffer
+{
+ /*
+ * xid => ReorderBufferTXN lookup table
+ */
+ HTAB *by_txn;
+
+ /*
+ * Transactions that could be a toplevel xact, ordered by LSN of the first
+ * record bearing that xid.
+ */
+ dlist_head toplevel_by_lsn;
+
+ /*
+ * one-entry sized cache for by_txn. Very frequently the same txn gets
+ * looked up over and over again.
+ */
+ TransactionId by_txn_last_xid;
+ ReorderBufferTXN *by_txn_last_txn;
+
+ /*
+ * Callbacks to be called when a transaction commits.
+ */
+ ReorderBufferBeginCB begin;
+ ReorderBufferApplyChangeCB apply_change;
+ ReorderBufferCommitCB commit;
+
+ /*
+ * Pointer that will be passed untouched to the callbacks.
+ */
+ void *private_data;
+
+ /*
+ * Private memory context.
+ */
+ MemoryContext context;
+
+ /*
+ * Data structure slab cache.
+ *
+ * We allocate/deallocate some structures very frequently; to avoid the
+ * overhead of doing so, we cache some unused ones here.
+ *
+ * The maximum number of cached entries is controlled by const variables
+ * at the top of reorderbuffer.c.
+ */
+
+ /* cached ReorderBufferTXNs */
+ dlist_head cached_transactions;
+ Size nr_cached_transactions;
+
+ /* cached ReorderBufferChanges */
+ dlist_head cached_changes;
+ Size nr_cached_changes;
+
+ /* cached ReorderBufferTupleBufs */
+ slist_head cached_tuplebufs;
+ Size nr_cached_tuplebufs;
+
+ XLogRecPtr current_restart_decoding_lsn;
+
+ /* buffer for disk<->memory conversions */
+ char *outbuf;
+ Size outbufsize;
+};
+
+
+ReorderBuffer *ReorderBufferAllocate(void);
+void ReorderBufferFree(ReorderBuffer *);
+
+ReorderBufferTupleBuf *ReorderBufferGetTupleBuf(ReorderBuffer *);
+void ReorderBufferReturnTupleBuf(ReorderBuffer *, ReorderBufferTupleBuf *tuple);
+ReorderBufferChange *ReorderBufferGetChange(ReorderBuffer *);
+void ReorderBufferReturnChange(ReorderBuffer *, ReorderBufferChange *);
+
+void ReorderBufferAddChange(ReorderBuffer *, TransactionId, XLogRecPtr lsn, ReorderBufferChange *);
+void ReorderBufferCommit(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr lsn);
+void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr lsn);
+void ReorderBufferAbort(ReorderBuffer *, TransactionId, XLogRecPtr lsn);
+
+void ReorderBufferSetBaseSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddSnapshot(ReorderBuffer *, TransactionId, XLogRecPtr lsn, struct SnapshotData *snap);
+void ReorderBufferAddNewCommandId(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ CommandId cid);
+void ReorderBufferAddNewTupleCids(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ RelFileNode node, ItemPointerData pt,
+ CommandId cmin, CommandId cmax, CommandId combocid);
+void ReorderBufferAddInvalidations(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ Size nmsgs, SharedInvalidationMessage *msgs);
+bool ReorderBufferIsXidKnown(ReorderBuffer *, TransactionId xid);
+void ReorderBufferXidSetTimetravel(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
+bool ReorderBufferXidDoesTimetravel(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+
+ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
+
+void ReorderBufferSetRestartPoint(ReorderBuffer *, XLogRecPtr ptr);
+
+void ReorderBufferStartup(void);
+
+#endif
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
new file mode 100644
index 0000000..20d1368
--- /dev/null
+++ b/src/include/replication/snapbuild.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapbuild.h
+ * Exports from replication/logical/snapbuild.c.
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * src/include/replication/snapbuild.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPBUILD_H
+#define SNAPBUILD_H
+
+#include "replication/reorderbuffer.h"
+
+#include "utils/hsearch.h"
+#include "utils/snapshot.h"
+#include "access/htup.h"
+
+typedef enum
+{
+ /*
+ * Initial state, we can't do much yet.
+ */
+ SNAPBUILD_START,
+
+ /*
+ * We have collected enough information to decode tuples in transactions
+ * that started after this.
+ *
+ * Once we have reached this state we start to collect changes. We cannot
+ * apply them yet because they might be based on transactions that were
+ * still running when we reached this state.
+ */
+ SNAPBUILD_FULL_SNAPSHOT,
+
+ /*
+ * Found a point after reaching SNAPBUILD_FULL_SNAPSHOT where all
+ * transactions that were running at that point have finished. Until we
+ * reach that, we hold off calling any commit callbacks.
+ */
+ SNAPBUILD_CONSISTENT
+} SnapBuildState;
+
+typedef enum
+{
+ SNAPBUILD_SKIP,
+ SNAPBUILD_DECODE
+} SnapBuildAction;
+
+/* forward declare so we don't have to expose the struct to the public */
+struct SnapBuild;
+typedef struct SnapBuild SnapBuild;
+
+/* forward declare so we don't have to include xlogreader */
+struct XLogRecordBuffer;
+
+extern SnapBuild *AllocateSnapshotBuilder(ReorderBuffer *cache, TransactionId xmin_horizon, XLogRecPtr start_lsn);
+extern void FreeSnapshotBuilder(SnapBuild *cache);
+
+extern SnapBuildAction SnapBuildProcessRecord(SnapBuild *snapstate, struct XLogRecordBuffer *buf);
+
+extern Relation LookupRelationByRelFileNode(RelFileNode *r);
+
+extern void SnapBuildSnapDecRefcount(Snapshot snap);
+
+extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
+extern void SnapBuildClearExportedSnapshot(void);
+
+extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
+
+extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+
+#endif /* SNAPBUILD_H */
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7eaa21b..daae320 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -66,6 +66,7 @@ typedef struct WalSnd
extern WalSnd *MyWalSnd;
+
/* There is one WalSndCtl struct for the whole database cluster */
typedef struct
{
@@ -93,7 +94,6 @@ typedef struct
extern WalSndCtlData *WalSndCtl;
-
extern void WalSndSetState(WalSndState state);
/*
@@ -108,4 +108,8 @@ extern void replication_scanner_finish(void);
extern Node *replication_parse_result;
+/* logical wal sender data gathering functions */
+extern XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
+
+
#endif /* _WALSENDER_PRIVATE_H */
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index e0eb184..75c56a9 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -116,6 +116,9 @@ typedef ItemPointerData *ItemPointer;
/*
* ItemPointerCopy
* Copies the contents of one disk item pointer to another.
+ *
+ * Should there ever be padding in an ItemPointer, this would need to be
+ * handled differently, as it's used as a hash key.
*/
#define ItemPointerCopy(fromPointer, toPointer) \
( \
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d8f7e9d..1a6dee9 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -79,6 +79,7 @@ typedef enum LWLockId
SerializablePredicateLockListLock,
OldSerXidLock,
SyncRepLock,
+ LogicalReplicationCtlLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index fe0bad7..5465be5 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -49,7 +49,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool systable, bool alreadyLocked);
extern TransactionId GetOldestActiveTransactionId(void);
extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 9e833ca..8e1611c 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -136,4 +136,6 @@ extern void ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs
int nmsgs, bool RelcacheInitFileInval,
Oid dbid, Oid tsid);
+extern void LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg);
+
#endif /* SINVAL_H */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index feb55f1..4b9d967 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -66,5 +66,5 @@ extern void CallSyscacheCallbacks(int cacheid, uint32 hashvalue);
extern void inval_twophase_postcommit(TransactionId xid, uint16 info,
void *recdata, uint32 len);
-
+extern void InvalidateSystemCaches(void);
#endif /* INVAL_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index bd2466e..9cbf8a1 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -104,6 +104,7 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
+ Bitmapset *rd_ckeyattr; /* cols included in the primary key */
Oid rd_oidindex; /* OID of unique index on OID, if any */
LockInfoData rd_lockInfo; /* lock mgr's info for locking relation */
RuleLock *rd_rules; /* rewrite rules */
@@ -220,6 +221,7 @@ typedef struct StdRdOptions
int fillfactor; /* page fill factor in percent (0..100) */
AutoVacOpts autovacuum; /* autovacuum-related options */
bool security_barrier; /* for views */
+ bool treat_as_catalog_table; /* treat as timetravel-capable table */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
@@ -256,6 +258,15 @@ typedef struct StdRdOptions
((StdRdOptions *) (relation)->rd_options)->security_barrier : false)
/*
+ * RelationIsTreatedAsCatalogTable
+ * Returns whether the relation should be treated as a catalog table
+ * from the pov of logical decoding.
+ */
+#define RelationIsTreatedAsCatalogTable(relation) \
+ ((relation)->rd_options ? \
+ ((StdRdOptions *) (relation)->rd_options)->treat_as_catalog_table : false)
+
+/*
* RelationIsValid
* True iff relation descriptor is valid.
*/
@@ -407,7 +418,6 @@ typedef struct StdRdOptions
((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \
!(relation)->rd_islocaltemp)
-
/*
* RelationIsScannable
* Currently can only be false for a materialized view which has not been
@@ -424,6 +434,24 @@ typedef struct StdRdOptions
*/
#define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
+/*
+ * RelationIsDoingTimetravel
+ * True if we need to log enough information to provide timetravel access
+ */
+#define RelationIsDoingTimetravel(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsDoingTimetravelInternal(relation))
+
+/*
+ * RelationIsLogicallyLogged
+ * True if we need to log enough information to extract its data from WAL
+ */
+#define RelationIsLogicallyLogged(relation) \
+ (wal_level >= WAL_LEVEL_LOGICAL && \
+ RelationIsLogicallyLoggedInternal(relation))
+
+extern bool RelationIsDoingTimetravelInternal(Relation relation);
+extern bool RelationIsLogicallyLoggedInternal(Relation relation);
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..cfeded8 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -41,7 +41,16 @@ extern List *RelationGetIndexList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
extern List *RelationGetIndexPredicate(Relation relation);
-extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation, bool keyAttrs);
+
+typedef enum IndexAttrBitmapKind {
+ INDEX_ATTR_BITMAP_ALL,
+ INDEX_ATTR_BITMAP_KEY,
+ INDEX_ATTR_BITMAP_CANDIDATE_KEY
+} IndexAttrBitmapKind;
+
+extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
+ IndexAttrBitmapKind keyAttrs);
+
extern void RelationGetExclusionInfo(Relation indexRelation,
Oid **operators,
Oid **procs,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index bfbd8dd..b6a766a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -23,6 +23,7 @@ extern bool FirstSnapshotSet;
extern TransactionId TransactionXmin;
extern TransactionId RecentXmin;
extern TransactionId RecentGlobalXmin;
+extern TransactionId RecentGlobalDataXmin;
extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
@@ -50,4 +51,6 @@ extern bool XactHasExportedSnapshots(void);
extern void DeleteAllExportedSnapshotFiles(void);
extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern char *ExportSnapshot(Snapshot snapshot);
+
#endif /* SNAPMGR_H */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 800e366..f686607 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -39,7 +39,8 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
/* This macro encodes the knowledge of which snapshots are MVCC-safe */
#define IsMVCCSnapshot(snapshot) \
- ((snapshot)->satisfies == HeapTupleSatisfiesMVCC)
+ ((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
+ (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecoding)
/*
* HeapTupleSatisfiesVisibility
@@ -90,4 +91,34 @@ extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid);
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
+/*
+ * Special "satisfies" routines used during decoding xlog from a different
+ * point of lsn. Also used for timetravel SnapshotNow's.
+ */
+extern bool HeapTupleSatisfiesMVCCDuringDecoding(HeapTuple htup,
+ Snapshot snapshot, Buffer buffer);
+
+/*
+ * install the 'snapshot_now' snapshot as a timetravelling snapshot replacing
+ * the normal SnapshotNow behaviour. This snapshot needs to have been created
+ * by snapbuild.c otherwise you will see crashes!
+ *
+ * FIXME: We need something resembling the real SnapshotNow to handle things
+ * like enum lookups from indices correctly.
+ */
+extern void SetupDecodingSnapshots(Snapshot snapshot_now, HTAB *tuplecids);
+extern void RevertFromDecodingSnapshots(void);
+extern void SuspendDecodingSnapshots(void);
+extern void UnSuspendDecodingSnapshots(void);
+
+/*
+ * resolve combocids and overwritten cmin values
+ *
+ * To avoid leaking too much knowledge about the reorderbuffer, this is
+ * implemented in reorderbuffer.c, not tqual.c.
+ */
+extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, HeapTuple htup,
+ Buffer buffer,
+ CommandId *cmin, CommandId *cmax);
+
#endif /* TQUAL_H */
diff --git a/src/test/regress/expected/logical.out b/src/test/regress/expected/logical.out
new file mode 100644
index 0000000..e59f7d9
--- /dev/null
+++ b/src/test/regress/expected/logical.out
@@ -0,0 +1,7 @@
+--CHECKPOINT;
+CREATE EXTENSION test_logical_decoding;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..bc02e08 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1678,6 +1678,13 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, +
| pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock +
| FROM pg_database d;
+ pg_stat_logical_decoding | SELECT l.slot_name, +
+ | l.plugin, +
+ | l.database, +
+ | l.active, +
+ | l.xmin, +
+ | l.restart_decoding_lsn +
+ | FROM pg_stat_get_logical_decoding_slots() l(slot_name, plugin, database, active, xmin, restart_decoding_lsn);
pg_stat_replication | SELECT s.pid, +
| s.usesysid, +
| u.rolname AS usename, +
@@ -2139,7 +2146,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| FROM tv;
tvvmv | SELECT tvvm.grandtot +
| FROM tvvm;
-(64 rows)
+(65 rows)
SELECT tablename, rulename, definition FROM pg_rules
ORDER BY tablename, rulename;
diff --git a/src/test/regress/sql/logical.sql b/src/test/regress/sql/logical.sql
new file mode 100644
index 0000000..0c7fd2b
--- /dev/null
+++ b/src/test/regress/sql/logical.sql
@@ -0,0 +1,3 @@
+--CHECKPOINT;
+CREATE EXTENSION test_logical_decoding;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 452235d..3a6e465 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -621,6 +621,7 @@ Form_pg_ts_template
Form_pg_type
Form_pg_user_mapping
FormatNode
+FreeLogicalReplicationCmd
FromCharDateMode
FromExpr
FuncCall
@@ -791,6 +792,7 @@ IdentifySystemCmd
IncrementVarSublevelsUp_context
Index
IndexArrayKeyInfo
+IndexAttrBitmapKind
IndexBuildCallback
IndexBuildResult
IndexBulkDeleteCallback
@@ -818,6 +820,7 @@ IndxInfo
InfoItem
InhInfo
InhOption
+InitLogicalReplicationCmd
InheritableSocket
InlineCodeBlock
InsertStmt
@@ -937,6 +940,17 @@ LockTupleMode
LockingClause
LogOpts
LogStmtLevel
+LogicalDecodeBeginCB
+LogicalDecodeChangeCB
+LogicalDecodeCleanupCB
+LogicalDecodeCommitCB
+LogicalDecodeInitCB
+LogicalDecodingCheckpointData
+LogicalDecodingContext
+LogicalDecodingCtlData
+LogicalDecodingSlot
+LogicalOutputPluginWriterPrepareWrite
+LogicalOutputPluginWriterWrite
LogicalTape
LogicalTapeSet
MAGIC
@@ -1050,6 +1064,7 @@ OprInfo
OprProofCacheEntry
OprProofCacheKey
OutputContext
+OutputPluginCallbacks
OverrideSearchPath
OverrideStackEntry
PACE_HEADER
@@ -1464,6 +1479,21 @@ Relids
RelocationBufferInfo
RenameStmt
ReopenPtr
+ReorderBuffer
+ReorderBufferApplyChangeCB
+ReorderBufferBeginCB
+ReorderBufferChange
+ReorderBufferChangeTypeInternal
+ReorderBufferCommitCB
+ReorderBufferDiskChange
+ReorderBufferIterTXNEntry
+ReorderBufferIterTXNState
+ReorderBufferToastEnt
+ReorderBufferTupleBuf
+ReorderBufferTupleCidEnt
+ReorderBufferTupleCidKey
+ReorderBufferTXN
+ReorderBufferTXNByIdEnt
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
ResTarget
@@ -1518,6 +1548,8 @@ SID_NAME_USE
SISeg
SMgrRelation
SMgrRelationData
+SnapBuildAction
+SnapBuildState
SOCKADDR
SOCKET
SPELL
@@ -1609,6 +1641,8 @@ SlruSharedData
Snapshot
SnapshotData
SnapshotSatisfiesFunc
+Snapstate
+SnapstateOnDisk
SockAddr
Sort
SortBy
@@ -1651,6 +1685,7 @@ StandardChunkHeader
StartBlobPtr
StartBlobsPtr
StartDataPtr
+StartLogicalReplicationCmd
StartReplicationCmd
StartupPacket
StatEntry
@@ -1874,6 +1909,7 @@ WalRcvData
WalRcvState
WalSnd
WalSndCtlData
+WalSndSendData
WalSndState
WholeRowVarExprState
WindowAgg
@@ -1925,6 +1961,7 @@ XLogReaderState
XLogRecData
XLogRecPtr
XLogRecord
+XLogRecordBuffer
XLogSegNo
XLogSource
XLogwrtResult
@@ -2348,6 +2385,7 @@ symbol
tablespaceinfo
teReqs
teSection
+TestDecodingData
temp_tablespaces_extra
text
timeKEY
@@ -2420,11 +2458,13 @@ xl_heap_cleanup_info
xl_heap_delete
xl_heap_freeze
xl_heap_header
+xl_heap_header_len
xl_heap_inplace
xl_heap_insert
xl_heap_lock
xl_heap_lock_updated
xl_heap_multi_insert
+xl_heap_new_cid
xl_heap_newpage
xl_heap_update
xl_heap_visible
--
1.8.2.rc2.4.g7799588.dirty
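The SnapBuildState comments in snapbuild.h describe a three-step progression. Changes are collected only once SNAPBUILD_FULL_SNAPSHOT has been reached, and commit callbacks are held back until SNAPBUILD_CONSISTENT, which is the point where every transaction that was running at FULL_SNAPSHOT has finished. What follows is a minimal, self-contained sketch of that gating rule under simplifying assumptions. The enum mirrors the header. The SketchBuilder struct, its running-transaction counter, and the sketch_* functions are hypothetical and do not reflect how snapbuild.c actually derives this information from WAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors SnapBuildState from snapbuild.h. */
typedef enum
{
    SNAPBUILD_START,
    SNAPBUILD_FULL_SNAPSHOT,
    SNAPBUILD_CONSISTENT
} SnapBuildState;

/* Hypothetical, simplified builder state; not snapbuild.c's SnapBuild. */
typedef struct
{
    SnapBuildState state;
    int         nrunning;   /* xacts still running since FULL_SNAPSHOT */
} SketchBuilder;

/* Enough information is available to decode newly starting transactions. */
static void
sketch_reach_full_snapshot(SketchBuilder *b, int nrunning)
{
    assert(b->state == SNAPBUILD_START && nrunning >= 0);
    b->nrunning = nrunning;
    /* with nothing in flight we are immediately consistent */
    b->state = (nrunning == 0) ? SNAPBUILD_CONSISTENT : SNAPBUILD_FULL_SNAPSHOT;
}

/* One of the transactions running at FULL_SNAPSHOT time has finished. */
static void
sketch_xact_finished(SketchBuilder *b)
{
    if (b->state != SNAPBUILD_FULL_SNAPSHOT)
        return;                 /* only the in-flight set matters here */
    assert(b->nrunning > 0);
    if (--b->nrunning == 0)
        b->state = SNAPBUILD_CONSISTENT;
}

/* Changes are collected from FULL_SNAPSHOT onwards ... */
static bool
sketch_collect_changes(const SketchBuilder *b)
{
    return b->state != SNAPBUILD_START;
}

/* ... but commit callbacks only fire once the state is CONSISTENT. */
static bool
sketch_call_commit_cb(const SketchBuilder *b)
{
    return b->state == SNAPBUILD_CONSISTENT;
}
```

For example, a builder that reaches FULL_SNAPSHOT with two transactions in flight collects changes right away. It only starts calling commit callbacks after both of those transactions have finished. Tracking the in-flight set is what prevents a commit callback from seeing changes that depend on a transaction whose start was never decoded.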
0014-wal_decoding-test_decoding-Add-a-simple-decoding-mod.patch (text/x-patch; charset=us-ascii)
From c4b278fc30f34863cdddef5b4fe7fa0b37c50e76 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 14/17] wal_decoding: test_decoding: Add a simple decoding
module in contrib
This is mostly useful for testing, demonstration and documentation purposes.
---
contrib/Makefile | 1 +
contrib/test_decoding/Makefile | 16 ++
contrib/test_decoding/test_decoding.c | 325 ++++++++++++++++++++++++++++++++++
3 files changed, 342 insertions(+)
create mode 100644 contrib/test_decoding/Makefile
create mode 100644 contrib/test_decoding/test_decoding.c
diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..6d2fe32 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,6 +50,7 @@ SUBDIRS = \
tablefunc \
tcn \
test_parser \
+ test_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
new file mode 100644
index 0000000..2ac9653
--- /dev/null
+++ b/contrib/test_decoding/Makefile
@@ -0,0 +1,16 @@
+# contrib/test_decoding/Makefile
+
+MODULE_big = test_decoding
+OBJS = test_decoding.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
new file mode 100644
index 0000000..fc846bc
--- /dev/null
+++ b/contrib/test_decoding/test_decoding.c
@@ -0,0 +1,325 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_decoding.c
+ * example output plugin for the logical replication functionality
+ *
+ * Copyright (c) 2012-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/test_decoding/test_decoding.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/index.h"
+
+#include "nodes/parsenodes.h"
+
+#include "replication/output_plugin.h"
+#include "replication/logical.h"
+
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+
+typedef struct
+{
+ MemoryContext context;
+ bool include_xids;
+} TestDecodingData;
+
+/* These must be available to pg_dlsym() */
+extern void pg_decode_init(LogicalDecodingContext *ctx, bool is_init);
+extern bool pg_decode_begin_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn);
+extern bool pg_decode_commit_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+extern bool pg_decode_change(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, Relation rel,
+ ReorderBufferChange *change);
+
+void
+_PG_init(void)
+{
+}
+
+/* initialize this plugin */
+void
+pg_decode_init(LogicalDecodingContext *ctx, bool is_init)
+{
+ ListCell *option;
+ TestDecodingData *data;
+
+ AssertVariableIsOfType(&pg_decode_init, LogicalDecodeInitCB);
+
+ data = palloc(sizeof(TestDecodingData));
+ data->context = AllocSetContextCreate(TopMemoryContext,
+ "text conversion context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ data->include_xids = true;
+
+ ctx->output_plugin_private = data;
+
+ foreach(option, ctx->output_plugin_options)
+ {
+ DefElem *elem = lfirst(option);
+
+ Assert(elem->arg == NULL || IsA(elem->arg, String));
+
+ if (strcmp(elem->defname, "hide-xids") == 0)
+ {
+ /* FIXME: parse argument */
+ data->include_xids = false;
+ }
+ else
+ {
+ elog(WARNING, "option %s = %s is unknown",
+ elem->defname, elem->arg ? strVal(elem->arg) : "(null)");
+ }
+ }
+}
+
+/* BEGIN callback */
+bool
+pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_begin_txn, LogicalDecodeBeginCB);
+
+ ctx->prepare_write(ctx, txn->lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "BEGIN");
+ ctx->write(ctx, txn->lsn, txn->xid);
+
+ return true;
+}
+
+/* COMMIT callback */
+bool
+pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ AssertVariableIsOfType(&pg_decode_commit_txn, LogicalDecodeCommitCB);
+
+ ctx->prepare_write(ctx, txn->lsn, txn->xid);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "COMMIT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "COMMIT");
+ ctx->write(ctx, txn->lsn, txn->xid);
+
+ return true;
+}
+
+/* print the tuple 'tuple' into the StringInfo s */
+static void
+tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple)
+{
+ int natt;
+ Oid oid;
+
+ /* print the oid of the tuple; it's not included in the TupleDesc */
+ if ((oid = HeapTupleHeaderGetOid(tuple->t_data)) != InvalidOid)
+ {
+ appendStringInfo(s, " oid[oid]:%u", oid);
+ }
+
+ /* print all columns individually */
+ for (natt = 0; natt < tupdesc->natts; natt++)
+ {
+ Form_pg_attribute attr; /* the attribute itself */
+ Oid typid; /* type of current attribute */
+ HeapTuple type_tuple; /* information about a type */
+ Form_pg_type type_form;
+ Oid typoutput; /* output function */
+ bool typisvarlena;
+ Datum origval; /* possibly toasted Datum */
+ Datum val; /* definitely detoasted Datum */
+ char *outputstr = NULL;
+ bool isnull; /* column is null? */
+
+ attr = tupdesc->attrs[natt];
+
+ /*
+ * Don't print dropped columns; we can't be sure everything is
+ * available for them.
+ */
+ if (attr->attisdropped)
+ continue;
+
+ /*
+ * Don't print system columns
+ */
+ if (attr->attnum < 0)
+ continue;
+
+ typid = attr->atttypid;
+
+ /* gather type name */
+ type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+ if (!HeapTupleIsValid(type_tuple))
+ elog(ERROR, "cache lookup failed for type %u", typid);
+ type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+ /* print attribute name */
+ appendStringInfoChar(s, ' ');
+ appendStringInfoString(s, NameStr(attr->attname));
+
+ /* print attribute type */
+ appendStringInfoChar(s, '[');
+ appendStringInfoString(s, NameStr(type_form->typname));
+ appendStringInfoChar(s, ']');
+
+ /* query output function */
+ getTypeOutputInfo(typid,
+ &typoutput, &typisvarlena);
+
+ ReleaseSysCache(type_tuple);
+
+ /* get Datum from tuple */
+ origval = fastgetattr(tuple, natt + 1, tupdesc, &isnull);
+
+ if (isnull)
+ outputstr = "(null)";
+ else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
+ outputstr = "(unchanged-toast-datum)";
+ else if (typisvarlena)
+ val = PointerGetDatum(PG_DETOAST_DATUM(origval));
+ else
+ val = origval;
+
+ /* print data */
+ if (outputstr == NULL)
+ outputstr = OidOutputFunctionCall(typoutput, val);
+
+ appendStringInfoChar(s, ':');
+ appendStringInfoString(s, outputstr);
+ }
+}
+
+/*
+ * callback for individual changed tuples
+ */
+bool
+pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ TestDecodingData *data;
+ Form_pg_class class_form;
+ TupleDesc tupdesc;
+ MemoryContext old;
+
+ AssertVariableIsOfType(&pg_decode_change, LogicalDecodeChangeCB);
+
+ data = ctx->output_plugin_private;
+ class_form = RelationGetForm(relation);
+ tupdesc = RelationGetDescr(relation);
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ ctx->prepare_write(ctx, change->lsn, txn->xid);
+
+ appendStringInfoString(ctx->out, "table \"");
+ appendStringInfoString(ctx->out, NameStr(class_form->relname));
+ appendStringInfoString(ctx->out, "\":");
+
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ appendStringInfoString(ctx->out, " INSERT:");
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+ break;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ appendStringInfoString(ctx->out, " UPDATE:");
+ if (change->oldtuple != NULL)
+ {
+ Relation indexrel;
+ TupleDesc indexdesc;
+
+ appendStringInfoString(ctx->out, " old-pkey:");
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ indexdesc = RelationGetDescr(indexrel);
+
+ tuple_to_stringinfo(ctx->out, indexdesc, &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ appendStringInfoString(ctx->out, " new-tuple:");
+ }
+
+ if (change->newtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ else
+ tuple_to_stringinfo(ctx->out, tupdesc, &change->newtuple->tuple);
+
+ break;
+ case REORDER_BUFFER_CHANGE_DELETE:
+ appendStringInfoString(ctx->out, " DELETE:");
+
+ /* if there was no PK, we only know that a delete happened */
+ if (change->oldtuple == NULL)
+ appendStringInfoString(ctx->out, " (no-tuple-data)");
+ /* In DELETE, only the PK is present; display that */
+ else
+ {
+ Relation indexrel;
+
+ /* make sure rd_primary is set */
+ RelationGetIndexList(relation);
+
+ if (!OidIsValid(relation->rd_primary))
+ {
+ elog(LOG, "tuple in table with oid: %u without primary key",
+ RelationGetRelid(relation));
+ break;
+ }
+
+ indexrel = RelationIdGetRelation(relation->rd_primary);
+
+ tuple_to_stringinfo(ctx->out, RelationGetDescr(indexrel),
+ &change->oldtuple->tuple);
+
+ RelationClose(indexrel);
+ }
+ break;
+ }
+
+ MemoryContextSwitchTo(old);
+ MemoryContextReset(data->context);
+
+ ctx->write(ctx, change->lsn, txn->xid);
+ return true;
+}
--
1.8.2.rc2.4.g7799588.dirty
0015-wal_decoding-pg_receivellog-Introduce-pg_receivexlog.patch
From a16f2b824b3fb8de9662d83d5610aa8b2b32f261 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 15/17] wal_decoding: pg_receivellog: Introduce pg_receivexlog
equivalent for logical changes
---
src/bin/pg_basebackup/.gitignore | 1 +
src/bin/pg_basebackup/Makefile | 8 +-
src/bin/pg_basebackup/pg_receivellog.c | 870 +++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/streamutil.c | 3 +-
src/bin/pg_basebackup/streamutil.h | 1 +
5 files changed, 880 insertions(+), 3 deletions(-)
create mode 100644 src/bin/pg_basebackup/pg_receivellog.c
diff --git a/src/bin/pg_basebackup/.gitignore b/src/bin/pg_basebackup/.gitignore
index 1334a1f..eb2978c 100644
--- a/src/bin/pg_basebackup/.gitignore
+++ b/src/bin/pg_basebackup/.gitignore
@@ -1,2 +1,3 @@
/pg_basebackup
/pg_receivexlog
+/pg_receivellog
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a707c93..a41b73c 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -20,7 +20,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
OBJS=receivelog.o streamutil.o $(WIN32RES)
-all: pg_basebackup pg_receivexlog
+all: pg_basebackup pg_receivexlog pg_receivellog
pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -28,9 +28,13 @@ pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport
pg_receivexlog: pg_receivexlog.o $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) pg_receivexlog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_receivellog: pg_receivellog.o $(OBJS) | submake-libpq submake-libpgport
+ $(CC) $(CFLAGS) pg_receivellog.o $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
install: all installdirs
$(INSTALL_PROGRAM) pg_basebackup$(X) '$(DESTDIR)$(bindir)/pg_basebackup$(X)'
$(INSTALL_PROGRAM) pg_receivexlog$(X) '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
+ $(INSTALL_PROGRAM) pg_receivellog$(X) '$(DESTDIR)$(bindir)/pg_receivellog$(X)'
installdirs:
$(MKDIR_P) '$(DESTDIR)$(bindir)'
@@ -40,4 +44,4 @@ uninstall:
rm -f '$(DESTDIR)$(bindir)/pg_receivexlog$(X)'
clean distclean maintainer-clean:
- rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o
+ rm -f pg_basebackup$(X) pg_receivexlog$(X) $(OBJS) pg_basebackup.o pg_receivexlog.o pg_receivellog.o
diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
new file mode 100644
index 0000000..e98452d
--- /dev/null
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -0,0 +1,870 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_receivellog.c - receive streaming logical log data and write it
+ * to a local file.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/pg_receivellog.c
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's so much
+ * backend-only stuff in the XLOG include files we need. But we need a
+ * frontend-ish environment otherwise. Hence this ugly hack.
+ */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include "common/fe_memutils.h"
+#include "libpq-fe.h"
+#include "libpq/pqsignal.h"
+#include "access/xlog_internal.h"
+#include "utils/datetime.h"
+#include "utils/timestamp.h"
+
+#include "receivelog.h"
+#include "streamutil.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "getopt_long.h"
+
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
+/* Global options */
+static char *outfile = NULL;
+static int outfd = -1;
+static int verbose = 0;
+static int noloop = 0;
+static int standby_message_timeout = 10 * 1000; /* 10 sec = default */
+static volatile bool time_to_abort = false;
+static const char *plugin = "test_decoding";
+static const char *slot = NULL;
+static XLogRecPtr startpos;
+static bool do_init_slot = false;
+static bool do_start_slot = false;
+static bool do_stop_slot = false;
+
+
+static void usage(void);
+static void StreamLog(void);
+
+static void
+usage(void)
+{
+ printf(_("%s receives a PostgreSQL logical change stream.\n\n"),
+ progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]...\n"), progname);
+ printf(_("\nOptions:\n"));
+ printf(_(" -f, --file=FILE receive log into this file, - for stdout\n"));
+ printf(_(" -n, --no-loop do not loop on connection lost\n"));
+ printf(_(" -v, --verbose output verbose messages\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -d, --database=DBNAME database to connect to\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port number\n"));
+ printf(_(" -U, --username=NAME connect as specified database user\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt (should happen automatically)\n"));
+ printf(_("\nReplication options:\n"));
+ printf(_(" -P, --plugin=PLUGIN use output plugin PLUGIN (defaults to test_decoding)\n"));
+ printf(_(" -s, --status-interval=INTERVAL\n"
+ " time between status packets sent to server (in seconds)\n"));
+ printf(_(" -S, --slot=SLOT name of the logical replication slot (required)\n"));
+ printf(_("\nAction to be performed:\n"));
+ printf(_(" --init initiate a new replication slot (for the slot name see --slot)\n"));
+ printf(_(" --start start streaming in a replication slot (for the slot name see --slot)\n"));
+ printf(_(" --stop stop the replication slot (for the slot name see --slot)\n"));
+ printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+
+/*
+ * Local version of GetCurrentTimestamp(), since we are not linked with
+ * backend code. The protocol always uses integer timestamps, regardless of
+ * server setting.
+ */
+static int64
+localGetCurrentTimestamp(void)
+{
+ int64 result;
+ struct timeval tp;
+
+ gettimeofday(&tp, NULL);
+
+ result = (int64) tp.tv_sec -
+ ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
+
+ result = (result * USECS_PER_SEC) + tp.tv_usec;
+
+ return result;
+}
+
+/*
+ * Local version of TimestampDifference(), since we are not linked with
+ * backend code.
+ */
+static void
+localTimestampDifference(int64 start_time, int64 stop_time,
+ long *secs, int *microsecs)
+{
+ int64 diff = stop_time - start_time;
+
+ if (diff <= 0)
+ {
+ *secs = 0;
+ *microsecs = 0;
+ }
+ else
+ {
+ *secs = (long) (diff / USECS_PER_SEC);
+ *microsecs = (int) (diff % USECS_PER_SEC);
+ }
+}
+
+/*
+ * Local version of TimestampDifferenceExceeds(), since we are not
+ * linked with backend code.
+ */
+static bool
+localTimestampDifferenceExceeds(int64 start_time,
+ int64 stop_time,
+ int msec)
+{
+ int64 diff = stop_time - start_time;
+
+ return (diff >= msec * INT64CONST(1000));
+}
+
+/*
+ * Converts an int64 to network byte order.
+ */
+static void
+sendint64(int64 i, char *buf)
+{
+ uint32 n32;
+
+ /* High order half first, since we're doing MSB-first */
+ n32 = (uint32) (i >> 32);
+ n32 = htonl(n32);
+ memcpy(&buf[0], &n32, 4);
+
+ /* Now the low order half */
+ n32 = (uint32) i;
+ n32 = htonl(n32);
+ memcpy(&buf[4], &n32, 4);
+}
+
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+static int64
+recvint64(char *buf)
+{
+ int64 result;
+ uint32 h32;
+ uint32 l32;
+
+ memcpy(&h32, buf, 4);
+ memcpy(&l32, buf + 4, 4);
+ h32 = ntohl(h32);
+ l32 = ntohl(l32);
+
+ result = h32;
+ result <<= 32;
+ result |= l32;
+
+ return result;
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ */
+static bool
+sendFeedback(PGconn *conn, XLogRecPtr blockpos, int64 now, bool replyRequested)
+{
+ char replybuf[1 + 8 + 8 + 8 + 8 + 1];
+ int len = 0;
+
+ if (blockpos == startpos)
+ return true;
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: confirming flush up to %X/%X (slot %s)\n"),
+ progname, (uint32) (blockpos >> 32), (uint32) blockpos,
+ slot);
+
+ replybuf[len] = 'r';
+ len += 1;
+ sendint64(blockpos, &replybuf[len]); /* write */
+ len += 8;
+ sendint64(blockpos, &replybuf[len]); /* flush */
+ len += 8;
+ sendint64(InvalidXLogRecPtr, &replybuf[len]); /* apply */
+ len += 8;
+ sendint64(now, &replybuf[len]); /* sendTime */
+ len += 8;
+ replybuf[len] = replyRequested ? 1 : 0; /* replyRequested */
+ len += 1;
+
+ startpos = blockpos;
+
+ if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
+ {
+ fprintf(stderr, _("%s: could not send feedback packet: %s"),
+ progname, PQerrorMessage(conn));
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Start the log streaming
+ */
+static void
+StreamLog(void)
+{
+ PGresult *res;
+ char query[256];
+ char *copybuf = NULL;
+ int64 last_status = -1;
+ XLogRecPtr logoff = InvalidXLogRecPtr;
+
+ /*
+ * Connect in replication mode to the server
+ */
+ if (!conn)
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ return;
+
+ /*
+ * Start the replication
+ */
+ if (verbose)
+ fprintf(stderr,
+ _("%s: starting log streaming at %X/%X (slot %s)\n"),
+ progname, (uint32) (startpos >> 32), (uint32) startpos,
+ slot);
+
+ /* Initiate the replication stream at specified location */
+ snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION \"%s\" %X/%X",
+ slot, (uint32) (startpos >> 32), (uint32) startpos);
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COPY_BOTH)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
+ progname, query, PQresultErrorMessage(res));
+ PQclear(res);
+ goto error;
+ }
+ PQclear(res);
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: initiated streaming\n"),
+ progname);
+
+ while (!time_to_abort)
+ {
+ int r;
+ int bytes_left;
+ int bytes_written;
+ int64 now;
+ int hdr_len;
+
+ if (copybuf != NULL)
+ {
+ PQfreemem(copybuf);
+ copybuf = NULL;
+ }
+
+ /*
+ * Potentially send a status message to the master
+ */
+ now = localGetCurrentTimestamp();
+ if (standby_message_timeout > 0 &&
+ localTimestampDifferenceExceeds(last_status, now,
+ standby_message_timeout))
+ {
+ /* Time to send feedback! */
+ if (!sendFeedback(conn, logoff, now, false))
+ goto error;
+
+ last_status = now;
+ }
+
+ r = PQgetCopyData(conn, &copybuf, 1);
+ if (r == 0)
+ {
+ /*
+ * In async mode, and no data available. We block on reading but
+ * not more than the specified timeout, so that we can send a
+ * status update back to the server.
+ */
+ fd_set input_mask;
+ struct timeval timeout;
+ struct timeval *timeoutptr;
+
+ FD_ZERO(&input_mask);
+ FD_SET(PQsocket(conn), &input_mask);
+ if (standby_message_timeout)
+ {
+ int64 targettime;
+ long secs;
+ int usecs;
+
+ targettime = last_status + (standby_message_timeout - 1) *
+ ((int64) 1000);
+ localTimestampDifference(now,
+ targettime,
+ &secs,
+ &usecs);
+ if (secs <= 0)
+ timeout.tv_sec = 1; /* Always sleep at least 1 sec */
+ else
+ timeout.tv_sec = secs;
+ timeout.tv_usec = usecs;
+ timeoutptr = &timeout;
+ }
+ else
+ timeoutptr = NULL;
+
+ r = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
+ if (r == 0 || (r < 0 && errno == EINTR))
+ {
+ /*
+ * Got a timeout or signal. Continue the loop and either
+ * deliver a status packet to the server or just go back into
+ * blocking.
+ */
+ continue;
+ }
+ else if (r < 0)
+ {
+ fprintf(stderr, _("%s: select() failed: %s\n"),
+ progname, strerror(errno));
+ goto error;
+ }
+ /* Else there is actually data on the socket */
+ if (PQconsumeInput(conn) == 0)
+ {
+ fprintf(stderr,
+ _("%s: could not receive data from WAL stream: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+ continue;
+ }
+ if (r == -1)
+ /* End of copy stream */
+ break;
+ if (r == -2)
+ {
+ fprintf(stderr, _("%s: could not read COPY data: %s"),
+ progname, PQerrorMessage(conn));
+ goto error;
+ }
+
+ /* Check the message type. */
+ if (copybuf[0] == 'k')
+ {
+ int pos;
+ bool replyRequested;
+
+ /*
+ * Parse the keepalive message, enclosed in the CopyData message.
+ * We just check if the server requested a reply, and ignore the
+ * rest.
+ */
+ pos = 1; /* skip msgtype 'k' */
+ pos += 8; /* skip walEnd */
+ pos += 8; /* skip sendTime */
+
+ if (r < pos + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+ replyRequested = copybuf[pos];
+
+ /* If the server requested an immediate reply, send one. */
+ if (replyRequested)
+ {
+ now = localGetCurrentTimestamp();
+ if (!sendFeedback(conn, logoff, now, false))
+ goto error;
+ last_status = now;
+ }
+ continue;
+ }
+ else if (copybuf[0] != 'w')
+ {
+ fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
+ progname, copybuf[0]);
+ goto error;
+ }
+
+
+ /*
+ * Read the header of the XLogData message, enclosed in the CopyData
+ * message. We only need the WAL location field (dataStart), the rest
+ * of the header is ignored.
+ */
+ hdr_len = 1; /* msgtype 'w' */
+ hdr_len += 8; /* dataStart */
+ hdr_len += 8; /* walEnd */
+ hdr_len += 8; /* sendTime */
+ if (r < hdr_len + 1)
+ {
+ fprintf(stderr, _("%s: streaming header too small: %d\n"),
+ progname, r);
+ goto error;
+ }
+
+ /* Extract WAL location for this block */
+ {
+ XLogRecPtr temp = recvint64(&copybuf[1]);
+
+ logoff = Max(temp, logoff);
+ }
+
+ if (outfd == -1 && strcmp(outfile, "-") == 0)
+ {
+ outfd = 1;
+ }
+ else if (outfd == -1)
+ {
+ outfd = open(outfile, O_CREAT | O_APPEND | O_WRONLY | PG_BINARY,
+ S_IRUSR | S_IWUSR);
+ if (outfd == -1)
+ {
+ fprintf(stderr,
+ _("%s: could not open log file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ goto error;
+ }
+ }
+
+ bytes_left = r - hdr_len;
+ bytes_written = 0;
+
+
+ while (bytes_left)
+ {
+ int ret;
+
+ ret = write(outfd,
+ copybuf + hdr_len + bytes_written,
+ bytes_left);
+
+ if (ret < 0)
+ {
+ fprintf(stderr,
+ _("%s: could not write %d bytes to log file \"%s\": %s\n"),
+ progname, bytes_left, outfile,
+ strerror(errno));
+ goto error;
+ }
+
+ /* Write was successful, advance our position */
+ bytes_written += ret;
+ bytes_left -= ret;
+ }
+
+ if (write(outfd, "\n", 1) != 1)
+ {
+ fprintf(stderr,
+ _("%s: could not write %d bytes to log file \"%s\": %s\n"),
+ progname, 1, outfile,
+ strerror(errno));
+ goto error;
+ }
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr,
+ _("%s: unexpected termination of replication stream: %s"),
+ progname, PQresultErrorMessage(res));
+ goto error;
+ }
+ PQclear(res);
+
+ if (copybuf != NULL)
+ PQfreemem(copybuf);
+
+ if (outfd != -1 && outfd != 1 && close(outfd) != 0)
+ fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+ progname, outfile, strerror(errno));
+ outfd = -1;
+error:
+ PQfinish(conn);
+ conn = NULL;
+}
+
+/*
+ * When SIGINT is received, just tell the program to exit at the next
+ * possible moment.
+ */
+#ifndef WIN32
+
+static void
+sigint_handler(int signum)
+{
+ time_to_abort = true;
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+ PGresult *res;
+ static struct option long_options[] = {
+/* general options */
+ {"file", required_argument, NULL, 'f'},
+ {"no-loop", no_argument, NULL, 'n'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {"help", no_argument, NULL, '?'},
+/* connection options */
+ {"database", required_argument, NULL, 'd'},
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+/* replication options */
+ {"plugin", required_argument, NULL, 'P'},
+ {"status-interval", required_argument, NULL, 's'},
+ {"slot", required_argument, NULL, 'S'},
+ {"startpos", required_argument, NULL, 'I'},
+/* action */
+ {"init", no_argument, NULL, 1},
+ {"start", no_argument, NULL, 2},
+ {"stop", no_argument, NULL, 3},
+ {NULL, 0, NULL, 0}
+ };
+ int c;
+ int option_index;
+ uint32 hi,
+ lo;
+
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivellog"));
+
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ usage();
+ exit(0);
+ }
+ else if (strcmp(argv[1], "-V") == 0 ||
+ strcmp(argv[1], "--version") == 0)
+ {
+ puts("pg_receivellog (PostgreSQL) " PG_VERSION);
+ exit(0);
+ }
+ }
+
+ while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:I:",
+ long_options, &option_index)) != -1)
+ {
+ switch (c)
+ {
+/* general options */
+ case 'f':
+ outfile = pg_strdup(optarg);
+ break;
+ case 'n':
+ noloop = 1;
+ break;
+ case 'v':
+ verbose++;
+ break;
+/* connection options */
+ case 'd':
+ dbname = pg_strdup(optarg);
+ break;
+ case 'h':
+ dbhost = pg_strdup(optarg);
+ break;
+ case 'p':
+ if (atoi(optarg) <= 0)
+ {
+ fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ dbport = pg_strdup(optarg);
+ break;
+ case 'U':
+ dbuser = pg_strdup(optarg);
+ break;
+ case 'w':
+ dbgetpassword = -1;
+ break;
+ case 'W':
+ dbgetpassword = 1;
+ break;
+/* replication options */
+ case 'P':
+ plugin = pg_strdup(optarg);
+ break;
+ case 's':
+ standby_message_timeout = atoi(optarg) * 1000;
+ if (standby_message_timeout < 0)
+ {
+ fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ break;
+ case 'S':
+ slot = pg_strdup(optarg);
+ break;
+ case 'I':
+ if (sscanf(optarg, "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse start position \"%s\"\n"),
+ progname, optarg);
+ exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+ break;
+/* action */
+ case 1:
+ do_init_slot = true;
+ break;
+ case 2:
+ do_start_slot = true;
+ break;
+ case 3:
+ do_stop_slot = true;
+ break;
+
+ default:
+
+ /*
+ * getopt_long already emitted a complaint
+ */
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ /*
+ * Any non-option arguments?
+ */
+ if (optind < argc)
+ {
+ fprintf(stderr,
+ _("%s: too many command-line arguments (first is \"%s\")\n"),
+ progname, argv[optind]);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Required arguments
+ */
+ if (slot == NULL)
+ {
+ fprintf(stderr, _("%s: no slot specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && outfile == NULL)
+ {
+ fprintf(stderr, _("%s: no target file specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && dbname == NULL)
+ {
+ fprintf(stderr, _("%s: no database specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!do_stop_slot && !do_init_slot && !do_start_slot)
+ {
+ fprintf(stderr, _("%s: at least one action needs to be specified\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (do_stop_slot && (do_init_slot || do_start_slot))
+ {
+ fprintf(stderr, _("%s: --stop cannot be combined with --init or --start\n"), progname);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+#ifndef WIN32
+ pqsignal(SIGINT, sigint_handler);
+#endif
+
+
+ /*
+ * We don't strictly need this, but connecting up front yields more precise
+ * error messages about authentication and similar problems.
+ */
+ {
+ conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ exit(1);
+
+ /*
+ * Run IDENTIFY_SYSTEM to verify the connection; the timeline and current
+ * xlog position it returns are not used yet.
+ */
+ res = PQexec(conn, "IDENTIFY_SYSTEM");
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+ PQclear(res);
+ }
+
+
+ /*
+ * stop a replication slot
+ */
+ if (do_stop_slot)
+ {
+ char query[256];
+
+ snprintf(query, sizeof(query), "FREE_LOGICAL_REPLICATION \"%s\"",
+ slot);
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 0 || PQnfields(res) != 0)
+ {
+ fprintf(stderr,
+ _("%s: could not stop logical replication: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 0, 0);
+ disconnect_and_exit(1);
+ }
+
+ PQclear(res);
+ disconnect_and_exit(0);
+ }
+
+ /*
+ * init a replication slot
+ */
+ if (do_init_slot)
+ {
+ char query[256];
+
+ if (verbose)
+ fprintf(stderr,
+ _("%s: initializing replication slot\n"),
+ progname);
+
+ snprintf(query, sizeof(query), "INIT_LOGICAL_REPLICATION \"%s\" \"%s\"",
+ slot, plugin);
+
+ res = PQexec(conn, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+ progname, query, PQerrorMessage(conn));
+ disconnect_and_exit(1);
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
+ {
+ fprintf(stderr,
+ _("%s: could not initialize logical replication: got %d rows and %d fields, expected %d rows and %d fields\n"),
+ progname, PQntuples(res), PQnfields(res), 1, 4);
+ disconnect_and_exit(1);
+ }
+
+ if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+ {
+ fprintf(stderr,
+ _("%s: could not parse log location \"%s\"\n"),
+ progname, PQgetvalue(res, 0, 1));
+ disconnect_and_exit(1);
+ }
+ startpos = ((uint64) hi) << 32 | lo;
+
+ slot = pg_strdup(PQgetvalue(res, 0, 0));
+ PQclear(res);
+ }
+
+
+ if (!do_start_slot)
+ disconnect_and_exit(0);
+
+ while (true)
+ {
+ StreamLog();
+ if (time_to_abort)
+ {
+ /*
+ * We've been Ctrl-C'ed. That's not an error, so exit without an
+ * error code.
+ */
+ exit(0);
+ }
+ else if (noloop)
+ {
+ fprintf(stderr, _("%s: disconnected.\n"), progname);
+ exit(1);
+ }
+ else
+ {
+ fprintf(stderr,
+ /* translator: check source for value for %d */
+ _("%s: disconnected. Waiting %d seconds to try again.\n"),
+ progname, RECONNECT_SLEEP_TIME);
+ pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+ }
+ }
+}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 6891c2c..64b2e003 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -22,6 +22,7 @@ char *connection_string = NULL;
char *dbhost = NULL;
char *dbuser = NULL;
char *dbport = NULL;
+char *dbname = NULL;
int dbgetpassword = 0; /* 0=auto, -1=never, 1=always */
static char *dbpassword = NULL;
PGconn *conn = NULL;
@@ -86,7 +87,7 @@ GetConnection(void)
}
keywords[i] = "dbname";
- values[i] = "replication";
+ values[i] = dbname == NULL ? "replication" : dbname;
i++;
keywords[i] = "replication";
values[i] = "true";
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 77d6b86..78f20da 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -5,6 +5,7 @@ extern char *connection_string;
extern char *dbhost;
extern char *dbuser;
extern char *dbport;
+extern char *dbname;
extern int dbgetpassword;
/* Connection kept global so we can disconnect easily */
--
1.8.2.rc2.4.g7799588.dirty
0016-wal_decoding-test_logical_decoding-Add-extension-for.patch
From a3a59fa972f211aad37826ee0a6b280a5c71f916 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 16/17] wal_decoding: test_logical_decoding: Add extension for
easier testing of logical decoding
This extension provides three functions for manipulating replication slots:
* init_logical_replication - initiate a replication slot and wait for consistent state
* start_logical_replication - return all changes since the last call up to now, without blocking
* stop_logical_replication - free the logical slot again
Those are pretty direct synonyms for the replication connection commands.
Due to open questions about how to integrate logical replication tests, this
module also contains the current tests of logical replication itself.
Author: Abhijit Menon-Sen
---
contrib/Makefile | 1 +
contrib/test_logical_decoding/Makefile | 37 ++
contrib/test_logical_decoding/expected/ddl.out | 587 +++++++++++++++++++++
contrib/test_logical_decoding/logical.conf | 2 +
contrib/test_logical_decoding/sql/ddl.sql | 291 ++++++++++
.../test_logical_decoding--1.0.sql | 6 +
.../test_logical_decoding/test_logical_decoding.c | 237 +++++++++
.../test_logical_decoding.control | 5 +
8 files changed, 1166 insertions(+)
create mode 100644 contrib/test_logical_decoding/Makefile
create mode 100644 contrib/test_logical_decoding/expected/ddl.out
create mode 100644 contrib/test_logical_decoding/logical.conf
create mode 100644 contrib/test_logical_decoding/sql/ddl.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding--1.0.sql
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.c
create mode 100644 contrib/test_logical_decoding/test_logical_decoding.control
diff --git a/contrib/Makefile b/contrib/Makefile
index 6d2fe32..41cb892 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -51,6 +51,7 @@ SUBDIRS = \
tcn \
test_parser \
test_decoding \
+ test_logical_decoding \
tsearch2 \
unaccent \
vacuumlo \
diff --git a/contrib/test_logical_decoding/Makefile b/contrib/test_logical_decoding/Makefile
new file mode 100644
index 0000000..0e7d5d3
--- /dev/null
+++ b/contrib/test_logical_decoding/Makefile
@@ -0,0 +1,37 @@
+MODULE_big = test_logical_decoding
+OBJS = test_logical_decoding.o
+
+EXTENSION = test_logical_decoding
+DATA = test_logical_decoding--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_logical_decoding
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+test_logical_decoding.o: test_logical_decoding.c
+
+# Disabled because these tests require "wal_level=logical", which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+submake-regress:
+ $(MAKE) -C $(top_builddir)/src/test/regress
+
+submake-test_decoding:
+ $(MAKE) -C $(top_builddir)/contrib/test_decoding
+
+check: all | submake-regress submake-test_decoding
+ $(pg_regress_check) --temp-config $(top_srcdir)/contrib/test_logical_decoding/logical.conf \
+ --temp-install=./tmp_check \
+ --extra-install=contrib/test_decoding \
+ --extra-install=contrib/test_logical_decoding \
+ ddl
+
.PHONY: submake-test_decoding submake-regress
diff --git a/contrib/test_logical_decoding/expected/ddl.out b/contrib/test_logical_decoding/expected/ddl.out
new file mode 100644
index 0000000..3947093
--- /dev/null
+++ b/contrib/test_logical_decoding/expected/ddl.out
@@ -0,0 +1,587 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+-- faster startup
+CHECKPOINT;
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ERROR: There already is a logical slot named "regression_slot"
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+-- fail
+SELECT stop_logical_replication('regression_slot');
+ERROR: couldn't find logical slot "regression_slot"
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+/* check whether status function reports us, only reproduceable columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+ slot_name | plugin | active | ?column? | ?column?
+-----------------+---------------+--------+----------+----------
+ regression_slot | test_decoding | f | t | t
+(1 row)
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+ALTER TABLE replication_example ADD COLUMN bar int;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+---------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
+ table "replication_example": INSERT: id[int4]:2 somedata[int4]:1 text[varchar]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+ table "replication_example": INSERT: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:7 somedata[int4]:3 text[varchar]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:8 somedata[int4]:3 text[varchar]:2
+ table "replication_example": INSERT: id[int4]:9 somedata[int4]:3 text[varchar]:3
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+ COMMIT
+(30 rows)
+
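In the output above, each row is decoded with the table definition that was current when the row was written. `bar` appears only for rows inserted while that column existed, and the last row already uses `somenum`. Below is a toy model of that lookup. It assumes the decoder can find the catalog state valid at a change's LSN. The LSN values and `SCHEMA_HISTORY` are invented for illustration and are not taken from the patch, and the type tags are left out:

```python
from bisect import bisect_right

# Hypothetical schema history of replication_example:
# (LSN from which a definition is valid, visible columns). LSNs are invented.
SCHEMA_HISTORY = [
    (0, ["id", "somedata", "text"]),
    (100, ["id", "somedata", "text", "bar"]),  # after ADD COLUMN bar
    (200, ["id", "somedata", "text"]),         # after DROP COLUMN bar
    (300, ["id", "somedata", "somenum"]),      # after RENAME text TO somenum
]
_STARTS = [start for start, _cols in SCHEMA_HISTORY]


def columns_at(lsn: int) -> list[str]:
    """Return the column list of the latest definition starting at or before lsn."""
    i = bisect_right(_STARTS, lsn) - 1
    if i < 0:
        raise ValueError(f"no schema known at LSN {lsn}")
    return SCHEMA_HISTORY[i][1]


def format_insert(lsn: int, values: list) -> str:
    """Format an INSERT roughly like the test output, without type tags."""
    cols = columns_at(lsn)
    pairs = " ".join(
        f"{c}:{'(null)' if v is None else v}" for c, v in zip(cols, values)
    )
    return f'table "replication_example": INSERT: {pairs}'


print(format_insert(150, [6, 2, 4, None]))
print(format_insert(350, [10, 4, 1]))
```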
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 12
+(1 row)
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "replication_example": INSERT: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+ COMMIT
+ BEGIN
+ table "replication_example": INSERT: id[int4]:12 somedata[int4]:6 somenum[int4]:1
+ table "replication_example": INSERT: id[int4]:13 somedata[int4]:6 somenum[int4]:2 zaphod1[int4]:1
+ table "replication_example": INSERT: id[int4]:14 somedata[int4]:6 somenum[int4]:3 zaphod1[int4]:(null) zaphod2[int4]:1
+ table "replication_example": INSERT: id[int4]:15 somedata[int4]:6 somenum[int4]:4 zaphod1[int4]:2 zaphod2[int4]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_unique": INSERT: id2[int4]:1 data[int4]:10
+ COMMIT
+ BEGIN
+ table "tr_unique": DELETE: id2[int4]:1
+ COMMIT
+ BEGIN
+ COMMIT
+(19 rows)
+
+-- hide changes because oids are visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ count
+-------
+ 2
+(1 row)
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_pkey": INSERT: id2[int4]:2 data[int4]:1 id[int4]:1
+ COMMIT
+ BEGIN
+ table "tr_pkey": DELETE: id[int4]:1
+ COMMIT
+(6 rows)
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+ count | min | max
+-------+---------------------------------------------------------------+-------------------------------------------------------------
+ 1 | COMMIT | COMMIT
+ 1 | BEGIN | BEGIN
+ 4999 | table "tr_etoomuch": DELETE: id[int4]:1 | table "tr_etoomuch": DELETE: id[int4]:999
+ 5234 | table "tr_etoomuch": UPDATE: id[int4]:10000 data[int4]:-10000 | table "tr_etoomuch": UPDATE: id[int4]:9999 data[int4]:-9999
+ 10234 | table "tr_etoomuch": INSERT: id[int4]:10000 data[int4]:10000 | table "tr_etoomuch": INSERT: id[int4]:9 data[int4]:9
+(5 rows)
+
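The counts above follow directly from the statements. There are 10234 inserts, 4999 deletes for `id < 5000`, and 5234 updates for `id > 5000`; id 5000 is touched by neither. `substring(data, 1, 24)` is exactly long enough to separate INSERT, DELETE and UPDATE lines. The following checks that arithmetic, assuming the `table "...": ACTION: ...` line format shown in the output. The generator is an illustration, not the output plugin:

```python
from collections import Counter

N = 10234  # rows inserted by generate_series(1, 10234)


def decoded_lines():
    """Yield lines shaped like the test_decoding output for the tr_etoomuch txn."""
    yield "BEGIN"
    for i in range(1, N + 1):
        yield f'table "tr_etoomuch": INSERT: id[int4]:{i} data[int4]:{i}'
    for i in range(1, 5000):        # DELETE ... WHERE id < 5000
        yield f'table "tr_etoomuch": DELETE: id[int4]:{i}'
    for i in range(5001, N + 1):    # UPDATE ... WHERE id > 5000
        yield f'table "tr_etoomuch": UPDATE: id[int4]:{i} data[int4]:{-i}'
    yield "COMMIT"


# Mirror GROUP BY substring(data, 1, 24): SQL substring is 1-based, Python slices are 0-based.
groups = Counter(line[:24] for line in decoded_lines())
for prefix, count in sorted(groups.items(), key=lambda kv: kv[1]):
    print(f"{count:6d} | {prefix}")
```

`min()` and `max()` in the query compare text, not numbers. That is why, for example, `id[int4]:999` is the largest DELETE line rather than `id[int4]:4999`.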
+/*
+ * check whether we handle subtransactions correctly in relation to each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:1 path[text]:1-top-#1
+ table "tr_sub": INSERT: id[int4]:2 path[text]:1-top-1-#1
+ table "tr_sub": INSERT: id[int4]:3 path[text]:1-top-1-#2
+ table "tr_sub": INSERT: id[int4]:4 path[text]:1-top-2-1-#1
+ table "tr_sub": INSERT: id[int4]:5 path[text]:1-top-2-1-#2
+ table "tr_sub": INSERT: id[int4]:6 path[text]:1-top-2-#1
+ COMMIT
+(10 rows)
+
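All six rows come out in a single transaction, in execution order, even though three different subtransactions wrote them. One way to get that order is a k-way merge on the LSN, assuming each (sub)transaction's changes are buffered separately and already sorted by LSN. The sketch below uses Python's `heapq.merge` with invented LSNs. It illustrates the idea only and is not the reorder buffer's code:

```python
import heapq

# Hypothetical per-(sub)transaction change buffers, each already in LSN order.
# The LSNs are made-up integers; only their relative order matters.
toplevel = [(10, "1-top-#1"), ]
sub_a = [(20, "1-top-1-#1"), (30, "1-top-1-#2")]
sub_c = [(40, "1-top-2-1-#1"), (50, "1-top-2-1-#2")]
sub_b = [(60, "1-top-2-#1")]


def replay(*change_lists):
    """Merge a toplevel txn's changes with those of its committed subtxns by LSN."""
    return [payload for _lsn, payload in heapq.merge(*change_lists)]


print(replay(toplevel, sub_a, sub_b, sub_c))
```

Because the merge only looks at LSNs, the order in which the subtransaction buffers are passed in does not matter.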
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+--------------------------------------------------------------
+ BEGIN
+ table "tr_sub": INSERT: id[int4]:7 path[text]:2-top-1...--#1
+ table "tr_sub": INSERT: id[int4]:8 path[text]:2-top-1...--#2
+ table "tr_sub": INSERT: id[int4]:9 path[text]:2-top-1...--#3
+ table "tr_sub": INSERT: id[int4]:10 path[text]:2-top-#1
+ COMMIT
+(6 rows)
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=true
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+ Table "public.replication_metadata"
+ Column | Type | Modifiers | Storage | Stats target | Description
+----------+---------+-------------------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('replication_metadata_id_seq'::regclass) | plain | |
+ relation | name | not null | plain | |
+ options | text[] | | extended | |
+Indexes:
+ "replication_metadata_pkey" PRIMARY KEY, btree (id)
+Has OIDs: no
+Options: treat_as_catalog_table=false
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+----------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:1 relation[name]:foo options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:2 relation[name]:bar options[_text]:{a,b}
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:3 relation[name]:blub options[_text]:(null)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "replication_metadata": INSERT: id[int4]:4 relation[name]:zaphod options[_text]:(null)
+ COMMIT
+(20 rows)
+
+/*
+ * check whether we handle updates/deletes correctly with & without a pkey
+ */
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_without_key": INSERT: id[int4]:1 data[int4]:1
+ table "table_without_key": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_without_key": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_pkey": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_pkey": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_pkey": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique": DELETE: (no-tuple-data)
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": INSERT: id[int4]:1 data[int4]:1
+ table "table_with_unique_not_null": INSERT: id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:1
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:2 new-tuple: id[int4]:-2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": UPDATE: old-pkey: id[int4]:-2 new-tuple: id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_unique_not_null": DELETE: id[int4]:2
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ COMMIT
+ BEGIN
+ table "table_with_oid": INSERT: oid[oid]:16484 id[int4]:1 data[int4]:1
+ table "table_with_oid": INSERT: oid[oid]:16485 id[int4]:2 data[int4]:2
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16484
+ COMMIT
+ BEGIN
+ table "table_with_oid": UPDATE: oid[oid]:16485 id[int4]:2 data[int4]:3
+ COMMIT
+ BEGIN
+ table "table_with_oid": DELETE: oid[oid]:16485
+ COMMIT
+(105 rows)
+
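The expected output above suggests a rule for what an UPDATE or DELETE carries:

* The primary key, if the table has one.
* Otherwise, a unique index whose columns are all NOT NULL (table_with_unique_not_null, and the unique index on oid).
* Otherwise, nothing, shown as `(no-tuple-data)`.

The old key is only included in an UPDATE when the key actually changed. Below is a hypothetical model of that rule, reverse-engineered from this output rather than taken from the patch. `Table`, `Index`, `replica_key` and the formatting helpers are invented names:

```python
from dataclasses import dataclass, field
from typing import Optional

Row = dict[str, tuple[str, object]]  # column -> (type name, value)


@dataclass
class Index:
    columns: tuple[str, ...]
    unique: bool = True
    primary: bool = False


@dataclass
class Table:
    name: str
    not_null: set[str]
    indexes: list[Index] = field(default_factory=list)


def replica_key(t: Table) -> Optional[tuple[str, ...]]:
    """Hypothetical: primary key first, else a unique index on NOT NULL columns."""
    for idx in t.indexes:
        if idx.primary:
            return idx.columns
    for idx in t.indexes:
        if idx.unique and all(c in t.not_null for c in idx.columns):
            return idx.columns
    return None  # no usable key: DELETEs carry no tuple data


def fmt(row: Row, cols) -> str:
    return " ".join(f"{c}[{row[c][0]}]:{row[c][1]}" for c in cols)


def format_delete(t: Table, old: Row) -> str:
    key = replica_key(t)
    body = "(no-tuple-data)" if key is None else fmt(old, key)
    return f'table "{t.name}": DELETE: {body}'


def format_update(t: Table, old: Row, new: Row) -> str:
    key = replica_key(t)
    if key is not None and any(old[c] != new[c] for c in key):
        return (f'table "{t.name}": UPDATE: old-pkey: {fmt(old, key)} '
                f'new-tuple: {fmt(new, new)}')
    return f'table "{t.name}": UPDATE: {fmt(new, new)}'


with_pkey = Table("table_with_pkey", {"id"}, [Index(("id",), primary=True)])
without_key = Table("table_without_key", {"id"})  # serial is NOT NULL, but no index
nullable_unique = Table("table_with_unique", set(), [Index(("id",))])
with_oid = Table("table_with_oid", {"oid", "id"}, [Index(("oid",))])

old = {"id": ("int4", 2), "data": ("int4", 3)}
new = {"id": ("int4", -2), "data": ("int4", 3)}
print(format_update(with_pkey, old, new))
```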
+-- check toast support
+SELECT setseed(0);
+ setseed
+---------
+
+(1 row)
+
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ table "toasttable": INSERT: id[int4]:2 toasted_col1[text]:(null) rand1[float8]:0.783099223393947 toasted_col2[text]:0001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
71047204730474047504760477047804790480048104820483048404850486048704880489049004910492049304940495049604970498049905000001000200030004000500060007000800090010001100120013001400150016001700180019002000210022002300240025002600270028002900300031003200330034003500360037003800390040004100420043004400450046004700480049005000510052005300540055005600570058005900600061006200630064006500660067006800690070007100720073007400750076007700780079008000810082008300840085008600870088008900900091009200930094009500960097009800990100010101020103010401050106010701080109011001110112011301140115011601170118011901200121012201230124012501260127012801290130013101320133013401350136013701380139014001410142014301440145014601470148014901500151015201530154015501560157015801590160016101620163016401650166016701680169017001710172017301740175017601770178017901800181018201830184018501860187018801890190019101920193019401950196019701980199020002010202020302040205020602070208020902100211021202130214021502160217021802190220022102220223022402250226022702280229023002310232023302340235023602370238023902400241024202430244024502460247024802490250025102520253025402550256025702580259026002610262026302640265026602670268026902700271027202730274027502760277027802790280028102820283028402850286028702880289029002910292029302940295029602970298029903000301030203030304030503060307030803090310031103120313031403150316031703180319032003210322032303240325032603270328032903300331033203330334033503360337033803390340034103420343034403450346034703480349035003510352035303540355035603570358035903600361036203630364036503660367036803690370037103720373037403750376037703780379038003810382038303840385038603870388038903900391039203930394039503960397039803990400040104020403040404050406040704080409041004110412041304140415041604170418041904200421042204230424042504260427042804290430043104320433043404350436043704380439044004410442044304440445044604470448044904500451045204530454045504560457045804590460046104620463046404650466046704680469047004
7104720473047404750476047704780479048004810482048304840485048604870488048904900491049204930494049504960497049804990500 rand2[float8]:0.798440033104271
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.840187716763467 toasted_col2[text]:(null) rand2[float8]:0.394382926635444
+ COMMIT
+(11 rows)
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table "toasttable": INSERT: id[int4]:3 toasted_col1[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268
36846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126
21263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176
21763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand1[float8]:0.911647357512265 toasted_col2[text]:(null) rand2[float8]:0.197551369201392
+ COMMIT
+ BEGIN
+ table "toasttable": UPDATE: id[int4]:1 toasted_col1[text]:(unchanged-toast-datum) rand1[float8]:0.840187716763467 toasted_col2[text]:12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765
86596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243
12441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743
17441745174617471748174917501751175217531754175517561757175817591760176117621763176417651766176717681769177017711772177317741775177617771778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000 rand2[float8]:0.394382926635444
+ COMMIT
+ BEGIN
+ COMMIT
+(8 rows)
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+ data
+------
+(0 rows)
+
+SELECT stop_logical_replication('regression_slot');
+ stop_logical_replication
+--------------------------
+ 0
+(1 row)
+
+/* check that the dropped slot is no longer visible */
+SELECT * FROM pg_stat_logical_decoding;
+ slot_name | plugin | database | active | xmin | restart_decoding_lsn
+-----------+--------+----------+--------+------+----------------------
+(0 rows)
+
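In the expected output above, the UPDATE that only sets toasted_col2 reports toasted_col1 as (unchanged-toast-datum), while NULL columns are shown as (null). Presumably this is because an UPDATE that leaves an out-of-line (toasted) column untouched does not WAL-log that value again. The decoder then only has a reference into the TOAST table, and that table may already be gone, which is exactly what the DROP TABLE in the test exercises. The following is a minimal C sketch of how an output plugin could render such columns in the name[type]:value layout used above. The sketch_column struct, the col_state enum and format_column() are hypothetical simplifications for illustration. They are not the patch's or PostgreSQL's actual Datum/varlena handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified column state; NOT PostgreSQL's real types. */
typedef enum
{
    COL_NULL,            /* column is SQL NULL */
    COL_VALUE,           /* value is available as text */
    COL_UNCHANGED_TOAST  /* toasted value was not logged by the UPDATE */
} col_state;

typedef struct
{
    const char *name;     /* column name, e.g. "toasted_col1" */
    const char *typname;  /* type name, e.g. "text" */
    col_state   state;
    const char *text;     /* textual value, only used for COL_VALUE */
} sketch_column;

/*
 * Format one column as "name[type]:value" into buf, mirroring the layout
 * of the test_decoding output above. Returns snprintf's result, i.e. the
 * length the full string would have had.
 */
int
format_column(char *buf, size_t len, const sketch_column *col)
{
    const char *val;

    assert(buf != NULL && col != NULL);

    switch (col->state)
    {
        case COL_NULL:
            val = "(null)";
            break;
        case COL_UNCHANGED_TOAST:
            /* the old value cannot be reconstructed from WAL */
            val = "(unchanged-toast-datum)";
            break;
        default:
            val = col->text;
            break;
    }
    return snprintf(buf, len, "%s[%s]:%s", col->name, col->typname, val);
}
```

Keeping the placeholder distinct from (null) lets a consumer tell "value not included because it did not change" apart from "value is NULL", which matters when applying the change elsewhere.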
diff --git a/contrib/test_logical_decoding/logical.conf b/contrib/test_logical_decoding/logical.conf
new file mode 100644
index 0000000..a7c6c86
--- /dev/null
+++ b/contrib/test_logical_decoding/logical.conf
@@ -0,0 +1,2 @@
+wal_level = logical
+max_logical_slots = 4
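As noted at the top of this mail, the server has to run with wal_level = logical and max_logical_slots > 0 for changeset extraction to work; logical.conf supplies exactly those two settings for the regression run. Below is a minimal C sketch of a pre-flight check for that requirement. conf_allows_logical_decoding() is a hypothetical helper, not part of the patch, and it assumes simple unquoted "key = value" lines: it ignores quoting, include directives and any setting other than the two above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a postgresql.conf-style text sets
 * wal_level = logical and max_logical_slots > 0, else 0.
 * Only handles unquoted "key = value" lines; "#" comment lines are skipped
 * because they do not start with a key character.
 */
int
conf_allows_logical_decoding(const char *conf)
{
    int         wal_ok = 0;
    long        slots = 0;
    char        line[256];
    const char *p = conf;

    assert(conf != NULL);

    while (*p)
    {
        size_t      n = strcspn(p, "\n");
        char        key[64];
        char        val[64];

        /* copy the current line, truncating overly long ones */
        if (n >= sizeof line)
            n = sizeof line - 1;
        memcpy(line, p, n);
        line[n] = '\0';

        /* advance past this line and its newline, if any */
        p += strcspn(p, "\n");
        if (*p == '\n')
            p++;

        if (sscanf(line, " %63[A-Za-z_] = %63s", key, val) == 2)
        {
            if (strcmp(key, "wal_level") == 0)
                wal_ok = (strcmp(val, "logical") == 0);
            else if (strcmp(key, "max_logical_slots") == 0)
                slots = strtol(val, NULL, 10);
        }
    }
    return wal_ok && slots > 0;
}
```

Fed the two lines of logical.conf, the check succeeds; with wal_level set to anything else, or max_logical_slots = 0, it fails.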
diff --git a/contrib/test_logical_decoding/sql/ddl.sql b/contrib/test_logical_decoding/sql/ddl.sql
new file mode 100644
index 0000000..1e46584
--- /dev/null
+++ b/contrib/test_logical_decoding/sql/ddl.sql
@@ -0,0 +1,291 @@
+CREATE EXTENSION test_logical_decoding;
+-- predictability
+SET synchronous_commit = on;
+
+-- faster startup
+CHECKPOINT;
+
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- fail because of an already existing slot
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+-- succeed once
+SELECT stop_logical_replication('regression_slot');
+-- fail
+SELECT stop_logical_replication('regression_slot');
+SELECT 'init' FROM init_logical_replication('regression_slot', 'test_decoding');
+
+/* check whether the status function reports us, only reproducible columns */
+SELECT slot_name, plugin, active,
+ xmin::xid IS NOT NULL,
+ pg_xlog_location_diff(restart_decoding_lsn, '0/01000000') > 0
+FROM pg_stat_logical_decoding;
+
+/*
+ * Check that changes are handled correctly when interleaved with ddl
+ */
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+COMMIT;
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+COMMIT;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+-- collect all changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
+-- throw away changes, they contain oids
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, somenum) VALUES (6, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod1 int;
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 2, 1);
+ALTER TABLE replication_example ADD COLUMN zaphod2 int;
+INSERT INTO replication_example(somedata, somenum, zaphod2) VALUES (6, 3, 1);
+INSERT INTO replication_example(somedata, somenum, zaphod1) VALUES (6, 4, 2);
+COMMIT;
+
+/*
+ * check whether the correct indexes are chosen for deletions
+ */
+
+CREATE TABLE tr_unique(id2 serial unique NOT NULL, data int);
+INSERT INTO tr_unique(data) VALUES(10);
+--show deletion with unique index
+DELETE FROM tr_unique;
+
+ALTER TABLE tr_unique RENAME TO tr_pkey;
+
+-- show changes
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- hide changes because of oids visible in full table rewrites
+ALTER TABLE tr_pkey ADD COLUMN id serial primary key;
+SELECT count(data) FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO tr_pkey(data) VALUES(1);
+--show deletion with primary key
+DELETE FROM tr_pkey;
+
+/* display results */
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check that disk spooling works
+ */
+BEGIN;
+CREATE TABLE tr_etoomuch (id serial primary key, data int);
+INSERT INTO tr_etoomuch(data) SELECT g.i FROM generate_series(1, 10234) g(i);
+DELETE FROM tr_etoomuch WHERE id < 5000;
+UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+COMMIT;
+
+/* display results, but hide most of the output */
+SELECT count(*), min(data), max(data)
+FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1')
+GROUP BY substring(data, 1, 24)
+ORDER BY 1;
+
+/*
+ * check whether we handle subtransactions correctly in relation to each other
+ */
+CREATE TABLE tr_sub (id serial primary key, path text);
+
+-- toplevel, subtxn, toplevel, subtxn, subtxn
+BEGIN;
+INSERT INTO tr_sub(path) VALUES ('1-top-#1');
+
+SAVEPOINT a;
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-1-#2');
+RELEASE SAVEPOINT a;
+
+SAVEPOINT b;
+SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#1');
+INSERT INTO tr_sub(path) VALUES ('1-top-2-1-#2');
+RELEASE SAVEPOINT c;
+INSERT INTO tr_sub(path) VALUES ('1-top-2-#1');
+RELEASE SAVEPOINT b;
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check that we handle xlog assignments correctly
+BEGIN;
+-- nest 80 subtxns
+SAVEPOINT subtop;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;SAVEPOINT a;
+-- assign xid by inserting
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#1');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#2');
+INSERT INTO tr_sub(path) VALUES ('2-top-1...--#3');
+RELEASE SAVEPOINT subtop;
+INSERT INTO tr_sub(path) VALUES ('2-top-#1');
+COMMIT;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * Check whether treating a table as a catalog table works somewhat
+ */
+CREATE TABLE replication_metadata (
+ id serial primary key,
+ relation name NOT NULL,
+ options text[]
+)
+WITH (treat_as_catalog_table = true)
+;
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('foo', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata RESET (treat_as_catalog_table);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('bar', ARRAY['a', 'b']);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = true);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('blub', NULL);
+
+ALTER TABLE replication_metadata SET (treat_as_catalog_table = false);
+\d+ replication_metadata
+
+INSERT INTO replication_metadata(relation, options)
+VALUES ('zaphod', NULL);
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+/*
+ * check whether we handle updates/deletes correctly with & without a pkey
+ */
+
+/* we should handle the case without a key at all more gracefully */
+CREATE TABLE table_without_key(id serial, data int);
+INSERT INTO table_without_key(data) VALUES(1),(2);
+DELETE FROM table_without_key WHERE data = 1;
+UPDATE table_without_key SET data = 3 WHERE data = 2;
+UPDATE table_without_key SET id = -id;
+UPDATE table_without_key SET id = -id;
+DELETE FROM table_without_key WHERE data = 3;
+
+CREATE TABLE table_with_pkey(id serial primary key, data int);
+INSERT INTO table_with_pkey(data) VALUES(1), (2);
+DELETE FROM table_with_pkey WHERE data = 1;
+UPDATE table_with_pkey SET data = 3 WHERE data = 2;
+UPDATE table_with_pkey SET id = -id;
+UPDATE table_with_pkey SET id = -id;
+DELETE FROM table_with_pkey WHERE data = 3;
+
+CREATE TABLE table_with_unique(id serial unique, data int);
+ALTER TABLE table_with_unique ALTER COLUMN id DROP NOT NULL;
+INSERT INTO table_with_unique(data) VALUES(1), (2);
+DELETE FROM table_with_unique WHERE data = 1;
+UPDATE table_with_unique SET data = 3 WHERE data = 2;
+UPDATE table_with_unique SET id = -id;
+UPDATE table_with_unique SET id = -id;
+DELETE FROM table_with_unique WHERE data = 3;
+
+CREATE TABLE table_with_unique_not_null(id serial unique, data int);
+ALTER TABLE table_with_unique_not_null ALTER COLUMN id SET NOT NULL; --already set
+INSERT INTO table_with_unique_not_null(data) VALUES(1), (2);
+DELETE FROM table_with_unique_not_null WHERE data = 1;
+UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
+UPDATE table_with_unique_not_null SET id = -id;
+UPDATE table_with_unique_not_null SET id = -id;
+DELETE FROM table_with_unique_not_null WHERE data = 3;
+
+CREATE TABLE table_with_oid(id serial, data int) WITH oids;
+CREATE UNIQUE INDEX table_with_oid_oid ON table_with_oid(oid);
+INSERT INTO table_with_oid(data) VALUES(1), (2);
+DELETE FROM table_with_oid WHERE data = 1;
+UPDATE table_with_oid SET data = 3 WHERE data = 2;
+DELETE FROM table_with_oid WHERE data = 3;
+UPDATE table_with_oid SET id = -id;
+UPDATE table_with_oid SET id = -id;
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- check toast support
+SELECT setseed(0);
+CREATE TABLE toasttable(
+ id serial primary key,
+ toasted_col1 text,
+ rand1 float8 DEFAULT random(),
+ toasted_col2 text,
+ rand2 float8 DEFAULT random()
+ );
+
+-- uncompressed external toast data
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- compressed external toast data
+INSERT INTO toasttable(toasted_col2) SELECT repeat(string_agg(to_char(g.i, 'FM0000'), ''), 50) FROM generate_series(1, 500) g(i);
+
+-- update of existing column
+UPDATE toasttable
+ SET toasted_col1 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+INSERT INTO toasttable(toasted_col1) SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i);
+
+-- update of second column, first column unchanged
+UPDATE toasttable
+ SET toasted_col2 = (SELECT string_agg(g.i::text, '') FROM generate_series(1, 2000) g(i))
+WHERE id = 1;
+
+-- make sure we decode correctly even if the toast table is gone
+DROP TABLE toasttable;
+
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+
+-- done, free logical replication slot
+SELECT data FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '1');
+SELECT stop_logical_replication('regression_slot');
+
+/* check whether we aren't visible anymore now */
+SELECT * FROM pg_stat_logical_decoding;
diff --git a/contrib/test_logical_decoding/test_logical_decoding--1.0.sql b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
new file mode 100644
index 0000000..b6e048c
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding--1.0.sql
@@ -0,0 +1,6 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_logical_decoding" to load this file. \quit
+
+CREATE FUNCTION start_logical_replication (slotname name, pos text, VARIADIC options text[] DEFAULT '{}', OUT location text, OUT xid bigint, OUT data text) RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'start_logical_replication'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/test_logical_decoding/test_logical_decoding.c b/contrib/test_logical_decoding/test_logical_decoding.c
new file mode 100644
index 0000000..6c78319
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.c
@@ -0,0 +1,237 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalfuncs.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "storage/fd.h"
+#include "miscadmin.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+Datum start_logical_replication(PG_FUNCTION_ARGS);
+
+static Tuplestorestate *tupstore = NULL;
+static TupleDesc tupdesc;
+
+static void
+LogicalOutputPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ resetStringInfo(ctx->out);
+}
+
+static void
+LogicalOutputWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+{
+ Datum values[3];
+ bool nulls[3];
+ char buf[60];
+
+ sprintf(buf, "%X/%X", (uint32) (lsn >> 32), (uint32) lsn);
+
+ memset(nulls, 0, sizeof(nulls));
+ values[0] = CStringGetTextDatum(buf);
+ values[1] = Int64GetDatum(xid);
+ values[2] = CStringGetTextDatum(ctx->out->data);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+}
+
+PG_FUNCTION_INFO_V1(start_logical_replication);
+
+Datum
+start_logical_replication(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+
+ XLogRecPtr now;
+ XLogRecPtr startptr;
+ XLogRecPtr rp;
+
+ LogicalDecodingContext *ctx;
+
+ ResourceOwner old_resowner = CurrentResourceOwner;
+ ArrayType *arr;
+ Size ndim;
+ List *options = NIL;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ arr = PG_GETARG_ARRAYTYPE_P(2);
+ ndim = ARR_NDIM(arr);
+
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ if (ndim > 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication only accepts one-dimensional option arrays")));
+ }
+ else if (array_contains_nulls(arr))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("start_logical_replication expects NOT NULL options")));
+ }
+ else if (ndim == 1)
+ {
+ int nelems;
+ Datum *datum_opts;
+ int i;
+
+ Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+ deconstruct_array(arr, TEXTOID, -1, false, 'i',
+ &datum_opts, NULL, &nelems);
+
+ if (nelems % 2 != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("options need to be specified pairwise")));
+ }
+
+ for (i = 0; i < nelems; i += 2)
+ {
+ /* text datums aren't NUL-terminated, so convert them to C strings */
+ char *optname = TextDatumGetCString(datum_opts[i]);
+ char *opt = TextDatumGetCString(datum_opts[i + 1]);
+ options = lappend(options, makeDefElem(optname, (Node *) makeString(opt)));
+ }
+ }
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * XXX: It's impolite to ignore our argument and keep decoding until the
+ * current position.
+ */
+ now = GetFlushRecPtr();
+
+ /*
+ * We need to create a normal_snapshot_reader, but adjust it to use our
+ * page_read callback, and also make its reorder buffer use our callback
+ * wrappers that don't depend on walsender.
+ */
+
+ CheckLogicalReplicationRequirements();
+ LogicalDecodingReAcquireSlot(NameStr(*name));
+
+ ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false,
+ MyLogicalDecodingSlot->confirmed_flush,
+ options,
+ logical_read_local_xlog_page,
+ LogicalOutputPrepareWrite,
+ LogicalOutputWrite);
+
+ startptr = MyLogicalDecodingSlot->restart_decoding;
+
+ elog(DEBUG1, "Starting logical replication from %X/%X to %X/%X",
+ (uint32) (MyLogicalDecodingSlot->restart_decoding >> 32),
+ (uint32) MyLogicalDecodingSlot->restart_decoding,
+ (uint32) (now >> 32), (uint32) now);
+
+ CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
+
+ /* invalidate non-timetravel entries */
+ InvalidateSystemCaches();
+
+ PG_TRY();
+ {
+
+ while ((startptr != InvalidXLogRecPtr && startptr < now) ||
+ (ctx->reader->EndRecPtr && ctx->reader->EndRecPtr < now))
+ {
+ XLogRecord *record;
+ char *errm = NULL;
+
+ record = XLogReadRecord(ctx->reader, startptr, &errm);
+ if (errm)
+ elog(ERROR, "%s", errm);
+
+ startptr = InvalidXLogRecPtr;
+
+ if (record != NULL)
+ {
+ XLogRecordBuffer buf;
+
+ buf.origptr = ctx->reader->ReadRecPtr;
+ buf.record = *record;
+ buf.record_data = XLogRecGetData(record);
+
+ /*
+ * The LogicalOutputPrepareWrite/LogicalOutputWrite callbacks
+ * above store the output plugin's output into our tuplestore.
+ */
+ DecodeRecordIntoReorderBuffer(ctx, &buf);
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ LogicalDecodingReleaseSlot();
+
+ /*
+ * clear timetravel entries: XXX allowed in aborted TXN?
+ */
+ InvalidateSystemCaches();
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ rp = ctx->reader->EndRecPtr;
+ if (rp >= now)
+ {
+ elog(DEBUG1, "Reached endpoint (wanted: %X/%X, got: %X/%X)",
+ (uint32) (now >> 32), (uint32) now,
+ (uint32) (rp >> 32), (uint32) rp);
+ }
+
+ tuplestore_donestoring(tupstore);
+
+ CurrentResourceOwner = old_resowner;
+
+ /*
+ * Next time, start where we left off. (Hunting things, the family
+ * business..)
+ */
+ MyLogicalDecodingSlot->confirmed_flush = ctx->reader->EndRecPtr;
+
+ LogicalDecodingReleaseSlot();
+
+ return (Datum) 0;
+}
diff --git a/contrib/test_logical_decoding/test_logical_decoding.control b/contrib/test_logical_decoding/test_logical_decoding.control
new file mode 100644
index 0000000..0dce19f
--- /dev/null
+++ b/contrib/test_logical_decoding/test_logical_decoding.control
@@ -0,0 +1,5 @@
+# test_logical_decoding extension
+comment = 'test logical decoding'
+default_version = '1.0'
+module_pathname = '$libdir/test_logical_decoding'
+relocatable = true
--
1.8.2.rc2.4.g7799588.dirty
0017-wal_decoding-design-document-v2.4-and-snapshot-build.patch
From b4e663f53f92a727f6f4d9832542546cbff977c8 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 17/17] wal_decoding: design document v2.4 and snapshot
building design doc v0.5
---
src/backend/replication/logical/DESIGN.txt | 593 +++++++++++++++++++++
src/backend/replication/logical/Makefile | 6 +
.../replication/logical/README.SNAPBUILD.txt | 241 +++++++++
3 files changed, 840 insertions(+)
create mode 100644 src/backend/replication/logical/DESIGN.txt
create mode 100644 src/backend/replication/logical/README.SNAPBUILD.txt
diff --git a/src/backend/replication/logical/DESIGN.txt b/src/backend/replication/logical/DESIGN.txt
new file mode 100644
index 0000000..d76fdb4
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN.txt
@@ -0,0 +1,593 @@
+//-*- mode: adoc -*-
+= High Level Design for Logical Replication in Postgres =
+:copyright: PostgreSQL Global Development Group 2012
+:author: Andres Freund, 2ndQuadrant Ltd.
+:email: andres@2ndQuadrant.com
+
+== Introduction ==
+
+This document aims to first explain why we think postgres needs another
+replication solution and what that solution needs to offer in our opinion. Then
+it sketches out our proposed implementation.
+
+In contrast to an earlier version of the design document, which talked about the
+implementation of four parts of replication solutions:
+
+1. Source data generation
+1. Transportation of that data
+1. Applying the changes
+1. Conflict resolution
+
+this version only discusses the first part in detail. It is an independent and
+complex component, usable for a wide range of use cases, and we want to get it
+included into postgres as a first step.
+
+=== Previous discussions ===
+
+There are two rather large threads discussing several parts of the initial
+prototype and proposed architecture:
+
+- http://archives.postgresql.org/message-id/201206131327.24092.andres@2ndquadrant.com[Logical Replication/BDR prototype and architecture]
+- http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata consistency during changeset extraction from WAL]
+
+Those discussions led to some fundamental design changes which are presented in this document.
+
+=== Changes from v1 ===
+* At least a partial decoding step required/possible on the source system
+* No intermediate ("schema only") instances required
+* DDL handling, without event triggers
+* A very simple text conversion is provided for debugging/demo purposes
+* Smaller scope
+
+== Existing approaches to replication in Postgres ==
+
+If any currently used approach to replication can be made to support every
+use-case/feature we need, it is likely not a good idea to implement something
+different. Three basic approaches are currently in use in and around
+postgres:
+
+. Trigger based
+. Recovery based/Physical footnote:[Often referred to by terms like Hot Standby, Streaming Replication, Point In Time Recovery]
+. Statement based
+
+Statement based replication has obvious and known problems with consistency and
+correctness, making it hard to use in the general case, so we will not discuss
+it further here.
+
+Let's have a look at the advantages and disadvantages of the other approaches:
+
+=== Trigger based Replication ===
+
+This variant has a multitude of significant advantages:
+
+* implementable in userspace
+* easy to customize
+* just about everything can be made configurable
+* cross version support
+* cross architecture support
+* can feed into systems other than postgres
+* no overhead from writes to non-replicated tables
+* writable standbys
+* mature solutions
+* multimaster implementations possible & existing
+
+But also a number of disadvantages, some of them very hard to solve:
+
+* essentially duplicates the amount of writes (or even more!)
+* synchronous replication hard or impossible to implement
+* noticeable CPU overhead
+** trigger functions
+** text conversion of data
+* complex parts implemented in several solutions
+* not in core
+
+Especially the higher amount of writes might seem easy to solve at first
+glance, but a solution not using a normal transactional table for its log/queue
+has to solve a lot of problems. The major ones are:
+
+* crash safety, restartability & spilling to disk
+* consistency with the commit status of transactions
+* only a minimal amount of synchronous work should be done inside individual
+transactions
+
+In our opinion those problems are restricting progress and wider adoption of
+this class of solutions. It is our aim, though, that existing solutions in this
+space - most prominently slony and londiste - can benefit from the work we are
+doing & planning to do by incorporating at least parts of the changeset
+generation infrastructure.
+
+=== Recovery based Replication ===
+
+This type of solution, being built into postgres and of increasing popularity,
+has and will have its use cases; we do not aim to replace it but to complement
+it. We plan to reuse some of the infrastructure and to make it possible to mix
+both modes of replication.
+
+Advantages:
+
+* builtin
+* built on existing infrastructure from crash recovery
+* efficient
+** minimal CPU, memory overhead on primary
+** low amount of additional writes
+* synchronous operation mode
+* low maintenance once setup
+* handles DDL
+
+Disadvantages:
+
+* standbys are read only
+* no cross version support
+* no cross architecture support
+* no replication into foreign systems
+* hard to customize
+* not configurable on the level of database, tables, ...
+
+== Goals ==
+
+As seen in the previous short survey of the two major interesting classes of
+replication solutions, there is a significant gap between them. Our aim is to
+make it smaller.
+
+We aim for:
+
+* in core
+* low CPU overhead
+* low storage overhead
+* asynchronous, optionally synchronous operation modes
+* robust
+* modular
+* basis for other technologies (sharding, replication into other DBMS's, ...)
+* basis for at least one multi-master solution
+* make the implementation as unintrusive as possible, but not more
+
+== New Architecture ==
+
+=== Overview ===
+
+Our proposal is to reuse the basic principle of WAL based replication, namely
+reusing data that already needs to be written for another purpose, and extend
+it to allow most, but not all, of the flexibility of trigger based solutions.
+We want to do that by decoding the WAL back into a non-physical form.
+
+To get the flexibility we and others want, we propose that the last step of
+changeset generation, transforming it into a format that can be used by the
+replication consumer, is done in an extensible manner. In the schema the part
+that does that is described as 'Output Plugin'. To keep the amount of
+duplication between different plugins as low as possible, the plugin should only
+do a very limited amount of work.
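A minimal sketch of that split of responsibilities follows. It is not the patch's actual plugin API: `OutputPluginCallbacks`, the `text_*` callbacks, `emit_txn` and the output text format are all hypothetical. The core hands the plugin one already reassembled transaction at a time, and the plugin only turns each event into a line of text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TransactionId;

#define LINE_LEN 128

/* Hypothetical plugin interface: one formatting callback per event. */
typedef struct OutputPluginCallbacks
{
    void        (*begin_txn) (char *out, size_t outlen, TransactionId xid);
    void        (*change) (char *out, size_t outlen, const char *relname,
                           const char *newtuple);
    void        (*commit_txn) (char *out, size_t outlen, TransactionId xid);
} OutputPluginCallbacks;

/* A trivial text plugin, in the spirit of a debugging/demo output plugin. */
static void
text_begin(char *out, size_t outlen, TransactionId xid)
{
    snprintf(out, outlen, "BEGIN %u", (unsigned) xid);
}

static void
text_change(char *out, size_t outlen, const char *relname,
            const char *newtuple)
{
    snprintf(out, outlen, "table %s: INSERT: %s", relname, newtuple);
}

static void
text_commit(char *out, size_t outlen, TransactionId xid)
{
    snprintf(out, outlen, "COMMIT %u", (unsigned) xid);
}

static const OutputPluginCallbacks text_plugin = {
    text_begin, text_change, text_commit
};

/*
 * Core side (hypothetical): feed one already reassembled transaction, here
 * just the inserted tuples of a single table, through the plugin. Writes one
 * line per event; lines must have room for ntuples + 2 entries.
 */
static size_t
emit_txn(const OutputPluginCallbacks *cb, TransactionId xid,
         const char *relname, const char *const *tuples, size_t ntuples,
         char (*lines)[LINE_LEN])
{
    size_t      n = 0;
    size_t      i;

    cb->begin_txn(lines[n++], LINE_LEN, xid);
    for (i = 0; i < ntuples; i++)
        cb->change(lines[n++], LINE_LEN, relname, tuples[i]);
    cb->commit_txn(lines[n++], LINE_LEN, xid);
    return n;
}
```

Because ordering, reassembly and catalog access stay in the core, a new plugin in this model only has to supply a few small formatting callbacks.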
+
+The following paragraphs explain the individual design decisions and sketch
+their high-level design.
+
+=== Schematics ===
+
+The basic proposed architecture for changeset extraction is presented in the
+following diagram. The first part should look familiar to anyone knowing
+postgres' architecture. The second is where most of the new magic happens.
+
+[[basic-schema]]
+.Architecture Schema
+["ditaa"]
+------------------------------------------------------------------------------
+ Traditional Stuff
+
+ +---------+---------+---------+---------+----+
+ | Backend | Backend | Backend | Autovac | ...|
+ +----+----+---+-----+----+----+----+----+-+--+
+ | | | | |
+ +------+ | +--------+ | |
+ +-+ | | | +----------------+ |
+ | | | | | |
+ | v v v v |
+ | +------------+ |
+ | | WAL writer |<------------------+
+ | +------------+
+ | | | | | |
+ v v v v v v +-------------------+
++--------+ +---------+ +->| Startup/Recovery |
+|{s} | |{s} | | +-------------------+
+|Catalog | | WAL |---+->| SR/Hot Standby |
+| | | | | +-------------------+
++--------+ +---------+ +->| Point in Time |
+ ^ | +-------------------+
+ ---|----------|--------------------------------
+ | New Stuff
++---+ |
+| v Running separately
+| +----------------+ +=-------------------------+
+| | Walsender | | | |
+| | v | | +-------------------+ |
+| +-------------+ | | +->| Logical Rep. | |
+| | WAL | | | | +-------------------+ |
++-| decoding | | | +->| Multimaster | |
+| +------+------/ | | | +-------------------+ |
+| | | | | +->| Slony | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Auditing | |
+| | TX | | | | +-------------------+ |
++-| reassembly | | | +->| Mysql/... | |
+| +-------------/ | | | +-------------------+ |
+| | | | | +->| Custom Solutions | |
+| | v | | | +-------------------+ |
+| +-------------+ | | +->| Debugging | |
+| | Output | | | | +-------------------+ |
++-| Plugin |--|--|-+->| Data Recovery | |
+ +-------------/ | | +-------------------+ |
+ | | | |
+ +----------------+ +--------------------------|
+------------------------------------------------------------------------------
+
+=== WAL enrichment ===
+
+To be able to decode individual WAL records, at the very minimum they need to
+contain enough information to reconstruct what has happened to which row. The
+action is already encoded in the WAL record's header in most cases.
+
+As an example of missing data, the WAL record emitted when a row gets deleted
+only contains its physical location. At the very least we need a way to
+identify the deleted row: in a relational database the minimal amount of data
+that does that should be the primary key footnote:[Yes, there are use cases
+where the whole row is needed, or where no primary key can be found].
+
+We propose that for now it is enough to extend the relevant WAL record with
+additional data when the newly introduced 'wal_level = logical' is set.
+
+Previously it has been argued on the hackers mailing list that a generic 'WAL
+record annotation' mechanism might be a good thing. That mechanism would allow
+attaching arbitrary data to individual WAL records, making it easier to extend
+postgres to support something like what we propose. While we don't oppose that
+idea, we think it is a largely orthogonal issue to this proposal as a whole,
+because the format of WAL records is version dependent by nature and the
+necessary changes for our easy way are small, so not much effort is lost.
+
+A full annotation capability is a complex endeavour on its own, as the parts of
+the code generating the relevant WAL records have somewhat complex requirements
+and cannot easily be configured from the outside.
+
+Currently this is contained in the http://archives.postgresql.org/message-id/1347669575-14371-6-git-send-email-andres@2ndquadrant.com[Log enough data into the wal to reconstruct logical changes from it] patch.
+
+=== WAL parsing & decoding ===
+
+The main complexity when reading the WAL as stored on disk is that the format
+is somewhat complex and the existing parser is too deeply integrated in the
+recovery system to be directly reusable. Once a reusable parser exists, decoding
+the binary data into individual WAL records is a small problem.
+
+Currently two competing proposals for this module exist, each having its own
+merits. In the grand scheme of this proposal it is irrelevant which one gets
+picked as long as the functionality gets integrated.
+
+The mailing list post
+http://archives.postgresql.org/message-id/1347669575-14371-3-git-send-email-andres@2ndquadrant.com[Add
+support for a generic wal reading facility dubbed XLogReader] contains both
+competing patches and discussion around which one is preferable.
+
+Once the WAL has been decoded into individual records two major issues exist:
+
+1. records from different transactions and even individual user level actions
+are intermingled
+1. the data attached to records cannot be interpreted on its own; it is only
+meaningful together with a lot of additional information (including table,
+columns, types and more)
+
+The solution to the first issue is described in the next section: <<tx-reassembly>>
+
+The second problem is probably the reason why no mature solution to reuse the
+WAL for logical changeset generation exists today. See the <<snapbuilder>>
+paragraph for some details.
+
+As decoding, transaction reassembly and snapshot building are interdependent,
+they are currently implemented in the same patch:
+http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-andres@2ndquadrant.com[Introduce
+wal decoding via catalog timetravel]
+
+That patch also includes a small demonstration that the approach works in the
+presence of DDL:
+
+[[example-of-decoding]]
+.Decoding example
+[NOTE]
+---------------------------
+/* just so we keep a sensible xmin horizon */
+ROLLBACK PREPARED 'f';
+BEGIN;
+CREATE TABLE keepalive();
+PREPARE TRANSACTION 'f';
+
+DROP TABLE IF EXISTS replication_example;
+
+SELECT pg_current_xlog_insert_location();
+CHECKPOINT;
+CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text
+varchar(120));
+begin;
+INSERT INTO replication_example(somedata, text) VALUES (1, 1);
+INSERT INTO replication_example(somedata, text) VALUES (1, 2);
+commit;
+
+
+ALTER TABLE replication_example ADD COLUMN bar int;
+
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 1, 4);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 2, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 3, 4);
+INSERT INTO replication_example(somedata, text, bar) VALUES (2, 4, NULL);
+COMMIT;
+
+/* slightly more complex schema change, still no table rewrite */
+ALTER TABLE replication_example DROP COLUMN bar;
+INSERT INTO replication_example(somedata, text) VALUES (3, 1);
+
+BEGIN;
+INSERT INTO replication_example(somedata, text) VALUES (3, 2);
+INSERT INTO replication_example(somedata, text) VALUES (3, 3);
+commit;
+
+ALTER TABLE replication_example RENAME COLUMN text TO somenum;
+
+INSERT INTO replication_example(somedata, somenum) VALUES (4, 1);
+
+/* complex schema change, changing types of existing column, rewriting the table */
+ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING
+(somenum::int4);
+
+INSERT INTO replication_example(somedata, somenum) VALUES (5, 1);
+
+SELECT pg_current_xlog_insert_location();
+
+/* now decode what has been written to the WAL during that time */
+
+SELECT decode_xlog('0/1893D78', '0/18BE398');
+
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:1 somedata[int4]:1 text[varchar]:1
+WARNING: tuple is: id[int4]:2 somedata[int4]:1 text[varchar]:2
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:3 somedata[int4]:2 text[varchar]:1 bar[int4]:4
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:4 somedata[int4]:2 text[varchar]:2 bar[int4]:4
+WARNING: tuple is: id[int4]:5 somedata[int4]:2 text[varchar]:3 bar[int4]:4
+WARNING: tuple is: id[int4]:6 somedata[int4]:2 text[varchar]:4 bar[int4]:(null)
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:7 somedata[int4]:3 text[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:8 somedata[int4]:3 text[varchar]:2
+WARNING: tuple is: id[int4]:9 somedata[int4]:3 text[varchar]:3
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:10 somedata[int4]:4 somenum[varchar]:1
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: COMMIT
+WARNING: BEGIN
+WARNING: tuple is: id[int4]:11 somedata[int4]:5 somenum[int4]:1
+WARNING: COMMIT
+
+---------------------------
+
+[[tx-reassembly]]
+=== TX reassembly ===
+
+In order to make usage of the decoded stream easy, we want to present user
+level code with a correctly ordered image of individual transactions at once,
+because otherwise every user would have to reassemble transactions themselves.
+
+Transaction reassembly needs to solve several problems:
+
+1. changes inside a transaction can be interspersed with other transactions
+1. a top level transaction only knows which subtransactions belong to it when
+it reads the commit record
+1. individual user level actions can be smeared over multiple records (TOAST)
+
+Our proposed module solves 1) and 2) by building individual streams of records
+split by xid. While not fully implemented yet, we plan to spill those individual
+xid streams to disk after a certain amount of memory is used. This can be
+implemented without any change in the external interface.
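A minimal sketch of the per-xid buffering follows. It assumes fixed-size in-memory buffers and uses LSNs as stand-ins for decoded changes, and `Reassembler`, `queue_change` and `commit_txn` are hypothetical names. Each change is appended to the stream of the (sub)transaction that made it. The streams are only collected together once the commit record names the subtransactions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

#define MAX_TXNS    8
#define MAX_CHANGES 32

/* Changes buffered for one xid, in the order they were read from the WAL. */
typedef struct TxnBuffer
{
    TransactionId xid;          /* 0 marks an unused slot */
    XLogRecPtr  changes[MAX_CHANGES];
    size_t      nchanges;
} TxnBuffer;

typedef struct Reassembler
{
    TxnBuffer   txns[MAX_TXNS];
} Reassembler;

/* Find the buffer for xid; optionally create it. NULL if absent or full. */
static TxnBuffer *
txn_lookup(Reassembler *r, TransactionId xid, int create)
{
    TxnBuffer  *free_slot = NULL;
    size_t      i;

    for (i = 0; i < MAX_TXNS; i++)
    {
        if (r->txns[i].xid == xid)
            return &r->txns[i];
        if (r->txns[i].xid == 0 && free_slot == NULL)
            free_slot = &r->txns[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    free_slot->xid = xid;
    free_slot->nchanges = 0;
    return free_slot;
}

/* Append a change to the stream of the (sub)transaction that made it. */
static int
queue_change(Reassembler *r, TransactionId xid, XLogRecPtr lsn)
{
    TxnBuffer  *txn = txn_lookup(r, xid, 1);

    if (txn == NULL || txn->nchanges == MAX_CHANGES)
        return -1;              /* the design above would spill to disk here */
    txn->changes[txn->nchanges++] = lsn;
    return 0;
}

/*
 * Called on a commit record, the first point where we learn which
 * subtransactions belong to top. Releases their buffers and returns how many
 * changes the committed transaction contains in total.
 */
static size_t
commit_txn(Reassembler *r, TransactionId top,
           const TransactionId *subxids, size_t nsubxids)
{
    size_t      total = 0;
    size_t      i;

    for (i = 0; i <= nsubxids; i++)
    {
        TransactionId xid = (i == 0) ? top : subxids[i - 1];
        TxnBuffer  *txn = txn_lookup(r, xid, 0);

        if (txn == NULL)
            continue;           /* this (sub)transaction made no changes */
        total += txn->nchanges;
        txn->xid = 0;           /* free the slot */
    }
    return total;
}
```

In this sketch the collected streams are only counted. In a real implementation they would still have to be merged by LSN before being handed on.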
+
+As all the individual streams are already sorted by LSN by definition - we
+build them from the wal in a FIFO manner, and the position in the WAL is the
+definition of the LSN footnote:[the LSN is just the byte position in the WAL
+stream] - the individual changes can be merged efficiently by a k-way merge
+(without sorting!) by keeping the individual streams in a binary heap.
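The merge step can be sketched as follows. This is a self-contained illustration rather than the patch's code: `ChangeStream`, `StreamHeap` and `merge_streams` are hypothetical, and `XLogRecPtr` is redefined locally as a 64-bit byte position. Each stream sits in a binary min-heap keyed on its next unconsumed LSN. Emitting a change therefore costs O(log k) for k streams, and no sort is needed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte position in the WAL stream */

/* One per-transaction change stream, already sorted by LSN. */
typedef struct ChangeStream
{
    const XLogRecPtr *lsns;     /* change LSNs in ascending order */
    size_t      nchanges;
    size_t      pos;            /* index of the next unconsumed change */
} ChangeStream;

/* Binary min-heap of streams, ordered by each stream's next LSN. */
typedef struct StreamHeap
{
    ChangeStream **items;
    size_t      size;
} StreamHeap;

static XLogRecPtr
next_lsn(const ChangeStream *s)
{
    return s->lsns[s->pos];
}

/* Restore the heap property below index i. */
static void
heap_sift_down(StreamHeap *h, size_t i)
{
    for (;;)
    {
        size_t      l = 2 * i + 1;
        size_t      r = l + 1;
        size_t      smallest = i;
        ChangeStream *tmp;

        if (l < h->size && next_lsn(h->items[l]) < next_lsn(h->items[smallest]))
            smallest = l;
        if (r < h->size && next_lsn(h->items[r]) < next_lsn(h->items[smallest]))
            smallest = r;
        if (smallest == i)
            return;
        tmp = h->items[i];
        h->items[i] = h->items[smallest];
        h->items[smallest] = tmp;
        i = smallest;
    }
}

/*
 * Merge nstreams LSN-sorted streams into out[], which must have room for all
 * changes; heap_space must have room for nstreams pointers. Returns the
 * number of changes written.
 */
static size_t
merge_streams(ChangeStream *streams, size_t nstreams,
              ChangeStream **heap_space, XLogRecPtr *out)
{
    StreamHeap  h = {heap_space, 0};
    size_t      n = 0;
    size_t      i;

    /* put every non-empty stream into the heap, then heapify bottom-up */
    for (i = 0; i < nstreams; i++)
        if (streams[i].pos < streams[i].nchanges)
            h.items[h.size++] = &streams[i];
    for (i = h.size / 2; i-- > 0;)
        heap_sift_down(&h, i);

    while (h.size > 0)
    {
        ChangeStream *top = h.items[0];

        out[n++] = next_lsn(top);   /* smallest pending LSN overall */
        top->pos++;
        if (top->pos == top->nchanges)
            h.items[0] = h.items[--h.size];     /* stream exhausted: drop it */
        heap_sift_down(&h, 0);
    }
    return n;
}
```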
+
+To manipulate the binary heap a generic implementation is proposed. Several
+independent implementations of binary heaps already exist in the postgres code,
+but none of them is generic. The patch is available at
+http://archives.postgresql.org/message-id/1347669575-14371-2-git-send-email-andres@2ndquadrant.com[Add
+minimal binary heap implementation].
+
+[NOTE]
+============
+The reassembly component was previously named ApplyCache because it was
+proposed to run on replication consumers just before applying changes. This is
+no longer the case.
+
+It is still called that in the source of the recently submitted patch.
+============
+
+[[snapbuilder]]
+=== Snapshot building ===
+
+To interpret WAL records describing data changes, we need to decode and
+transform their contents. A single tuple is stored in a data structure
+called HeapTuple. As stored on disk, that structure doesn't contain any
+information about the format of its contents.
+
+The basic problem is twofold:
+
+1. The WAL records only contain the relfilenode, not the relation OID, of a table
+1. The relfilenode changes when an action performing a full table rewrite is executed
+1. To interpret a HeapTuple correctly, the exact schema definition from back
+when the WAL record was inserted into the WAL stream needs to be available
+
+We chose to implement timetraveling access to the system catalog using
+postgres' MVCC nature & implementation because of the following advantages:
+
+* low amount of additional data in wal
+* genericity
+* similarity of implementation to Hot Standby, quite a bit of the infrastructure is reusable
+* all kinds of DDL can be handled in a reliable manner
+* extensibility to user defined catalog like tables
+
+Timetravel access to the catalog means that we are able to look at the catalog
+just as it looked when changes were generated. That allows us to get the
+correct information about the contents of the aforementioned HeapTuples so we
+can decode them reliably.
+
+Other solutions we thought about that fell through:
+* catalog-only proxy instances that apply schema changes exactly up to the point
+ we are decoding, using ``old fashioned'' WAL replay
+* do the decoding on a 2nd machine, replicating all DDL exactly, rely on the catalog there
+* do not allow DDL at all
+* always add enough data into the WAL to allow decoding
+* build a fully versioned catalog
+
+The email thread available under
+http://archives.postgresql.org/message-id/201206211341.25322.andres@2ndquadrant.com[Catalog/Metadata
+consistency during changeset extraction from WAL] contains some details about,
+and the advantages and disadvantages of, the different possible implementations.
+
+How we build snapshots is intricate enough to be out of scope for this
+document. We will provide a second document discussing the implementation in
+detail. Let's just assume it is possible from here on.
+
+[NOTE]
+Some details are already available in comments inside 'src/backend/replication/logical/snapbuild.{c,h}'.
+
+=== Output Plugin ===
+
+As already mentioned, our aim is to make the implementation of output
+plugins as simple and non-redundant as possible, as we expect several different
+ones with different use cases to emerge quickly. See <<basic-schema>> for a
+list of possible output plugins that we think might emerge.
+
+Although for now we only plan to tackle logical replication and, based on that,
+a multi-master implementation in the near future, we definitely aim to provide
+something easily usable for all use cases!
+
+To decode and translate local transactions, an output plugin needs to be able to
+transform transactions as a whole so it can apply them as a meaningful
+transaction at the other side.
+
+To provide that, every time we find a transaction commit, and thus have
+completed reassembling the transaction, we start to provide the individual
+changes to the output plugin. It currently only has to fill out 3
+callbacks:
+
+[options="header"]
+|=====================================================================================================================================
+|Callback |Passed Parameters |Called per TX | Use
+|begin |xid |once |Begin of a reassembled transaction
+|change |xid, subxid, change, mvcc snapshot |every change |Gets passed every change so it can transform it to the target format
+|commit |xid |once |End of a reassembled transaction
+|=====================================================================================================================================
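
The callback table above could be expressed in C roughly as follows. This is a sketch, not the patch's actual API: `OutputCallbacks`, `Change` and `replay_reassembled_txn` are hypothetical names, and the MVCC snapshot that the change callback also receives is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, heavily simplified change representation. */
typedef struct
{
    TransactionId subxid;   /* subtransaction that made the change */
    const char   *desc;     /* stand-in for the decoded tuple data */
} Change;

/* Hypothetical callback table mirroring the three callbacks above. */
typedef struct
{
    void (*begin) (void *state, TransactionId xid);
    void (*change) (void *state, TransactionId xid, const Change *change);
    void (*commit) (void *state, TransactionId xid);
    void *state;            /* plugin private data */
} OutputCallbacks;

/*
 * Called once a commit record has been read and the transaction is fully
 * reassembled: begin once, change for every change in LSN order, commit once.
 */
static void
replay_reassembled_txn(const OutputCallbacks *cb, TransactionId xid,
                       const Change *changes, size_t nchanges)
{
    cb->begin(cb->state, xid);
    for (size_t i = 0; i < nchanges; i++)
        cb->change(cb->state, xid, &changes[i]);
    cb->commit(cb->state, xid);
}
```

Keeping the plugin down to three function pointers plus an opaque state pointer is what lets a plugin stay small: all reassembly and ordering happens before `replay_reassembled_txn` is called.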
+
+During each of those callbacks an appropriate timetraveling SnapshotNow snapshot
+is set up so the callbacks can perform all read-only catalog accesses they need,
+including using the sys/rel/catcache. For obvious reasons only read access is
+allowed.
+
+The snapshot guarantees that the results of lookups are the same as they
+were/would have been when the change was originally created.
+
+Additionally they get passed an MVCC snapshot, e.g. to run SQL queries on
+catalogs or similar.
+
+[IMPORTANT]
+============
+At the moment none of these snapshots can be used to access normal user
+tables. Adding additional tables to the allowed set is easy implementation-wise,
+but every transaction changing such tables incurs a noticeably higher
+overhead.
+============
+
+For now transactions won't be decoded/output in parallel. There are ideas to
+improve on this, but we don't think the complexity is appropriate for the first
+release of this feature.
+
+This is an adoption barrier for databases where large amounts of data get
+loaded/written in one transaction.
+
+=== Setup of replication nodes ===
+
+When setting up a new standby/consumer of a primary, some problems exist
+independent of the implementation of the consumer. The gist of the problem is
+that when making a base backup and starting to stream all changes since that
+point, transactions that were running during all this cannot be included:
+
+* Transactions that have not committed before starting to dump a database are
+ invisible to the dumping process
+
+* Transactions that began before the point from which on the WAL is being
+ decoded are incomplete and cannot be replayed
+
+Our proposal for a solution to this is to detect points in the WAL stream where we can provide:
+
+. A snapshot exported similarly to pg_export_snapshot() footnote:[http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION] that can be imported with +SET TRANSACTION SNAPSHOT+ footnote:[http://www.postgresql.org/docs/devel/static/sql-set-transaction.html]
+. A stream of changes that will include the complete data of all transactions seen as running by the snapshot generated in 1)
+
+See the diagram.
+
+[[setup-schema]]
+.Control flow during setup of a new node
+["ditaa",scaling="0.7"]
+------------------------------------------------------------------------------
++----------------+
+| Walsender | | +------------+
+| v | | Consumer |
++-------------+ |<--IDENTIFY_SYSTEM-------------| |
+| WAL | | | |
+| decoding | |----....---------------------->| |
++------+------/ | | |
+| | | | |
+| v | | |
++-------------+ |<--INIT_LOGICAL $PLUGIN--------| |
+| TX | | | |
+| reassembly | |---FOUND_STARTING %X/%X------->| |
++-------------/ | | |
+| | |---FOUND_CONSISTENT %X/%X----->| |
+| v |---pg_dump snapshot----------->| |
++-------------+ |---replication slot %P-------->| |
+| Output | | | |
+| Plugin | | ^ | |
++-------------/ | | | |
+| | +-run pg_dump separately --| |
+| | | |
+| |<--STREAM_DATA-----------------| |
+| | | |
+| |---data ---------------------->| |
+| | | |
+| | | |
+| | ---- SHUTDOWN ------------- | |
+| | | |
+| | | |
+| |<--RESTART_LOGICAL $PLUGIN %P--| |
+| | | |
+| |---data----------------------->| |
+| | | |
+| | | |
++----------------+ +------------+
+
+------------------------------------------------------------------------------
+
+=== Disadvantages of the approach ===
+
+* somewhat intricate code for snapshot timetravel
+* output plugins/walsenders need to work per database as they access the catalog
+* when sending to multiple standbys some work is done multiple times
+* decoding/applying multiple transactions in parallel is somewhat hard
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 310a45c..6fae278 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -17,3 +17,9 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = decode.o logical.o logicalfuncs.o reorderbuffer.o snapbuild.o
include $(top_srcdir)/src/backend/common.mk
+
+DESIGN.pdf: DESIGN.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
+
+README.SNAPBUILD.pdf: README.SNAPBUILD.txt
+ a2x -v --fop -f pdf -D $(shell pwd) $<
diff --git a/src/backend/replication/logical/README.SNAPBUILD.txt b/src/backend/replication/logical/README.SNAPBUILD.txt
new file mode 100644
index 0000000..b6c7470
--- /dev/null
+++ b/src/backend/replication/logical/README.SNAPBUILD.txt
@@ -0,0 +1,241 @@
+= Snapshot Building =
+:author: Andres Freund, 2ndQuadrant Ltd
+
+== Why do we need timetravel catalog access ==
+
+When doing WAL decoding (see DESIGN.txt for reasons to do so), we need to know
+how the catalog looked at the point a record was inserted into the WAL, because
+without that information we know little about the record beyond its length.
+It's just an arbitrary bunch of bytes without further information.
+Unfortunately, due to the possibility that the table definition might change, we
+cannot just access a newer version of the catalog and assume the table
+definition continues to be the same.
+
+If only the type information were required, it might be enough to annotate the
+WAL records with a bit more information (table oid, table name, column name,
+column type) --- but as we want to be able to convert the output to more useful
+formats such as text, we additionally need to be able to call output functions.
+Those need a normal environment including the usual caches and normal catalog
+access to look up operators, functions and other types.
+
+Our solution to this is to add the capability to access the catalog as it
+was at the time the record was inserted into the WAL. The locking used during
+WAL generation guarantees the catalog is/was in a consistent state at that
+point. We call this 'time-travel catalog access'.
+
+Interesting cases include:
+
+- enums
+- composite types
+- extension types
+- non-C functions
+- relfilenode to table OID mapping
+
+Due to postgres' non-overwriting storage manager, regular modifications of a
+table's content are theoretically non-destructive. The problem is that there is
+no way to access an arbitrary point in time even if the data for it is there.
+
+This module adds the capability to do so in the very limited set of
+circumstances we need it in for WAL decoding. It does *not* provide a general
+time-travelling facility.
+
+A 'Snapshot' is the data structure used in postgres to describe which tuples
+are visible and which are not. We need to build a Snapshot which can be used to
+access the catalog the way it looked when the WAL record was inserted.
+
+Restrictions:
+
+- Only works for catalog tables or tables explicitly marked as such.
+- Snapshot modifications are somewhat expensive
+- It cannot build initial visibility information for every point in time; it
+ needs specific circumstances to start.
+
+== How are time-travel snapshots built ==
+
+'Hot Standby' added infrastructure to build snapshots from WAL during recovery in
+the 9.0 release. Most of that can be reused for our purposes.
+
+We cannot reuse all of the hot standby infrastructure because:
+
+- we are not in recovery
+- we need to look at interim states *inside* a transaction
+- we need the capability to have multiple different snapshots around at the same time
+
+Normally the catalog is accessed using SnapshotNow, which can legally be
+replaced by a SnapshotMVCC taken at the start of a scan. So catalog
+timetravel contains infrastructure to make SnapshotNow catalog access use
+appropriate MVCC snapshots. They aren't generated with GetSnapshotData(),
+though; they are reassembled from WAL contents.
+
+We collect our data in a normal struct SnapshotData, repurposing some fields
+creatively:
+
+- +Snapshot->xip+ contains all transactions we consider committed
+- +Snapshot->subxip+ contains all transactions belonging to our transaction,
+ including the toplevel one
+- +Snapshot->active_count+ is used as a refcount
+
+The meaning of +xip+ is inverted in comparison with non-timetravel snapshots in
+the sense that members of the array are the committed transactions, not the in
+progress ones. Because usually only a tiny percentage of committed transactions
+will have modified the catalog between xmin and xmax, this allows us to keep the
+array small in the usual cases. It also makes subtransaction handling easier
+since we neither need to query pg_subtrans (which we couldn't anyway since it's
+truncated at restart) nor have problems with suboverflowed snapshots.
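
A sketch of a commit check against such an inverted snapshot, using a hypothetical `TimetravelSnapshot` struct that holds only the fields described above. It simplifies in two labelled ways: xid wraparound is ignored, and every xid below xmin is treated as committed, whereas real code would still have to filter aborted transactions there.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Hypothetical subset of a timetravel snapshot. */
typedef struct
{
    TransactionId  xmin;    /* xids below this are treated as finished */
    TransactionId  xmax;    /* xids at or above this are not visible yet */
    TransactionId *xip;     /* sorted: COMMITTED catalog-modifying xids */
    size_t         xcnt;
} TimetravelSnapshot;

static int
cmp_xid(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/* Does xid count as committed as of this snapshot? (no wraparound) */
static bool
timetravel_xid_committed(const TimetravelSnapshot *snap, TransactionId xid)
{
    if (xid < snap->xmin)
        return true;        /* simplification: aborted xids not filtered */
    if (xid >= snap->xmax)
        return false;
    /* Inverted meaning: membership in xip means "committed". */
    return bsearch(&xid, snap->xip, snap->xcnt, sizeof(TransactionId),
                   cmp_xid) != NULL;
}
```

Since only catalog-modifying commits end up in `xip`, the binary search usually runs over a very short array.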
+
+== Building of initial snapshot ==
+
+We can start building an initial snapshot as soon as we find either an
++XLOG_RUNNING_XACTS+ or an +XLOG_CHECKPOINT_SHUTDOWN+ record because they tell us
+which transactions are running.
+
+We need to know which transactions were running when we start to build a
+snapshot/start decoding as we don't have enough information about them (they
+could have done catalog modifications before we started watching). Also, we
+wouldn't have the complete contents of those transactions, because we started
+reading after they began. (The latter is also important when building
+snapshots that can be used to build a consistent initial clone.)
+
+There also is the problem that +XLOG_RUNNING_XACTS+ records can be
+'suboverflowed', which means there were more running subtransactions than
+fit into shared memory. In that case we use the same incremental building
+trick hot standby uses, which is either to
+
+1. wait till further +XLOG_RUNNING_XACTS+ records have a running->oldestRunningXid
+after the initial xl_running_xacts->nextXid
+2. wait for a further +XLOG_RUNNING_XACTS+ that is not overflowed or
+a +XLOG_CHECKPOINT_SHUTDOWN+
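
The two exit conditions can be sketched as a single predicate. `RunningXactsInfo` and `overflow_resolved` are hypothetical names; the xid comparison is modulo 2^32 in the style of TransactionIdPrecedes(), and "after" in condition 1 is read as strictly following.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical subset of an xl_running_xacts record. */
typedef struct
{
    TransactionId oldestRunningXid;
    TransactionId nextXid;
    bool          subxid_overflow;
} RunningXactsInfo;

/* Modulo-2^32 comparison in the style of TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Having seen an overflowed record whose nextXid was initial_next_xid,
 * may we stop waiting once record 'later' arrives?
 */
static bool
overflow_resolved(TransactionId initial_next_xid,
                  const RunningXactsInfo *later, bool is_shutdown_checkpoint)
{
    if (is_shutdown_checkpoint)
        return true;                /* condition 2 */
    if (!later->subxid_overflow)
        return true;                /* condition 2 */

    /*
     * Condition 1: the oldest running xid follows the initial nextXid, so
     * every transaction that was running back then has finished.
     */
    return xid_precedes(initial_next_xid, later->oldestRunningXid);
}
```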
+
+When we start building a snapshot we are in the +SNAPBUILD_START+ state. As
+soon as we find any visibility information, even if incomplete, we change to
++SNAPBUILD_INITIAL_POINT+.
+
+When we have collected enough information to decode any transaction starting
+after that point in time we switch to +SNAPBUILD_FULL_SNAPSHOT+. If those
+transactions commit before the next state is reached, we throw their complete
+contents away.
+
+As soon as all transactions that were running when we switched over to
++SNAPBUILD_FULL_SNAPSHOT+ commit, we change state to +SNAPBUILD_CONSISTENT+.
+Every transaction that commits from now on gets handed to the output plugin.
+When doing the switch to +SNAPBUILD_CONSISTENT+ we optionally export a snapshot
+which makes all transactions that committed up to this point visible. This
+exported snapshot can be used to run pg_dump; replaying all changes emitted
+by the output plugin on a database restored from such a dump will result in
+a consistent clone.
+
+["ditaa",scaling="0.8"]
+---------------
+
+ +-------------------------+
+ +----|SNAPBUILD_START |-------------+
+ | +-------------------------+ |
+ | | |
+ | | |
+ | running_xacts with running xacts |
+ | | |
+ | | |
+ | v |
+ | +-------------------------+ v
+ | |SNAPBUILD_FULL_SNAPSHOT |------------>|
+ | +-------------------------+ |
+XLOG_RUNNING_XACTS | saved snapshot
+ with zero xacts | at running_xacts's lsn
+ | | |
+ | all running toplevel TXNs finished |
+ | | |
+ | v |
+ | +-------------------------+ |
+ +--->|SNAPBUILD_CONSISTENT |<------------+
+ +-------------------------+
+
+---------------
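
The transitions in the diagram can be sketched as a small state machine. This is an illustration under stated assumptions, not snapbuild.c: the event enum and `snapbuild_step` are hypothetical, +SNAPBUILD_INITIAL_POINT+ and suboverflowed records are not modelled, and a "finished" event is assumed to arrive only for toplevel transactions that were running when +SNAPBUILD_FULL_SNAPSHOT+ was entered.

```c
#include <stdbool.h>
#include <stddef.h>

/* States from the diagram above (SNAPBUILD_INITIAL_POINT omitted). */
typedef enum
{
    SNAPBUILD_START,
    SNAPBUILD_FULL_SNAPSHOT,
    SNAPBUILD_CONSISTENT
} SnapBuildState;

/* Hypothetical summary of the events the diagram reacts to. */
typedef enum
{
    EV_RUNNING_XACTS,           /* an xl_running_xacts record was read */
    EV_RESTORED_SNAPSHOT,       /* a saved snapshot at that LSN was found */
    EV_TOPLEVEL_XACT_FINISHED   /* a tracked toplevel transaction ended */
} SnapBuildEvent;

typedef struct
{
    SnapBuildState state;
    size_t         nrunning;    /* tracked toplevel xacts still running */
} SnapBuilder;

static void
snapbuild_step(SnapBuilder *b, SnapBuildEvent ev, size_t nrunning_in_record)
{
    switch (b->state)
    {
        case SNAPBUILD_START:
            if (ev == EV_RESTORED_SNAPSHOT ||
                (ev == EV_RUNNING_XACTS && nrunning_in_record == 0))
                b->state = SNAPBUILD_CONSISTENT;
            else if (ev == EV_RUNNING_XACTS)
            {
                b->nrunning = nrunning_in_record;
                b->state = SNAPBUILD_FULL_SNAPSHOT;
            }
            break;
        case SNAPBUILD_FULL_SNAPSHOT:
            if (ev == EV_RESTORED_SNAPSHOT)
                b->state = SNAPBUILD_CONSISTENT;
            else if (ev == EV_TOPLEVEL_XACT_FINISHED && --b->nrunning == 0)
                b->state = SNAPBUILD_CONSISTENT;
            break;
        case SNAPBUILD_CONSISTENT:
            break;              /* final state */
    }
}
```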
+
+== Snapshot Management ==
+
+Whenever a transaction is detected as having started during decoding in
++SNAPBUILD_FULL_SNAPSHOT+ state, we distribute the currently maintained
+snapshot to it (i.e. call ReorderBufferSetBaseSnapshot). This serves as its
+initial snapshot. Unless there are concurrent catalog changes, that snapshot
+will be used for decoding the entire transaction's changes.
+
+Whenever a transaction-with-catalog-changes commits, we iterate over all
+concurrently active transactions and add a new SnapshotNow to each of them
+(ReorderBufferAddSnapshot(current_lsn)). This is required because any row
+written from that point on will have used the changed catalog contents.
+
+When decoding a transaction that made catalog changes itself, we notify that
+transaction (ReorderBufferAddNewCommandId(current_lsn)), which will cause
+the decoding to use the appropriate command id from that point on.
+
+SnapshotNow snapshots need to be set up globally so the syscache and other
+pieces access them transparently. This is done using two new tqual.h functions:
+SetupDecodingSnapshots() and RevertFromDecodingSnapshots().
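
A sketch of the bookkeeping this implies: each transaction keeps an LSN-ordered queue of snapshots, a catalog-modifying commit queues its new snapshot in every concurrent transaction, and +active_count+ acts as the refcount. All names are hypothetical; `txn_add_snapshot` is only loosely modelled on the ReorderBufferAddSnapshot(current_lsn) call mentioned above, and the fixed queue size keeps the sketch short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical refcounted snapshot; active_count is used as the refcount. */
typedef struct
{
    int active_count;
    int id;                     /* stand-in for xmin/xmax/xip/subxip */
} Snap;

#define MAX_SNAPS 16            /* fixed size to keep the sketch short */

/* Per-transaction queue of (lsn, snapshot) pairs, in LSN order. */
typedef struct
{
    XLogRecPtr lsn[MAX_SNAPS];
    Snap      *snap[MAX_SNAPS];
    size_t     n;
} TxnSnapshots;

static Snap *
snap_new(int id)
{
    Snap *s = malloc(sizeof(Snap));

    if (s == NULL)
        abort();
    s->active_count = 1;        /* reference held by the snapshot builder */
    s->id = id;
    return s;
}

static void
snap_unref(Snap *s)
{
    if (--s->active_count == 0)
        free(s);
}

/* Changes of t at or after lsn are decoded using snap. */
static void
txn_add_snapshot(TxnSnapshots *t, XLogRecPtr lsn, Snap *snap)
{
    if (t->n == MAX_SNAPS)
        abort();
    snap->active_count++;
    t->lsn[t->n] = lsn;
    t->snap[t->n++] = snap;
}

/* Commit of a catalog-modifying xact: queue its snapshot everywhere. */
static void
distribute_snapshot(TxnSnapshots *txns, size_t ntxns, XLogRecPtr lsn,
                    Snap *snap)
{
    for (size_t i = 0; i < ntxns; i++)
        txn_add_snapshot(&txns[i], lsn, snap);
}

/* Snapshot to use for a change of t at change_lsn (NULL if none yet). */
static Snap *
txn_snapshot_at(const TxnSnapshots *t, XLogRecPtr change_lsn)
{
    Snap *result = NULL;

    for (size_t i = 0; i < t->n && t->lsn[i] <= change_lsn; i++)
        result = t->snap[i];
    return result;
}

static void
txn_release(TxnSnapshots *t)
{
    for (size_t i = 0; i < t->n; i++)
        snap_unref(t->snap[i]);
    t->n = 0;
}
```

Changes recorded before the DDL commit keep using the older snapshot, which is why snapshots are queued per LSN instead of replaced in place.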
+
+== Catalog/User Table Detection ==
+
+Since we only want to store committed transactions that actually modified the
+catalog we need a way to detect that from WAL:
+
+Right now, we assume that every transaction that commits before we reach
++SNAPBUILD_CONSISTENT+ state has made catalog modifications since we can't rely
+on having seen the entire transaction before that. That's not harmful beyond
+incurring some cost in memory usage and runtime.
+
+After having reached consistency we recognize catalog modifying transactions
+via HEAP2_NEW_CID and HEAP_INPLACE that are logged by catalog modifying
+actions.
+
+== mixed DDL/DML transaction handling ==
+
+When a transaction uses both DDL and DML, things get a bit more complicated:
+we need to handle CommandIds and ComboCids because we have to use the correct
+version of the catalog when decoding the individual tuples.
+
+For that we emit the new HEAP2_NEW_CID records which contain the physical tuple
+location, cmin and cmax when the catalog is modified. If we need to detect
+visibility of a catalog tuple that has been modified in our own transaction -
+which we can detect via xmin/xmax - we look in a hash table using the location
+as key to get correct cmin/cmax values.
+From those values we can also extract the commandid that generated the record.
+
+All this only needs to happen in the transaction performing the DDL.
+
+== Cache Handling ==
+
+As we allow usage of the normal {sys,cat,rel,..}cache we also need to integrate
+cache invalidation. For transactions that only do DDL that's easy, as everything
+is already provided by Hot Standby. Every time we read a commit record we apply
+the sinval messages contained therein.
+
+For transactions that contain DDL and DML, cache invalidation needs to happen
+more frequently because we need to tear down all caches that just got
+modified. To do that we simply take all invalidation messages that got
+collected at the end of the transaction and apply them every time we've decoded
+a single change. At some point this can get optimized by generating new local
+invalidation messages, but that seems too complicated for now.
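
A toy model of the "apply everything after every change" strategy. The caches are reduced to one validity flag per hypothetical cache id and a lookup simply counts reloads; `InvalMsg`, `Caches` and `decode_mixed_txn` are illustrative names, not the sinval API.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCACHES 8

/* Hypothetical invalidation message: "drop entries of cache cache_id". */
typedef struct
{
    int cache_id;
} InvalMsg;

/* Stand-in for the sys/cat/rel caches: one validity flag per cache. */
typedef struct
{
    bool valid[NCACHES];
    int  rebuilds;              /* how often an entry had to be reloaded */
} Caches;

static void
apply_invalidations(Caches *c, const InvalMsg *msgs, size_t nmsgs)
{
    for (size_t i = 0; i < nmsgs; i++)
        c->valid[msgs[i].cache_id] = false;
}

/* A lookup: reload (from the timetravel catalog) if invalidated. */
static void
cache_lookup(Caches *c, int cache_id)
{
    if (!c->valid[cache_id])
    {
        c->valid[cache_id] = true;
        c->rebuilds++;
    }
}

/* Mixed DDL/DML: apply the commit's messages after every single change. */
static void
decode_mixed_txn(Caches *c, const int *change_cache_ids, size_t nchanges,
                 const InvalMsg *commit_msgs, size_t nmsgs)
{
    for (size_t i = 0; i < nchanges; i++)
    {
        cache_lookup(c, change_cache_ids[i]);
        apply_invalidations(c, commit_msgs, nmsgs);
    }
}
```

The reload counter makes the cost visible: every change that touches an invalidated cache pays for a rebuild, which is what the suggested optimization with local invalidation messages would avoid.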
+
+XXX: talk about syscache handling of relmapped relation.
+
+== xmin Horizon Handling ==
+
+Reusing MVCC for timetravel access has one obvious major problem: VACUUM. Rows
+we still need for decoding cannot be removed but at the same time we cannot
+keep data in the catalog indefinitely.
+
+For that we peg the xmin horizon that's used to decide which rows can be
+removed. We only need to prevent removal of those rows for catalog-like
+relations, not for all user tables. For that reason a separate xmin horizon,
+RecentGlobalDataXmin, was introduced.
+
+Since we need to persist that knowledge across restarts, we keep the xmin
+in the logical slots, which are saved in a crash-safe manner. They are restored
+from disk into memory at server startup.
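
A sketch of how the two horizons could be derived, assuming a hypothetical `SlotXmin` record per slot. The data horizon (the role RecentGlobalDataXmin plays above) ignores the slots, while the catalog horizon is held back by every slot in use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical per-slot state persisted with each logical slot. */
typedef struct
{
    bool          in_use;
    TransactionId catalog_xmin;     /* oldest xid whose catalog rows we need */
} SlotXmin;

/* Modulo-2^32 comparison in the style of TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * data_xmin:    horizon for user tables; slots do not hold it back.
 * catalog_xmin: horizon for catalog(-like) tables; pegged by every slot.
 */
static void
compute_horizons(TransactionId oldest_running_xmin,
                 const SlotXmin *slots, size_t nslots,
                 TransactionId *data_xmin, TransactionId *catalog_xmin)
{
    *data_xmin = oldest_running_xmin;
    *catalog_xmin = oldest_running_xmin;
    for (size_t i = 0; i < nslots; i++)
        if (slots[i].in_use &&
            xid_precedes(slots[i].catalog_xmin, *catalog_xmin))
            *catalog_xmin = slots[i].catalog_xmin;
}
```

A stale slot therefore only bloats the catalog, while VACUUM on user tables proceeds with the unpegged horizon.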
+
+== Restartable Decoding ==
+
+As we want to generate a consistent stream of changes, we need the ability to
+start from a previously decoded location without possibly waiting very long to
+reach consistency. For that reason we dump the current visibility information
+to disk every time we read an xl_running_xacts record.
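
A sketch of dumping and restoring such visibility information with a checksummed header. The file layout, `SNAP_MAGIC` and the FNV-1a checksum are assumptions for illustration; durability steps such as fsync and an atomic rename, which a crash-safe implementation would need, are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define SNAP_MAGIC 0x51A1F00Du      /* hypothetical on-disk format marker */

typedef struct
{
    TransactionId  xmin, xmax;
    uint32_t       xcnt;
    TransactionId *xip;             /* committed xids, as described earlier */
} SavedSnapshot;

typedef struct                      /* all 32-bit fields: no padding */
{
    uint32_t      magic;
    uint32_t      xcnt;
    TransactionId xmin, xmax;
    uint32_t      checksum;         /* over the xip array only */
} SnapFileHeader;

static uint32_t
checksum_xids(const TransactionId *xids, uint32_t n)
{
    uint32_t             h = 2166136261u;   /* FNV-1a over the raw bytes */
    const unsigned char *p = (const unsigned char *) xids;

    for (size_t i = 0; i < (size_t) n * sizeof(TransactionId); i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static bool
snapshot_write(FILE *f, const SavedSnapshot *s)
{
    SnapFileHeader hdr = {SNAP_MAGIC, s->xcnt, s->xmin, s->xmax,
                          checksum_xids(s->xip, s->xcnt)};

    return fwrite(&hdr, sizeof hdr, 1, f) == 1 &&
        fwrite(s->xip, sizeof(TransactionId), s->xcnt, f) == s->xcnt &&
        fflush(f) == 0;
}

/* Returns false for truncated, foreign or corrupted files. */
static bool
snapshot_read(FILE *f, SavedSnapshot *s)
{
    SnapFileHeader hdr;

    if (fread(&hdr, sizeof hdr, 1, f) != 1 || hdr.magic != SNAP_MAGIC)
        return false;
    s->xip = malloc(hdr.xcnt ? hdr.xcnt * sizeof(TransactionId) : 1);
    if (s->xip == NULL ||
        fread(s->xip, sizeof(TransactionId), hdr.xcnt, f) != hdr.xcnt ||
        checksum_xids(s->xip, hdr.xcnt) != hdr.checksum)
    {
        free(s->xip);
        s->xip = NULL;
        return false;
    }
    s->xmin = hdr.xmin;
    s->xmax = hdr.xmax;
    s->xcnt = hdr.xcnt;
    return true;
}
```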
+
--
1.8.2.rc2.4.g7799588.dirty
0001-Add-support-for-multiple-kinds-of-external-toast-dat.patch
From 654e24e9a615dcacea4d9714cf8cdbf6953983d5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 01/17] Add support for multiple kinds of external toast datums
There are several use cases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples
To support that, add 'tags' to external (varattrib_1b_e) varlenas, which
repurpose the current va_len_1be field to store the tag (or type) of a
varlena. To determine the actual length, a macro VARTAG_SIZE(tag) is added
which maps from a tag to the actual length.
This patch adds support for 'indirect' tuples, which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
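
A standalone sketch of the tag scheme, with local copies of the structs under hypothetical names (`ext_ondisk`, `ext_indirect`, `tag_size`, `make_external`) so it compiles without postgres.h. The tag values 1 and 18 and the little-endian header byte 0x01 are taken from the patch below; returning 0 for unknown tags is a simplification of its TrapMacro.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of varatt_external's layout (16 bytes, no padding). */
typedef struct
{
    int32_t  va_rawsize;
    int32_t  va_extsize;
    uint32_t va_valueid;        /* Oid */
    uint32_t va_toastrelid;     /* Oid */
} ext_ondisk;

/* Local copy of varatt_indirect. */
typedef struct
{
    void *pointer;
} ext_indirect;

enum { TAG_INDIRECT = 1, TAG_ONDISK = 18 };   /* values from the patch */

#define EXT_HDRSZ 2             /* va_header byte + va_tag byte */

/* Payload size for a tag; 0 for unknown tags (the patch traps instead). */
static size_t
tag_size(uint8_t tag)
{
    return tag == TAG_INDIRECT ? sizeof(ext_indirect) :
        tag == TAG_ONDISK ? sizeof(ext_ondisk) : 0;
}

/* Build an external datum in buf; header byte as in !WORDS_BIGENDIAN. */
static size_t
make_external(unsigned char *buf, uint8_t tag, const void *payload)
{
    buf[0] = 0x01;              /* 1-byte header marking an external datum */
    buf[1] = tag;
    memcpy(buf + EXT_HDRSZ, payload, tag_size(tag));
    return EXT_HDRSZ + tag_size(tag);
}
```

The on-disk case shows why VARTAG_ONDISK is 18: the 2-byte header plus the 16-byte TOAST pointer is exactly the length the old va_len_1be field stored, so existing on-disk datums decode unchanged.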
---
src/backend/access/heap/tuptoaster.c | 100 +++++++++++++++++++++++++++++++----
src/include/c.h | 2 +
src/include/postgres.h | 83 +++++++++++++++++++++--------
3 files changed, 153 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..99044d0 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -128,7 +128,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
struct varlena *
heap_tuple_untoast_attr(struct varlena * attr)
{
- if (VARATT_IS_EXTERNAL(attr))
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
/*
* This is an externally stored datum --- fetch it back from there
@@ -145,6 +145,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
pfree(tmp);
}
}
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *)redirect.pointer;
+ Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+ attr = heap_tuple_untoast_attr(attr);
+ }
else if (VARATT_IS_COMPRESSED(attr))
{
/*
@@ -191,7 +200,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
char *attrdata;
int32 attrsize;
- if (VARATT_IS_EXTERNAL(attr))
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
struct varatt_external toast_pointer;
@@ -204,6 +213,13 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
/* fetch it back (compressed marker will get set automatically) */
preslice = toast_fetch_datum(attr);
}
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ return heap_tuple_untoast_attr_slice(redirect.pointer,
+ sliceoffset, slicelength);
+ }
else
preslice = attr;
@@ -267,7 +283,7 @@ toast_raw_datum_size(Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
Size result;
- if (VARATT_IS_EXTERNAL(attr))
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
/* va_rawsize is the size of the original datum -- including header */
struct varatt_external toast_pointer;
@@ -275,6 +291,13 @@ toast_raw_datum_size(Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
result = toast_pointer.va_rawsize;
}
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect toast_pointer;
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+ return toast_raw_datum_size(PointerGetDatum(toast_pointer.pointer));
+ }
else if (VARATT_IS_COMPRESSED(attr))
{
/* here, va_rawsize is just the payload size */
@@ -308,7 +331,7 @@ toast_datum_size(Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
Size result;
- if (VARATT_IS_EXTERNAL(attr))
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
/*
* Attribute is stored externally - return the extsize whether
@@ -320,6 +343,13 @@ toast_datum_size(Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
result = toast_pointer.va_extsize;
}
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect toast_pointer;
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+ return toast_datum_size(PointerGetDatum(toast_pointer.pointer));
+ }
else if (VARATT_IS_SHORT(attr))
{
result = VARSIZE_SHORT(attr);
@@ -387,12 +417,56 @@ toast_delete(Relation rel, HeapTuple oldtup)
{
Datum value = toast_values[i];
- if (!toast_isnull[i] && VARATT_IS_EXTERNAL(PointerGetDatum(value)))
+ if (toast_isnull[i])
+ continue;
+ else if (VARATT_IS_EXTERNAL_ONDISK(PointerGetDatum(value)))
toast_delete_datum(rel, value);
+ else if (VARATT_IS_EXTERNAL_INDIRECT(PointerGetDatum(value)))
+ elog(ERROR, "cannot delete tuples with indirect toast tuples for now");
}
}
}
+/* ----------
+ * toast_datum_differs -
+ *
+ * Determine whether two toasted datums differ, i.e. whether the new value has
+ * to be stored again instead of reusing the reference to the old one.
+ * ----------
+ */
+static bool
+toast_datum_differs(struct varlena *old_value, struct varlena *new_value)
+{
+ Assert(VARATT_IS_EXTERNAL(old_value));
+ Assert(VARATT_IS_EXTERNAL(new_value));
+
+ /* fast path for the common case where we have the toast oid available */
+ if (VARATT_IS_EXTERNAL_ONDISK(old_value) &&
+ VARATT_IS_EXTERNAL_ONDISK(new_value))
+ return memcmp((char *) old_value, (char *) new_value,
+ VARSIZE_EXTERNAL(old_value)) != 0;
+
+ /*
+ * compare size of tuples, so we don't uselessly detoast/decompress tuples
+ * if they can't be the same anyway.
+ */
+ if (toast_raw_datum_size(PointerGetDatum(old_value)) !=
+ toast_raw_datum_size(PointerGetDatum(new_value)))
+ return true;
+
+ old_value = heap_tuple_untoast_attr(old_value);
+ new_value = heap_tuple_untoast_attr(new_value);
+
+ Assert(!VARATT_IS_EXTERNAL(old_value));
+ Assert(!VARATT_IS_EXTERNAL(new_value));
+ Assert(!VARATT_IS_COMPRESSED(old_value));
+ Assert(!VARATT_IS_COMPRESSED(new_value));
+ Assert(VARSIZE_ANY_EXHDR(old_value) == VARSIZE_ANY_EXHDR(new_value));
+
+ /* compare payload, we're fine with unaligned data */
+ return memcmp(VARDATA_ANY(old_value), VARDATA_ANY(new_value),
+ VARSIZE_ANY_EXHDR(old_value)) != 0;
+}
/* ----------
* toast_insert_or_update -
@@ -497,8 +571,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
VARATT_IS_EXTERNAL(old_value))
{
if (toast_isnull[i] || !VARATT_IS_EXTERNAL(new_value) ||
- memcmp((char *) old_value, (char *) new_value,
- VARSIZE_EXTERNAL(old_value)) != 0)
+ toast_datum_differs(old_value, new_value))
{
/*
* The old external stored value isn't needed any more
@@ -1258,6 +1331,8 @@ toast_save_datum(Relation rel, Datum value,
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ Assert(!VARATT_IS_EXTERNAL(value));
+
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
@@ -1341,7 +1416,7 @@ toast_save_datum(Relation rel, Datum value,
{
struct varatt_external old_toast_pointer;
- Assert(VARATT_IS_EXTERNAL(oldexternal));
+ Assert(VARATT_IS_EXTERNAL_ONDISK(oldexternal));
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(old_toast_pointer, oldexternal);
if (old_toast_pointer.va_toastrelid == rel->rd_toastoid)
@@ -1456,7 +1531,7 @@ toast_save_datum(Relation rel, Datum value,
* Create the TOAST pointer value that we'll return
*/
result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
- SET_VARSIZE_EXTERNAL(result, TOAST_POINTER_SIZE);
+ SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK);
memcpy(VARDATA_EXTERNAL(result), &toast_pointer, sizeof(toast_pointer));
return PointerGetDatum(result);
@@ -1483,6 +1558,8 @@ toast_delete_datum(Relation rel, Datum value)
if (!VARATT_IS_EXTERNAL(attr))
return;
+ Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1608,6 +1685,9 @@ toast_fetch_datum(struct varlena * attr)
char *chunkdata;
int32 chunksize;
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ elog(ERROR, "shouldn't be called this way");
+
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1775,7 +1855,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chcpystrt;
int32 chcpyend;
- Assert(VARATT_IS_EXTERNAL(attr));
+ Assert(VARATT_IS_EXTERNAL_ONDISK(attr));
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
diff --git a/src/include/c.h b/src/include/c.h
index f2c9e12..7193af6 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -573,6 +573,8 @@ typedef NameData *Name;
#define AssertMacro(condition) ((void)true)
#define AssertArg(condition)
#define AssertState(condition)
+#define TrapMacro(condition, errorType) (true)
+
#elif defined(FRONTEND)
#include <assert.h>
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 30e1dee..d982e93 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -54,23 +54,52 @@
*/
/*
- * struct varatt_external is a "TOAST pointer", that is, the information
- * needed to fetch a stored-out-of-line Datum. The data is compressed
- * if and only if va_extsize < va_rawsize - VARHDRSZ. This struct must not
- * contain any padding, because we sometimes compare pointers using memcmp.
+ * struct varatt_external is a "TOAST pointer", that is, the information needed
+ * to fetch a Datum stored out-of-line in a TOAST table on disk. The data is
+ * compressed if and only if va_extsize < va_rawsize - VARHDRSZ. This struct
+ * must not contain any padding, because we sometimes compare pointers using
+ * memcmp.
*
* Note that this information is stored unaligned within actual tuples, so
* you need to memcpy from the tuple into a local struct variable before
* you can look at these fields! (The reason we use memcmp is to avoid
* having to do that just to detect equality of two TOAST pointers...)
*/
-struct varatt_external
+typedef struct varatt_external
{
int32 va_rawsize; /* Original data size (includes header) */
int32 va_extsize; /* External saved size (doesn't) */
Oid va_valueid; /* Unique ID of value within TOAST table */
Oid va_toastrelid; /* RelID of TOAST table containing it */
-};
+} varatt_external;
+
+/*
+ * Out-of-line Datum that's stored in memory, in contrast to varatt_external
+ * pointers, which point to data in an external toast relation.
+ *
+ * Note that, just like varatt_external, this is stored unaligned within the
+ * tuple.
+ */
+typedef struct varatt_indirect
+{
+ struct varlena *pointer; /* Pointer to in-memory varlena */
+} varatt_indirect;
+
+
+/*
+ * Type of external toast datum stored. The peculiar value for VARTAG_ONDISK
+ * comes from the requirement for on-disk compatibility with the older
+ * definitions of varattrib_1b_e, where va_tag was named va_len_1be and held
+ * the datum's total length: VARHDRSZ_EXTERNAL (2) + sizeof(varatt_external)
+ * (16) = 18.
+ */
+typedef enum vartag_external {
+ VARTAG_INDIRECT = 1,
+ VARTAG_ONDISK = 18
+} vartag_external;
+
+#define VARTAG_SIZE(tag) \
+ ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \
+ (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
+ TrapMacro(false, "unknown vartag"))
/*
* These structs describe the header of a varlena object that may have been
@@ -102,11 +131,12 @@ typedef struct
char va_data[1]; /* Data begins here */
} varattrib_1b;
+/* inline portion of a short varlena pointing to an external resource */
typedef struct
{
uint8 va_header; /* Always 0x80 or 0x01 */
- uint8 va_len_1be; /* Physical length of datum */
- char va_data[1]; /* Data (for now always a TOAST pointer) */
+ uint8 va_tag; /* Type of datum */
+ char va_data[1]; /* Data (of the type indicated by va_tag) */
} varattrib_1b_e;
/*
@@ -130,6 +160,9 @@ typedef struct
* first byte. Also, it is not possible for a 1-byte length word to be zero;
* this lets us disambiguate alignment padding bytes from the start of an
* unaligned datum. (We now *require* pad bytes to be filled with zero!)
+ *
+ * In TOAST datums the tag field in varattrib_1b_e is used to discern whether
+ * it's an indirection pointer or, more commonly, an on-disk tuple.
*/
/*
@@ -161,8 +194,8 @@ typedef struct
(((varattrib_4b *) (PTR))->va_4byte.va_header & 0x3FFFFFFF)
#define VARSIZE_1B(PTR) \
(((varattrib_1b *) (PTR))->va_header & 0x7F)
-#define VARSIZE_1B_E(PTR) \
- (((varattrib_1b_e *) (PTR))->va_len_1be)
+#define VARTAG_1B_E(PTR) \
+ (((varattrib_1b_e *) (PTR))->va_tag)
#define SET_VARSIZE_4B(PTR,len) \
(((varattrib_4b *) (PTR))->va_4byte.va_header = (len) & 0x3FFFFFFF)
@@ -170,9 +203,9 @@ typedef struct
(((varattrib_4b *) (PTR))->va_4byte.va_header = ((len) & 0x3FFFFFFF) | 0x40000000)
#define SET_VARSIZE_1B(PTR,len) \
(((varattrib_1b *) (PTR))->va_header = (len) | 0x80)
-#define SET_VARSIZE_1B_E(PTR,len) \
+#define SET_VARTAG_1B_E(PTR,tag) \
(((varattrib_1b_e *) (PTR))->va_header = 0x80, \
- ((varattrib_1b_e *) (PTR))->va_len_1be = (len))
+ ((varattrib_1b_e *) (PTR))->va_tag = (tag))
#else /* !WORDS_BIGENDIAN */
#define VARATT_IS_4B(PTR) \
@@ -193,8 +226,8 @@ typedef struct
((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
#define VARSIZE_1B(PTR) \
((((varattrib_1b *) (PTR))->va_header >> 1) & 0x7F)
-#define VARSIZE_1B_E(PTR) \
- (((varattrib_1b_e *) (PTR))->va_len_1be)
+#define VARTAG_1B_E(PTR) \
+ (((varattrib_1b_e *) (PTR))->va_tag)
#define SET_VARSIZE_4B(PTR,len) \
(((varattrib_4b *) (PTR))->va_4byte.va_header = (((uint32) (len)) << 2))
@@ -202,12 +235,12 @@ typedef struct
(((varattrib_4b *) (PTR))->va_4byte.va_header = (((uint32) (len)) << 2) | 0x02)
#define SET_VARSIZE_1B(PTR,len) \
(((varattrib_1b *) (PTR))->va_header = (((uint8) (len)) << 1) | 0x01)
-#define SET_VARSIZE_1B_E(PTR,len) \
+#define SET_VARTAG_1B_E(PTR,tag) \
(((varattrib_1b_e *) (PTR))->va_header = 0x01, \
- ((varattrib_1b_e *) (PTR))->va_len_1be = (len))
+ ((varattrib_1b_e *) (PTR))->va_tag = (tag))
#endif /* WORDS_BIGENDIAN */
-#define VARHDRSZ_SHORT 1
+#define VARHDRSZ_SHORT offsetof(varattrib_1b, va_data)
#define VARATT_SHORT_MAX 0x7F
#define VARATT_CAN_MAKE_SHORT(PTR) \
(VARATT_IS_4B_U(PTR) && \
@@ -215,7 +248,7 @@ typedef struct
#define VARATT_CONVERTED_SHORT_SIZE(PTR) \
(VARSIZE(PTR) - VARHDRSZ + VARHDRSZ_SHORT)
-#define VARHDRSZ_EXTERNAL 2
+#define VARHDRSZ_EXTERNAL offsetof(varattrib_1b_e, va_data)
#define VARDATA_4B(PTR) (((varattrib_4b *) (PTR))->va_4byte.va_data)
#define VARDATA_4B_C(PTR) (((varattrib_4b *) (PTR))->va_compressed.va_data)
@@ -249,26 +282,32 @@ typedef struct
#define VARSIZE_SHORT(PTR) VARSIZE_1B(PTR)
#define VARDATA_SHORT(PTR) VARDATA_1B(PTR)
-#define VARSIZE_EXTERNAL(PTR) VARSIZE_1B_E(PTR)
+#define VARTAG_EXTERNAL(PTR) VARTAG_1B_E(PTR)
+#define VARSIZE_EXTERNAL(PTR) (VARHDRSZ_EXTERNAL + VARTAG_SIZE(VARTAG_EXTERNAL(PTR)))
#define VARDATA_EXTERNAL(PTR) VARDATA_1B_E(PTR)
#define VARATT_IS_COMPRESSED(PTR) VARATT_IS_4B_C(PTR)
#define VARATT_IS_EXTERNAL(PTR) VARATT_IS_1B_E(PTR)
+#define VARATT_IS_EXTERNAL_ONDISK(PTR) \
+ (VARATT_IS_EXTERNAL(PTR) && VARTAG_EXTERNAL(PTR) == VARTAG_ONDISK)
+#define VARATT_IS_EXTERNAL_INDIRECT(PTR) \
+ (VARATT_IS_EXTERNAL(PTR) && VARTAG_EXTERNAL(PTR) == VARTAG_INDIRECT)
#define VARATT_IS_SHORT(PTR) VARATT_IS_1B(PTR)
#define VARATT_IS_EXTENDED(PTR) (!VARATT_IS_4B_U(PTR))
#define SET_VARSIZE(PTR, len) SET_VARSIZE_4B(PTR, len)
#define SET_VARSIZE_SHORT(PTR, len) SET_VARSIZE_1B(PTR, len)
#define SET_VARSIZE_COMPRESSED(PTR, len) SET_VARSIZE_4B_C(PTR, len)
-#define SET_VARSIZE_EXTERNAL(PTR, len) SET_VARSIZE_1B_E(PTR, len)
+
+#define SET_VARTAG_EXTERNAL(PTR, tag) SET_VARTAG_1B_E(PTR, tag)
#define VARSIZE_ANY(PTR) \
- (VARATT_IS_1B_E(PTR) ? VARSIZE_1B_E(PTR) : \
+ (VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR) : \
(VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR) : \
VARSIZE_4B(PTR)))
#define VARSIZE_ANY_EXHDR(PTR) \
- (VARATT_IS_1B_E(PTR) ? VARSIZE_1B_E(PTR)-VARHDRSZ_EXTERNAL : \
+ (VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR)-VARHDRSZ_EXTERNAL : \
(VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR)-VARHDRSZ_SHORT : \
VARSIZE_4B(PTR)-VARHDRSZ))
--
1.8.2.rc2.4.g7799588.dirty
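With this change the one-byte external header no longer stores a length. The second byte is a tag, and the pointer's size is implied by the tag. The following standalone sketch mimics VARTAG_SIZE and VARSIZE_EXTERNAL under stated assumptions: the demo_* structs are simplified stand-ins (not the real varatt_indirect/varatt_external layouts), and the VARTAG_INDIRECT value of 1 is assumed. Only VARTAG_ONDISK = 18 is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for varatt_indirect and varatt_external. */
typedef struct
{
    void       *pointer;
} demo_indirect;

typedef struct
{
    int32_t     va_rawsize;
    int32_t     va_extsize;
    uint32_t    va_valueid;
    uint32_t    va_toastrelid;
} demo_external;

/* DEMO_TAG_ONDISK = 18 matches the patch; the INDIRECT value is assumed. */
enum demo_vartag
{
    DEMO_TAG_INDIRECT = 1,
    DEMO_TAG_ONDISK = 18
};

/* Same layout idea as varattrib_1b_e: header byte, tag byte, payload. */
typedef struct
{
    uint8_t     va_header;
    uint8_t     va_tag;
    char        va_data[1];
} demo_1b_e;

/* Like VARHDRSZ_EXTERNAL: derived from the struct instead of hardcoded. */
#define DEMO_HDRSZ_EXTERNAL offsetof(demo_1b_e, va_data)

/* Like VARTAG_SIZE: the tag alone determines the payload size.
 * Returns 0 for an unknown tag. */
static size_t
demo_tag_size(uint8_t tag)
{
    switch (tag)
    {
        case DEMO_TAG_INDIRECT:
            return sizeof(demo_indirect);
        case DEMO_TAG_ONDISK:
            return sizeof(demo_external);
        default:
            return 0;
    }
}

/* Like VARSIZE_EXTERNAL: header size plus the tag-implied payload size.
 * The assert plays the role of the TrapMacro in VARTAG_SIZE. */
static size_t
demo_varsize_external(const demo_1b_e *ptr)
{
    size_t      payload = demo_tag_size(ptr->va_tag);

    assert(payload != 0);
    return DEMO_HDRSZ_EXTERNAL + payload;
}
```

Deriving the header size with offsetof, as the patch now does for VARHDRSZ_EXTERNAL and VARHDRSZ_SHORT, keeps the constant in sync with the struct instead of relying on a hardcoded value.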
0002-wal_decoding-Add-pg_xlog_wait_remote_-apply-receive-.patch
From d86b884c00fbb0eb52523b322c6d4cb83e0e351f Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <ams@2ndQuadrant.com>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 02/17] wal_decoding: Add pg_xlog_wait_remote_{apply,receive}
functions
We want to use these in isolationtester tests, but they're more
generally useful for "inter-node synchronisation".
---
src/backend/replication/walsender.c | 73 +++++++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.h | 5 +++
src/include/replication/walsender.h | 2 +
3 files changed, 80 insertions(+)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 717cbfd..9f5f766 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2083,3 +2083,76 @@ GetOldestWALSendPointer(void)
}
#endif
+
+static XLogRecPtr
+text_to_xlogrecptr(text *str)
+{
+ uint32 hi, lo;
+ char *pos = text_to_cstring(str);
+
+ if (sscanf(pos, "%X/%X", &hi, &lo) != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse transaction log location \"%s\"",
+ pos)));
+
+ return ((uint64) hi) << 32 | lo;
+}
+
+static void
+wait_for_remote_lsn(int32 pid, XLogRecPtr ptr, bool wait_for_apply)
+{
+ int i;
+ bool done;
+
+ do {
+ done = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid != 0 && (pid == 0 || pid == walsnd->pid))
+ {
+ XLogRecPtr rptr = wait_for_apply ? walsnd->apply : walsnd->flush;
+ if (rptr < ptr)
+ done = false;
+ }
+
+ SpinLockRelease(&walsnd->mutex);
+
+ if (!done)
+ break;
+ }
+
+ if (!done)
+ pg_usleep(10*1000);
+ }
+ while (!done);
+}
+
+Datum
+pg_xlog_wait_remote_apply(PG_FUNCTION_ARGS)
+{
+ text *pos = PG_GETARG_TEXT_P(0);
+ int32 pid = PG_GETARG_INT32(1);
+
+ XLogRecPtr startpos = text_to_xlogrecptr(pos);
+ wait_for_remote_lsn(pid, startpos, true);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+pg_xlog_wait_remote_receive(PG_FUNCTION_ARGS)
+{
+ text *pos = PG_GETARG_TEXT_P(0);
+ int32 pid = PG_GETARG_INT32(1);
+
+ XLogRecPtr startpos = text_to_xlogrecptr(pos);
+ wait_for_remote_lsn(pid, startpos, false);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b5be075..6d3d702 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4722,6 +4722,11 @@ DATA(insert OID = 3473 ( spg_range_quad_leaf_consistent PGNSP PGUID 12 1 0 0 0
DESCR("SP-GiST support for quad tree over range");
+DATA(insert OID = 3781 ( pg_xlog_wait_remote_apply PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_apply _null_ _null_ _null_ ));
+DESCR("wait for an lsn to be applied by a remote node");
+DATA(insert OID = 3782 ( pg_xlog_wait_remote_receive PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2278 "25 23" _null_ _null_ _null_ _null_ pg_xlog_wait_remote_receive _null_ _null_ _null_ ));
+DESCR("wait for an lsn to be received by a remote node");
+
/* event triggers */
DATA(insert OID = 3566 ( pg_event_trigger_dropped_objects PGNSP PGUID 12 10 100 0 0 f f f f t t s 0 0 2249 "" "{26,26,23,25,25,25,25}" "{o,o,o,o,o,o,o}" "{classid, objid, objsubid, object_type, schema_name, object_name, object_identity}" _null_ pg_event_trigger_dropped_objects _null_ _null_ _null_ ));
DESCR("list objects dropped by the current command");
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddf..84a418a 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -35,6 +35,8 @@ extern void WalSndWakeup(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
+extern Datum pg_xlog_wait_remote_apply(PG_FUNCTION_ARGS);
+extern Datum pg_xlog_wait_remote_receive(PG_FUNCTION_ARGS);
/*
* Remember that we want to wakeup walsenders later
--
1.8.2.rc2.4.g7799588.dirty
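The two new SQL functions parse an LSN written as two hexadecimal halves ("hi/lo") and then poll every 10 ms until the selected walsenders report that position as flushed (receive) or applied. The sketch below reproduces the parsing and a single pass of the polling condition in plain C. demo_sender is a hypothetical copy of the fields read from WalSndCtl, and the sleep/retry loop is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t demo_lsn;

/* Parse "hi/lo" (two hex halves) the way text_to_xlogrecptr does. */
static bool
demo_parse_lsn(const char *str, demo_lsn *out)
{
    unsigned int hi,
                lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return false;
    *out = ((demo_lsn) hi << 32) | lo;
    return true;
}

/* Hypothetical copy of the per-walsender fields the wait loop reads. */
typedef struct
{
    int         pid;            /* 0 means the slot is unused */
    demo_lsn    flush;
    demo_lsn    apply;
} demo_sender;

/*
 * One pass of the wait loop: true once every active sender matching pid
 * (0 = any) has flushed, or applied, at least up to target.
 */
static bool
demo_senders_caught_up(const demo_sender *senders, int n, int pid,
                       demo_lsn target, bool wait_for_apply)
{
    for (int i = 0; i < n; i++)
    {
        demo_lsn    reached;

        if (senders[i].pid == 0 || (pid != 0 && pid != senders[i].pid))
            continue;           /* unused slot or a different walsender */
        reached = wait_for_apply ? senders[i].apply : senders[i].flush;
        if (reached < target)
            return false;
    }
    return true;
}
```

As in the patch, a pid that matches no active walsender leaves the condition true, so the wait returns immediately rather than blocking.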
0003-wal_decoding-Add-a-new-RELFILENODE-syscache-to-fetch.patch
From 6ee904e27e4e01c4e46f671fc807ece5da40ff28 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 03/17] wal_decoding: Add a new RELFILENODE syscache to fetch a
pg_class entry via (reltablespace, relfilenode)
This cache is theoretically problematic because, formally, indexes used by
syscaches need to be unique, and this one is not. This is "just" because
0/InvalidOid is stored in pg_class.relfilenode for nailed/shared catalog
relations. This syscache will never be queried for InvalidOid relfilenodes,
however, so it seems to be safe even if it bends the rules somewhat.
It might be nicer to add infrastructure to do this properly, like using a
partial index; it's not clear what the best way to do this is, though, and the
benefit very well might not be worth the overhead.
Needs a CATVERSION bump.
---
src/backend/utils/cache/syscache.c | 11 +++++++++++
src/include/catalog/indexing.h | 2 ++
src/include/utils/syscache.h | 1 +
3 files changed, 14 insertions(+)
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index ecb0f96..e83b5f1 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -591,6 +591,17 @@ static const struct cachedesc cacheinfo[] = {
},
64
},
+ {RelationRelationId, /* RELFILENODE */
+ ClassTblspcRelfilenodeIndexId,
+ 2,
+ {
+ Anum_pg_class_reltablespace,
+ Anum_pg_class_relfilenode,
+ 0,
+ 0
+ },
+ 1024
+ },
{RelationRelationId, /* RELNAMENSP */
ClassNameNspIndexId,
2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 19268fb..4860e98 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_o
#define ClassOidIndexId 2662
DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
#define ClassNameNspIndexId 2663
+DECLARE_INDEX(pg_class_tblspc_relfilenode_index, 3455, on pg_class using btree(reltablespace oid_ops, relfilenode oid_ops));
+#define ClassTblspcRelfilenodeIndexId 3455
DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
#define CollationNameEncNspIndexId 3164
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d1d8abe..2a14905 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -75,6 +75,7 @@ enum SysCacheIdentifier
PROCNAMEARGSNSP,
PROCOID,
RANGETYPE,
+ RELFILENODE,
RELNAMENSP,
RELOID,
RULERELNAME,
--
1.8.2.rc2.4.g7799588.dirty
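The uniqueness argument above can be illustrated with a toy model. Mapped catalogs all store relfilenode 0, so (reltablespace, relfilenode) is only unique for nonzero relfilenodes, and the cache is safe as long as nobody queries it with 0. The linear scan below over hypothetical rows illustrates that invariant; it is not how the catalog cache is implemented.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_oid;

/* Hypothetical, minimal pg_class row. */
typedef struct
{
    demo_oid    oid;
    demo_oid    reltablespace;  /* 0 = database default tablespace */
    demo_oid    relfilenode;    /* 0 = mapped (nailed or shared) relation */
} demo_class_row;

/*
 * Look up a row by (reltablespace, relfilenode). The key is only unique for
 * nonzero relfilenodes, so a lookup with 0 is rejected up front; the second
 * assert checks the uniqueness that the syscache relies on.
 */
static demo_oid
demo_lookup_by_filenode(const demo_class_row *rows, int n,
                        demo_oid reltablespace, demo_oid relfilenode)
{
    demo_oid    found = 0;

    assert(relfilenode != 0);
    for (int i = 0; i < n; i++)
    {
        if (rows[i].reltablespace == reltablespace &&
            rows[i].relfilenode == relfilenode)
        {
            assert(found == 0);
            found = rows[i].oid;
        }
    }
    return found;
}
```

The same relfilenode may appear in different tablespaces, which is why the tablespace has to be part of the key.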
0004-wal_decoding-Add-RelationMapFilenodeToOid-function-t.patch
From b0ea75b0e4e594b645ba7e779b6f630c3628b5f7 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 04/17] wal_decoding: Add RelationMapFilenodeToOid function to
relmapper.c
This function maps the relfilenode of a mapped (nailed or shared) relation back
to the relation's oid and thus acts as the reverse of RelationMapOidToFilenode.
---
src/backend/utils/cache/relmapper.c | 53 +++++++++++++++++++++++++++++++++++++
src/include/utils/relmapper.h | 2 ++
2 files changed, 55 insertions(+)
diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 2c7d9f3..039aa29 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -180,6 +180,59 @@ RelationMapOidToFilenode(Oid relationId, bool shared)
return InvalidOid;
}
+/* RelationMapFilenodeToOid
+ *
+ * Do the reverse of the normal direction of mapping done in
+ * RelationMapOidToFilenode.
+ *
+ * This is not supposed to be used during normal running but rather for
+ * information purposes when looking at the filesystem or the xlog.
+ *
+ * Returns InvalidOid if the OID is not known, which can easily happen if the
+ * filenode is not of a relation that is nailed or shared, or if it simply
+ * doesn't exist anywhere.
+ */
+Oid
+RelationMapFilenodeToOid(Oid filenode, bool shared)
+{
+ const RelMapFile *map;
+ int32 i;
+
+ /* If there are active updates, believe those over the main maps */
+ if (shared)
+ {
+ map = &active_shared_updates;
+ for (i = 0; i < map->num_mappings; i++)
+ {
+ if (filenode == map->mappings[i].mapfilenode)
+ return map->mappings[i].mapoid;
+ }
+ map = &shared_map;
+ for (i = 0; i < map->num_mappings; i++)
+ {
+ if (filenode == map->mappings[i].mapfilenode)
+ return map->mappings[i].mapoid;
+ }
+ }
+ else
+ {
+ map = &active_local_updates;
+ for (i = 0; i < map->num_mappings; i++)
+ {
+ if (filenode == map->mappings[i].mapfilenode)
+ return map->mappings[i].mapoid;
+ }
+ map = &local_map;
+ for (i = 0; i < map->num_mappings; i++)
+ {
+ if (filenode == map->mappings[i].mapfilenode)
+ return map->mappings[i].mapoid;
+ }
+ }
+
+ return InvalidOid;
+}
+
/*
* RelationMapUpdateMap
*
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 8f0b438..071bc98 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -36,6 +36,8 @@ typedef struct xl_relmap_update
extern Oid RelationMapOidToFilenode(Oid relationId, bool shared);
+extern Oid RelationMapFilenodeToOid(Oid filenode, bool shared);
+
extern void RelationMapUpdateMap(Oid relationId, Oid fileNode, bool shared,
bool immediate);
--
1.8.2.rc2.4.g7799588.dirty
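RelationMapFilenodeToOid repeats the same linear search four times: active updates first, then the committed map, for both the shared and the local case. One possible way to factor that out is sketched below with simplified stand-in types; it is an illustration rather than the patch's code, the fixed capacity of 8 is an assumption, and the OID values in the usage checks are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_oid;
#define DEMO_INVALID_OID ((demo_oid) 0)
#define DEMO_MAX_MAPPINGS 8     /* assumed capacity for the sketch */

/* Simplified stand-ins for RelMapping / RelMapFile. */
typedef struct
{
    demo_oid    mapoid;
    demo_oid    mapfilenode;
} demo_mapping;

typedef struct
{
    int         num_mappings;
    demo_mapping mappings[DEMO_MAX_MAPPINGS];
} demo_map;

/* The loop that RelationMapFilenodeToOid repeats four times. */
static demo_oid
demo_search_map(const demo_map *map, demo_oid filenode)
{
    assert(map->num_mappings >= 0 && map->num_mappings <= DEMO_MAX_MAPPINGS);
    for (int i = 0; i < map->num_mappings; i++)
    {
        if (map->mappings[i].mapfilenode == filenode)
            return map->mappings[i].mapoid;
    }
    return DEMO_INVALID_OID;
}

/* Pending updates are believed over the main map, as in the patch. */
static demo_oid
demo_filenode_to_oid(const demo_map *active_updates, const demo_map *main_map,
                     demo_oid filenode)
{
    demo_oid    oid = demo_search_map(active_updates, filenode);

    if (oid != DEMO_INVALID_OID)
        return oid;
    return demo_search_map(main_map, filenode);
}
```

The shared and local cases would then differ only in which pair of maps is passed in.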
0005-wal_decoding-Add-pg_relation_by_filenode-to-lookup-u.patch
From f77a55bdf01c6997428bbf7e1bedac771998a95c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 05/17] wal_decoding: Add pg_relation_by_filenode to look up
a relation by (tablespace, filenode)
This requires the RELFILENODE syscache and the RelationMapFilenodeToOid
function added in the previous two commits.
---
doc/src/sgml/func.sgml | 23 ++++++++++++++-
src/backend/utils/adt/dbsize.c | 63 ++++++++++++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.h | 2 ++
src/include/utils/builtins.h | 2 ++
4 files changed, 89 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4c5af4b..a8f83e2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15726,7 +15726,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
<para>
The functions shown in <xref linkend="functions-admin-dblocation"> assist
- in identifying the specific disk files associated with database objects.
+ in identifying the specific disk files associated with database objects, and vice versa.
</para>
<indexterm>
@@ -15735,6 +15735,9 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
<indexterm>
<primary>pg_relation_filepath</primary>
</indexterm>
+ <indexterm>
+ <primary>pg_relation_by_filenode</primary>
+ </indexterm>
<table id="functions-admin-dblocation">
<title>Database Object Location Functions</title>
@@ -15763,6 +15766,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
File path name of the specified relation
</entry>
</row>
+ <row>
+ <entry>
+ <literal><function>pg_relation_by_filenode(<parameter>tablespace</parameter> <type>oid</type>, <parameter>filenode</parameter> <type>oid</type>)</function></literal>
+ </entry>
+ <entry><type>regclass</type></entry>
+ <entry>
+ Find the relation associated with a filenode
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -15786,6 +15798,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
the relation.
</para>
+ <para>
+ <function>pg_relation_by_filenode</> is the reverse of
+ <function>pg_relation_filenode</>. Given a <quote>tablespace</> OID and
+ a <quote>filenode</>, it returns the associated relation. For the default
+ tablespace, 0 can be passed instead of its OID. See the
+ documentation of <function>pg_relation_filenode</> for an explanation of why
+ this cannot always be answered easily by querying <structname>pg_class</>.
+ </para>
+
</sect2>
<sect2 id="functions-admin-genfile">
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..ce5f49e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -746,6 +746,69 @@ pg_relation_filenode(PG_FUNCTION_ARGS)
}
/*
+ * Get the relation via (reltablespace, relfilenode)
+ *
+ * This is expected to be used when somebody wants to match an individual file
+ * on the filesystem back to its table. That's not trivially possible via
+ * pg_class because that doesn't contain the relfilenodes of shared and nailed
+ * tables.
+ *
+ * We don't fail but return NULL if we cannot find a mapping.
+ *
+ * Instead of knowing DEFAULTTABLESPACE_OID you can pass 0.
+ */
+Datum
+pg_relation_by_filenode(PG_FUNCTION_ARGS)
+{
+ Oid reltablespace = PG_GETARG_OID(0);
+ Oid relfilenode = PG_GETARG_OID(1);
+ Oid lookup_tablespace;
+ Oid heaprel = InvalidOid;
+ HeapTuple tuple;
+
+ if (reltablespace == 0)
+ reltablespace = DEFAULTTABLESPACE_OID;
+
+ /* in global tablespace, has to be a shared table */
+ if (reltablespace == GLOBALTABLESPACE_OID)
+ {
+ heaprel = RelationMapFilenodeToOid(relfilenode, true);
+ }
+ else
+ {
+ /*
+ * relations in the default tablespace are stored with InvalidOid as
+ * pg_class."reltablespace".
+ */
+ if (reltablespace == DEFAULTTABLESPACE_OID)
+ lookup_tablespace = InvalidOid;
+ else
+ lookup_tablespace = reltablespace;
+
+
+ tuple = SearchSysCache2(RELFILENODE,
+ lookup_tablespace,
+ relfilenode);
+ /* ok, found it */
+ if (HeapTupleIsValid(tuple))
+ {
+ heaprel = HeapTupleHeaderGetOid(tuple->t_data);
+ ReleaseSysCache(tuple);
+ }
+ /* has to be nonexistent or a nailed table, but not shared */
+ else
+ {
+ heaprel = RelationMapFilenodeToOid(relfilenode, false);
+ }
+ }
+
+ if (!OidIsValid(heaprel))
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_OID(heaprel);
+}
+
+/*
* Get the pathname (relative to $PGDATA) of a relation
*
* See comments for pg_relation_filenode.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6d3d702..8d268dd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3446,6 +3446,8 @@ DATA(insert OID = 2998 ( pg_indexes_size PGNSP PGUID 12 1 0 0 0 f f f f t f v 1
DESCR("disk space usage for all indexes attached to the specified table");
DATA(insert OID = 2999 ( pg_relation_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 26 "2205" _null_ _null_ _null_ _null_ pg_relation_filenode _null_ _null_ _null_ ));
DESCR("filenode identifier of relation");
+DATA(insert OID = 3454 ( pg_relation_by_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 2205 "26 26" _null_ _null_ _null_ _null_ pg_relation_by_filenode _null_ _null_ _null_ ));
+DESCR("relation OID for filenode and tablespace");
DATA(insert OID = 3034 ( pg_relation_filepath PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "2205" _null_ _null_ _null_ _null_ pg_relation_filepath _null_ _null_ _null_ ));
DESCR("file path of relation");
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 667c58b..ddbedea 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -459,8 +459,10 @@ extern Datum pg_size_pretty(PG_FUNCTION_ARGS);
extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS);
extern Datum pg_table_size(PG_FUNCTION_ARGS);
extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
+extern Datum pg_relation_by_filenode(PG_FUNCTION_ARGS);
extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
extern Datum pg_relation_filepath(PG_FUNCTION_ARGS);
+extern Datum pg_relation_is_scannable(PG_FUNCTION_ARGS);
/* genfile.c */
extern bytea *read_binary_file(const char *filename,
--
1.8.2.rc2.4.g7799588.dirty
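pg_relation_by_filenode resolves a (tablespace, filenode) pair in three steps: pg_global goes straight to the shared relation map, anything else is looked up via the new RELFILENODE syscache, and a miss falls back to the local relation map for nailed catalogs. A standalone C sketch of that decision order follows. The two callbacks and the stub implementations are hypothetical stand-ins for SearchSysCache2 and RelationMapFilenodeToOid; 1663 and 1664 are the OIDs of pg_default and pg_global (DEFAULTTABLESPACE_OID and GLOBALTABLESPACE_OID).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_oid;
#define DEMO_INVALID_OID        ((demo_oid) 0)
#define DEMO_DEFAULT_TABLESPACE ((demo_oid) 1663)   /* pg_default */
#define DEMO_GLOBAL_TABLESPACE  ((demo_oid) 1664)   /* pg_global */

/* Hypothetical callbacks standing in for the syscache and the relmapper. */
typedef demo_oid (*demo_catalog_fn) (demo_oid reltablespace, demo_oid filenode);
typedef demo_oid (*demo_relmap_fn) (demo_oid filenode, bool shared);

static demo_oid
demo_relation_by_filenode(demo_oid tablespace, demo_oid filenode,
                          demo_catalog_fn catalog, demo_relmap_fn relmap)
{
    demo_oid    lookup_tablespace;
    demo_oid    oid;

    if (tablespace == 0)
        tablespace = DEMO_DEFAULT_TABLESPACE;

    /* pg_global only holds shared relations, which are always mapped */
    if (tablespace == DEMO_GLOBAL_TABLESPACE)
        return relmap(filenode, true);

    /* as in the patch, the default tablespace is stored as 0 in pg_class */
    lookup_tablespace = (tablespace == DEMO_DEFAULT_TABLESPACE)
        ? DEMO_INVALID_OID : tablespace;

    oid = catalog(lookup_tablespace, filenode);
    if (oid != DEMO_INVALID_OID)
        return oid;

    /* not in pg_class: a nailed local catalog, or nothing at all */
    return relmap(filenode, false);
}

/* Hypothetical stubs: one user table plus one mapped catalog of each kind. */
static demo_oid
demo_stub_catalog(demo_oid reltablespace, demo_oid filenode)
{
    return (reltablespace == 0 && filenode == 16390) ? 16384 : DEMO_INVALID_OID;
}

static demo_oid
demo_stub_relmap(demo_oid filenode, bool shared)
{
    if (shared)
        return filenode == 1260 ? 1260 : DEMO_INVALID_OID;
    return filenode == 1259 ? 1259 : DEMO_INVALID_OID;
}
```

In the real function an InvalidOid result is returned to SQL as NULL rather than raising an error.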
0006-wal_decoding-Introduce-InvalidCommandId-and-declare-.patch
From cb12f56b401bba484ad82f14079450cd83dfe673 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 06/17] wal_decoding: Introduce InvalidCommandId and declare
that to be the new maximum for CommandCounterIncrement
This is useful to be able to represent a CommandId that's invalid. There was no
such value before.
This decreases the possible number of commands in a transaction by one, which
seems unproblematic. It's also not a problem for pg_upgrade because cmin/cmax
are never looked at outside the context of their own transaction (save for
timetravel access, but that's new anyway).
---
src/backend/access/transam/xact.c | 4 ++--
src/include/c.h | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
if (currentCommandIdUsed)
{
currentCommandId += 1;
- if (currentCommandId == FirstCommandId) /* check for overflow */
+ if (currentCommandId == InvalidCommandId)
{
currentCommandId -= 1;
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("cannot have more than 2^32-1 commands in a transaction")));
+ errmsg("cannot have more than 2^32-2 commands in a transaction")));
}
currentCommandIdUsed = false;
diff --git a/src/include/c.h b/src/include/c.h
index 7193af6..e4940a9 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
typedef uint32 CommandId;
#define FirstCommandId ((CommandId) 0)
+#define InvalidCommandId (~(CommandId)0)
/*
* Array indexing support
--
1.8.2.rc2.4.g7799588.dirty
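Reserving the all-ones value as InvalidCommandId means the overflow check in CommandCounterIncrement now fires one step earlier, so a handed-out CommandId can never equal the sentinel. A minimal sketch of that check follows, using hypothetical demo_* names and returning false where the real code raises an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_command_id;

#define DEMO_FIRST_COMMAND_ID   ((demo_command_id) 0)
#define DEMO_INVALID_COMMAND_ID (~(demo_command_id) 0)  /* all bits set */

/*
 * Mirrors the check in CommandCounterIncrement: returns false where the
 * real code raises ERRCODE_PROGRAM_LIMIT_EXCEEDED, and never leaves the
 * counter at the invalid value.
 */
static bool
demo_command_counter_increment(demo_command_id *cid)
{
    *cid += 1;
    if (*cid == DEMO_INVALID_COMMAND_ID)
    {
        *cid -= 1;
        return false;
    }
    return true;
}
```

The largest CommandId a transaction can reach is therefore 0xFFFFFFFE.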
0007-wal_decoding-Adjust-all-Satisfies-routines-to-take-a.patch
From 01b26c322b3f02beea0bfb42ab783c70e4a9c970 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 07/17] wal_decoding: Adjust all *Satisfies routines to take a
HeapTuple instead of a HeapTupleHeader
For the regular satisfies routines this is needed in preparation for logical
decoding. I changed the non-regular ones for consistency as well.
The naming between htup, tuple and similar is rather confused; I could not find
any consistent naming anywhere.
This is preparatory work for the logical decoding feature, which needs to be
able to get to a valid relfilenode from the tuple when checking its
visibility.
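The point of the signature change is that a bare HeapTupleHeader says nothing about where the tuple lives, while a HeapTuple also carries t_self and t_tableOid. That is why the patch fills in t_tableOid at call sites such as heapgetpage and lazy_scan_heap. The sketch below models that contract with heavily reduced, hypothetical types: callers populate every field, and the check asserts on the location fields before looking at the header. The DEMO_XMIN_INVALID flag value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_oid;

/* Hypothetical, heavily reduced tuple header and flag value. */
#define DEMO_XMIN_INVALID 0x0200
typedef struct
{
    uint16_t    t_infomask;
} demo_tuple_header;

/* Stand-in for HeapTupleData: the header plus where it came from. */
typedef struct
{
    uint32_t    t_len;
    uint32_t    t_block;        /* t_block and t_offset replace t_self */
    uint16_t    t_offset;
    demo_oid    t_tableOid;
    demo_tuple_header *t_data;
} demo_heap_tuple;

/* The pattern added at call sites: fill every field, not just t_data. */
static void
demo_fill_tuple(demo_heap_tuple *tup, demo_tuple_header *hdr, uint32_t len,
                demo_oid relid, uint32_t block, uint16_t offset)
{
    tup->t_data = hdr;
    tup->t_len = len;
    tup->t_tableOid = relid;
    tup->t_block = block;
    tup->t_offset = offset;
}

/* A satisfies-style check that, like the patched routines, asserts that
 * the caller supplied a valid t_self and t_tableOid. */
static bool
demo_satisfies(const demo_heap_tuple *htup)
{
    assert(htup->t_offset != 0);        /* cf. ItemPointerIsValid(&t_self) */
    assert(htup->t_tableOid != 0);      /* cf. t_tableOid != InvalidOid */
    return (htup->t_data->t_infomask & DEMO_XMIN_INVALID) == 0;
}
```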
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/heap/heapam.c | 13 ++++----
src/backend/access/heap/pruneheap.c | 17 +++++++++--
src/backend/catalog/index.c | 2 +-
src/backend/commands/analyze.c | 3 +-
src/backend/commands/cluster.c | 2 +-
src/backend/commands/vacuumlazy.c | 11 ++++---
src/backend/executor/nodeBitmapHeapscan.c | 1 +
src/backend/storage/lmgr/predicate.c | 2 +-
src/backend/utils/time/tqual.c | 50 +++++++++++++++++++++++++------
src/include/utils/snapshot.h | 4 +--
src/include/utils/tqual.h | 20 ++++++-------
12 files changed, 90 insertions(+), 37 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 075d781..8d8e78e 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -131,7 +131,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
- htsu = HeapTupleSatisfiesUpdate(tuple->t_data,
+ htsu = HeapTupleSatisfiesUpdate(tuple,
GetCurrentCommandId(false),
scan->rs_cbuf);
xmax = HeapTupleHeaderGetRawXmax(tuple->t_data);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e88dd30..fdf0ccd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -384,6 +384,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
HeapTupleData loctup;
bool valid;
+ loctup.t_tableOid = RelationGetRelid(scan->rs_rd);
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
loctup.t_len = ItemIdGetLength(lpp);
ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -1698,7 +1699,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
heapTuple->t_len = ItemIdGetLength(lp);
- heapTuple->t_tableOid = relation->rd_id;
+ heapTuple->t_tableOid = RelationGetRelid(relation);
heapTuple->t_self = *tid;
/*
@@ -1746,7 +1747,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
* transactions.
*/
if (all_dead && *all_dead &&
- !HeapTupleIsSurelyDead(heapTuple->t_data, RecentGlobalXmin))
+ !HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
*all_dead = false;
/*
@@ -1876,6 +1877,7 @@ heap_get_latest_tid(Relation relation,
tp.t_self = ctid;
tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
tp.t_len = ItemIdGetLength(lp);
+ tp.t_tableOid = RelationGetRelid(relation);
/*
* After following a t_ctid link, we might arrive at an unrelated
@@ -2574,12 +2576,13 @@ heap_delete(Relation relation, ItemPointer tid,
lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
Assert(ItemIdIsNormal(lp));
+ tp.t_tableOid = RelationGetRelid(relation);
tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
tp.t_len = ItemIdGetLength(lp);
tp.t_self = *tid;
l1:
- result = HeapTupleSatisfiesUpdate(tp.t_data, cid, buffer);
+ result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
if (result == HeapTupleInvisible)
{
@@ -3053,7 +3056,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
l2:
checked_lockers = false;
locker_remains = false;
- result = HeapTupleSatisfiesUpdate(oldtup.t_data, cid, buffer);
+ result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
/* see below about the "no wait" case */
Assert(result != HeapTupleBeingUpdated || wait);
@@ -3924,7 +3927,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
tuple->t_tableOid = RelationGetRelid(relation);
l3:
- result = HeapTupleSatisfiesUpdate(tuple->t_data, cid, *buffer);
+ result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
if (result == HeapTupleInvisible)
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2ab723d..3b68705 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -339,6 +339,9 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
OffsetNumber chainitems[MaxHeapTuplesPerPage];
int nchain = 0,
i;
+ HeapTupleData tup;
+
+ tup.t_tableOid = RelationGetRelid(relation);
rootlp = PageGetItemId(dp, rootoffnum);
@@ -348,6 +351,12 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
if (ItemIdIsNormal(rootlp))
{
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
+
+ tup.t_data = htup;
+ tup.t_len = ItemIdGetLength(rootlp);
+ tup.t_tableOid = RelationGetRelid(relation);
+ ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), rootoffnum);
+
if (HeapTupleHeaderIsHeapOnly(htup))
{
/*
@@ -368,7 +377,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer)
+ if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -431,6 +440,10 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
+ tup.t_data = htup;
+ tup.t_len = ItemIdGetLength(lp);
+ ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
+
/*
* Check the tuple XMIN against prior XMAX, if any
*/
@@ -448,7 +461,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer))
+ switch (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..ba5c84b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2271,7 +2271,7 @@ IndexBuildHeapScan(Relation heapRelation,
*/
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
- switch (HeapTupleSatisfiesVacuum(heapTuple->t_data, OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
scan->rs_cbuf))
{
case HEAPTUPLE_DEAD:
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d6d20fd..9845b0b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1138,10 +1138,11 @@ acquire_sample_rows(Relation onerel, int elevel,
ItemPointerSet(&targtuple.t_self, targblock, targoffset);
+ targtuple.t_tableOid = RelationGetRelid(onerel);
targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple.t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple.t_data,
+ switch (HeapTupleSatisfiesVacuum(&targtuple,
OldestXmin,
targbuffer))
{
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..5064081 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -958,7 +958,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
LockBuffer(buf, BUFFER_LOCK_SHARE);
- switch (HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
{
case HEAPTUPLE_DEAD:
/* Definitely dead */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 078b822..2ea0590 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -151,7 +151,7 @@ static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
ItemPointer itemptr);
static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
-static bool heap_page_is_all_visible(Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid);
@@ -756,10 +756,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(onerel);
tupgone = false;
- switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_DEAD:
@@ -1168,7 +1169,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
* check if the page has become all-visible.
*/
if (!visibilitymap_test(onerel, blkno, vmbuffer) &&
- heap_page_is_all_visible(buffer, &visibility_cutoff_xid))
+ heap_page_is_all_visible(onerel, buffer, &visibility_cutoff_xid))
{
Assert(BufferIsValid(*vmbuffer));
PageSetAllVisible(page);
@@ -1676,7 +1677,7 @@ vac_cmp_itemptr(const void *left, const void *right)
* xmin amongst the visible tuples.
*/
static bool
-heap_page_is_all_visible(Buffer buf, TransactionId *visibility_cutoff_xid)
+heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
{
Page page = BufferGetPage(buf);
OffsetNumber offnum,
@@ -1718,6 +1719,8 @@ heap_page_is_all_visible(Buffer buf, TransactionId *visibility_cutoff_xid)
Assert(ItemIdIsNormal(itemid));
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d2b2721..9534439 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -258,6 +258,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
scan->rs_ctup.t_len = ItemIdGetLength(lp);
+ scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
pgstat_count_heap_fetch(scan->rs_rd);
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index b012df1..d656d62 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3895,7 +3895,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
* tuple is visible to us, while HeapTupleSatisfiesVacuum checks what else
* is going on with it.
*/
- htsvResult = HeapTupleSatisfiesVacuum(tuple->t_data, TransactionXmin, buffer);
+ htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
switch (htsvResult)
{
case HEAPTUPLE_LIVE:
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ab4020a..3254a2d 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -163,8 +163,12 @@ HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
* Xmax is not committed))) that has not been committed
*/
bool
-HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -351,8 +355,12 @@ HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
*
*/
bool
-HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesNow(HeapTuple htup, Snapshot snapshot, Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -526,7 +534,7 @@ HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
* Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
*/
bool
-HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesAny(HeapTuple htup, Snapshot snapshot, Buffer buffer)
{
return true;
}
@@ -546,9 +554,13 @@ HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
* table.
*/
bool
-HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -627,9 +639,13 @@ HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
* distinguish that case must test for it themselves.)
*/
HTSU_Result
-HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
+HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -849,9 +865,13 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
* for snapshot->xmax and the tuple's xmax.
*/
bool
-HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
snapshot->xmin = snapshot->xmax = InvalidTransactionId;
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
@@ -1040,9 +1060,13 @@ HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
* can't see it.)
*/
bool
-HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
+HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
@@ -1233,9 +1257,13 @@ HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
* even if we see that the deleting transaction has committed.
*/
HTSV_Result
-HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
+HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
/*
* Has inserting transaction committed?
*
@@ -1464,8 +1492,12 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
* just whether or not the tuple is surely dead).
*/
bool
-HeapTupleIsSurelyDead(HeapTupleHeader tuple, TransactionId OldestXmin)
+HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
{
+ HeapTupleHeader tuple = htup->t_data;
+ Assert(ItemPointerIsValid(&htup->t_self));
+ Assert(htup->t_tableOid != InvalidOid);
+
/*
* If the inserting transaction is marked invalid, then it aborted, and
* the tuple is definitely dead. If it's marked neither committed nor
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index e747191..ed3f586 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -27,8 +27,8 @@ typedef struct SnapshotData *Snapshot;
* The specific semantics of a snapshot are encoded by the "satisfies"
* function.
*/
-typedef bool (*SnapshotSatisfiesFunc) (HeapTupleHeader tuple,
- Snapshot snapshot, Buffer buffer);
+typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
+ Snapshot snapshot, Buffer buffer);
typedef struct SnapshotData
{
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 465231c..800e366 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -52,7 +52,7 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
* if so, the indicated buffer is marked dirty.
*/
#define HeapTupleSatisfiesVisibility(tuple, snapshot, buffer) \
- ((*(snapshot)->satisfies) ((tuple)->t_data, snapshot, buffer))
+ ((*(snapshot)->satisfies) (tuple, snapshot, buffer))
/* Result codes for HeapTupleSatisfiesVacuum */
typedef enum
@@ -65,25 +65,25 @@ typedef enum
} HTSV_Result;
/* These are the "satisfies" test routines for the various snapshot types */
-extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesMVCC(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesNow(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesNow(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesSelf(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesSelf(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesAny(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesAny(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesToast(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
-extern bool HeapTupleSatisfiesDirty(HeapTupleHeader tuple,
+extern bool HeapTupleSatisfiesDirty(HeapTuple htup,
Snapshot snapshot, Buffer buffer);
/* Special "satisfies" routines with different APIs */
-extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,
+extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple htup,
CommandId curcid, Buffer buffer);
-extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTupleHeader tuple,
+extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup,
TransactionId OldestXmin, Buffer buffer);
-extern bool HeapTupleIsSurelyDead(HeapTupleHeader tuple,
+extern bool HeapTupleIsSurelyDead(HeapTuple htup,
TransactionId OldestXmin);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
1.8.2.rc2.4.g7799588.dirty
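The mechanical change in 0007 gives every HeapTupleSatisfies* routine a full HeapTuple instead of a bare HeapTupleHeader. Each routine can then reach t_self and t_tableOid as well as t_data, and the new Asserts require callers to fill those fields in. Callers such as BitmapHeapNext() and heap_page_is_all_visible() now set t_tableOid, and t_len where it was missing, before the call. The sketch below illustrates that calling convention. All types, the flag value and the function names (TupleSketch, TupleHeaderSketch, SKETCH_XMIN_COMMITTED, sketch_satisfies, sketch_check_item) are hypothetical, heavily simplified stand-ins rather than PostgreSQL's definitions, and the visibility test is reduced to a single hint bit. Note that the heap_page_is_all_visible() hunk above still passes tuple.t_data to HeapTupleSatisfiesVacuum. That appears to be the incompatible-pointer warning reported later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical flag value, not PostgreSQL's real infomask bit. */
#define SKETCH_XMIN_COMMITTED 0x0100

/* Simplified stand-in for the on-page tuple header: only the infomask. */
typedef struct TupleHeaderSketch
{
    uint16_t infomask;
} TupleHeaderSketch;

/* Simplified stand-in for HeapTupleData. */
typedef struct TupleSketch
{
    uint32_t           len;        /* like t_len */
    uint32_t           block;      /* like t_self's block number */
    uint16_t           offset;     /* like t_self's offset; 0 = invalid */
    Oid                table_oid;  /* like t_tableOid */
    TupleHeaderSketch *data;       /* like t_data, points into the page */
} TupleSketch;

/*
 * New-style "satisfies" routine: it receives the whole tuple, so it can
 * check (and, if needed, use) the tuple's location and owning table,
 * not just the on-page header.
 */
static bool
sketch_satisfies(const TupleSketch *htup)
{
    const TupleHeaderSketch *tuple = htup->data;

    /* Mirrors the new Asserts: callers must set t_self and t_tableOid. */
    assert(htup->offset != 0);
    assert(htup->table_oid != InvalidOid);

    return (tuple->infomask & SKETCH_XMIN_COMMITTED) != 0;
}

/* Caller side, loosely mirroring the heap_page_is_all_visible() hunk. */
static bool
sketch_check_item(TupleHeaderSketch *on_page, uint32_t item_len,
                  uint32_t block, uint16_t offset, Oid rel_oid)
{
    TupleSketch tuple;

    tuple.data = on_page;
    tuple.len = item_len;        /* like ItemIdGetLength(itemid) */
    tuple.block = block;
    tuple.offset = offset;
    tuple.table_oid = rel_oid;   /* like RelationGetRelid(rel) */

    /* Pass &tuple, not tuple.data: the callee now wants the whole tuple. */
    return sketch_satisfies(&tuple);
}
```

Passing the wrapper rather than the header costs callers a few extra assignments. In return, every visibility routine can learn which relation a tuple belongs to without any further signature changes.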
0008-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch
From 19bb80af95eee295361dc8882e7032e6c3505898 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 08/17] wal_decoding: Allow walsenders to connect to a
specific database
Currently the decision whether to connect to a database or not is made by
checking whether the passed "dbname" parameter is "replication". Unfortunately
this makes it impossible to connect to a database named replication...
This is useful for future walsender commands which need database interaction.
---
src/backend/postmaster/postmaster.c | 7 ++++--
.../libpqwalreceiver/libpqwalreceiver.c | 4 ++--
src/backend/replication/walsender.c | 27 ++++++++++++++++++----
src/backend/utils/init/postinit.c | 5 ++++
src/bin/pg_basebackup/pg_basebackup.c | 4 ++--
src/bin/pg_basebackup/pg_receivexlog.c | 4 ++--
src/bin/pg_basebackup/receivelog.c | 4 ++--
7 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 87e6062..86f0686 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1955,10 +1955,13 @@ retry1:
if (strlen(port->user_name) >= NAMEDATALEN)
port->user_name[NAMEDATALEN - 1] = '\0';
- /* Walsender is not related to a particular database */
- if (am_walsender)
+ /* Generic Walsender is not related to a particular database */
+ if (am_walsender && strcmp(port->database_name, "replication") == 0)
port->database_name[0] = '\0';
+ if (am_walsender)
+ elog(WARNING, "connecting to %s", port->database_name);
+
/*
* Done putting stuff in TopMemoryContext.
*/
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6bc0aa1..ee0f1fe 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -130,7 +130,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
"the primary server: %s",
PQerrorMessage(streamConn))));
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
int ntuples = PQntuples(res);
int nfields = PQnfields(res);
@@ -138,7 +138,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
PQclear(res);
ereport(ERROR,
(errmsg("invalid response from primary server"),
- errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+ errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
ntuples, nfields)));
}
primary_sysid = PQgetvalue(res, 0, 0);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9f5f766..a421ec5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -46,6 +46,7 @@
#include "access/transam.h"
#include "access/xlog_internal.h"
#include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -242,10 +243,12 @@ IdentifySystem(void)
char tli[11];
char xpos[MAXFNAMELEN];
XLogRecPtr logptr;
+ char* dbname = NULL;
/*
- * Reply with a result set with one row, three columns. First col is
- * system ID, second is timeline ID, and third is current xlog location.
+ * Reply with a result set with one row, four columns. First col is system
+ * ID, second is timeline ID, third is current xlog location and the fourth
+ * contains the database name if we are connected to one.
*/
snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
@@ -264,9 +267,14 @@ IdentifySystem(void)
snprintf(xpos, sizeof(xpos), "%X/%X", (uint32) (logptr >> 32), (uint32) logptr);
+ if (MyDatabaseId != InvalidOid)
+ dbname = get_database_name(MyDatabaseId);
+ else
+ dbname = "(none)";
+
/* Send a RowDescription message */
pq_beginmessage(&buf, 'T');
- pq_sendint(&buf, 3, 2); /* 3 fields */
+ pq_sendint(&buf, 4, 2); /* 4 fields */
/* first field */
pq_sendstring(&buf, "systemid"); /* col name */
@@ -294,17 +302,28 @@ IdentifySystem(void)
pq_sendint(&buf, -1, 2);
pq_sendint(&buf, 0, 4);
pq_sendint(&buf, 0, 2);
+
+ /* fourth field */
+ pq_sendstring(&buf, "dbname");
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
+ pq_sendint(&buf, TEXTOID, 4);
+ pq_sendint(&buf, -1, 2);
+ pq_sendint(&buf, 0, 4);
+ pq_sendint(&buf, 0, 2);
pq_endmessage(&buf);
/* Send a DataRow message */
pq_beginmessage(&buf, 'D');
- pq_sendint(&buf, 3, 2); /* # of columns */
+ pq_sendint(&buf, 4, 2); /* # of columns */
pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
pq_sendint(&buf, strlen(tli), 4); /* col2 len */
pq_sendbytes(&buf, (char *) tli, strlen(tli));
pq_sendint(&buf, strlen(xpos), 4); /* col3 len */
pq_sendbytes(&buf, (char *) xpos, strlen(xpos));
+ pq_sendint(&buf, strlen(dbname), 4); /* col4 len */
+ pq_sendbytes(&buf, (char *) dbname, strlen(dbname));
pq_endmessage(&buf);
}
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e0abff1..ca803cb 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -730,7 +730,12 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
ereport(FATAL,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser or replication role to start walsender")));
+ }
+ if (am_walsender &&
+ (in_dbname == NULL || in_dbname[0] == '\0') &&
+ dboid == InvalidOid)
+ {
/* process any options passed in the startup packet */
if (MyProcPort != NULL)
process_startup_options(MyProcPort, am_superuser);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 56657a4..93ee489 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1361,11 +1361,11 @@ BaseBackup(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 1850787..5fdae7d 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -252,11 +252,11 @@ StreamLog(void)
progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
disconnect_and_exit(1);
}
- if (PQntuples(res) != 1 || PQnfields(res) != 3)
+ if (PQntuples(res) != 1 || PQnfields(res) != 4)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
disconnect_and_exit(1);
}
servertli = atoi(PQgetvalue(res, 0, 1));
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 7ce8112..4a2eb78 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -532,11 +532,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
PQclear(res);
return false;
}
- if (PQnfields(res) != 3 || PQntuples(res) != 1)
+ if (PQnfields(res) != 4 || PQntuples(res) != 1)
{
fprintf(stderr,
_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d fields\n"),
- progname, PQntuples(res), PQnfields(res), 1, 3);
+ progname, PQntuples(res), PQnfields(res), 1, 4);
PQclear(res);
return false;
}
--
1.8.2.rc2.4.g7799588.dirty
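Patch 0008 changes two things. In the postmaster, a walsender whose dbname is the literal "replication" stays a generic, database-less walsender, and any other dbname is kept as a database to connect to. IDENTIFY_SYSTEM now returns a fourth "dbname" column, set to "(none)" when no database is connected, and every in-tree client checks for exactly 1 row and 4 fields. The sketch below restates those rules as plain C. The helper names (walsender_target_database, identify_system_dbname, identify_system_shape_ok) are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the postmaster.c hunk: the magic dbname
 * "replication" means a generic walsender not tied to any database.
 * Any other name is a database to connect to. Returns NULL for "none".
 */
static const char *
walsender_target_database(const char *requested)
{
    assert(requested != NULL);
    if (strcmp(requested, "replication") == 0)
        return NULL;
    return requested;
}

/* Value of the new fourth IDENTIFY_SYSTEM column, "dbname". */
static const char *
identify_system_dbname(const char *connected_db)
{
    return connected_db != NULL ? connected_db : "(none)";
}

/* Client-side shape check, like the updated pg_basebackup/pg_receivexlog ones. */
static bool
identify_system_shape_ok(int ntuples, int nfields)
{
    return ntuples == 1 && nfields == 4;
}
```

Because the clients test for an exact field count, a 3-column reply from an older server is rejected. That is why the diff updates libpqwalreceiver.c, pg_basebackup.c, pg_receivexlog.c and receivelog.c together with walsender.c.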
0009-wal_decoding-Add-alreadyLocked-parameter-to-GetOldes.patch
From 2c9d0b952cce025d4daa70b85b5a6456463f88b0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 11 Jun 2013 23:25:27 +0200
Subject: [PATCH 09/17] wal_decoding: Add alreadyLocked parameter to
GetOldestXminNoLock
This is useful because it allows computing the current OldestXmin while
already holding the procarray lock, which makes it possible to set our own
xmin horizon safely.
---
src/backend/access/transam/xlog.c | 4 ++--
src/backend/catalog/index.c | 3 ++-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/vacuum.c | 4 ++--
src/backend/replication/walreceiver.c | 2 +-
src/backend/storage/ipc/procarray.c | 16 ++++++++--------
src/include/storage/procarray.h | 2 +-
7 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 654c9c1..ac51193 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7165,7 +7165,7 @@ CreateCheckPoint(int flags)
* StartupSUBTRANS hasn't been called yet.
*/
if (!RecoveryInProgress())
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, false, false));
/* Real work is done, but log and update stats before releasing lock. */
LogCheckpointEnd(false);
@@ -7522,7 +7522,7 @@ CreateRestartPoint(int flags)
* this because StartupSUBTRANS hasn't been called yet.
*/
if (EnableHotStandby)
- TruncateSUBTRANS(GetOldestXmin(true, false));
+ TruncateSUBTRANS(GetOldestXmin(true, false, false));
/* Real work is done, but log and update before releasing lock. */
LogCheckpointEnd(true);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ba5c84b..bfad8b1 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2198,7 +2198,8 @@ IndexBuildHeapScan(Relation heapRelation,
{
snapshot = SnapshotAny;
/* okay to ignore lazy VACUUMs here */
- OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true,
+ false);
}
scan = heap_beginscan_strat(heapRelation, /* relation */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9845b0b..7968319 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1081,7 +1081,7 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
+ OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true, false);
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..924a12e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -399,7 +399,7 @@ vacuum_set_xid_limits(int freeze_min_age,
* working on a particular table at any time, and that each vacuum is
* always an independent transaction.
*/
- *oldestXmin = GetOldestXmin(sharedRel, true);
+ *oldestXmin = GetOldestXmin(sharedRel, true, false);
Assert(TransactionIdIsNormal(*oldestXmin));
@@ -720,7 +720,7 @@ vac_update_datfrozenxid(void)
* committed pg_class entries for new tables; see AddNewRelationTuple().
* So we cannot produce a wrong minimum by starting with this.
*/
- newFrozenXid = GetOldestXmin(true, true);
+ newFrozenXid = GetOldestXmin(true, true, false);
/*
* Similarly, initialize the MultiXact "min" with the value that would be
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a30464b..4c74d1b 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1137,7 +1137,7 @@ XLogWalRcvSendHSFeedback(bool immed)
* everything else has been checked.
*/
if (hot_standby_feedback)
- xmin = GetOldestXmin(true, false);
+ xmin = GetOldestXmin(true, false, false);
else
xmin = InvalidTransactionId;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b5f66fb..993efac 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1100,7 +1100,7 @@ TransactionIdIsActive(TransactionId xid)
* GetOldestXmin() move backwards, with no consequences for data integrity.
*/
TransactionId
-GetOldestXmin(bool allDbs, bool ignoreVacuum)
+GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked)
{
ProcArrayStruct *arrayP = procArray;
TransactionId result;
@@ -1109,7 +1109,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
/* Cannot look for individual databases during recovery */
Assert(allDbs || !RecoveryInProgress());
- LWLockAcquire(ProcArrayLock, LW_SHARED);
+ if (!alreadyLocked)
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
* We initialize the MIN() calculation with latestCompletedXid + 1. This
@@ -1164,7 +1165,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
*/
TransactionId kaxmin = KnownAssignedXidsGetOldestXmin();
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
if (TransactionIdIsNormal(kaxmin) &&
TransactionIdPrecedes(kaxmin, result))
@@ -1172,10 +1174,8 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
}
else
{
- /*
- * No other information needed, so release the lock immediately.
- */
- LWLockRelease(ProcArrayLock);
+ if (!alreadyLocked)
+ LWLockRelease(ProcArrayLock);
/*
* Compute the cutoff XID by subtracting vacuum_defer_cleanup_age,
@@ -1249,7 +1249,7 @@ GetMaxSnapshotSubxidCount(void)
* older than this are known not running any more.
* RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
* running transactions, except those running LAZY VACUUM). This is
- * the same computation done by GetOldestXmin(true, true).
+ * the same computation done by GetOldestXmin(true, true, ...).
*
* Note: this function should probably not be called with an argument that's
* not statically allocated (see xip allocation below).
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..fe0bad7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -49,7 +49,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
-extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum);
+extern TransactionId GetOldestXmin(bool allDbs, bool ignoreVacuum, bool alreadyLocked);
extern TransactionId GetOldestActiveTransactionId(void);
extern VirtualTransactionId *GetVirtualXIDsDelayingChkpt(int *nvxids);
--
1.8.2.rc2.4.g7799588.dirty
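Patch 0009 adds an alreadyLocked flag to GetOldestXmin(allDbs, ignoreVacuum, alreadyLocked). When the flag is true, the function skips its own LWLockAcquire/LWLockRelease of ProcArrayLock. Every existing caller in the diff passes false. A caller that already holds the lock can therefore compute the horizon and publish its own xmin within one lock hold, without releasing and re-acquiring in between. The sketch below shows that conditional-locking pattern under stated simplifications:
- A toy, single-threaded lock stands in for ProcArrayLock and asserts on double acquisition.
- Plain `<` replaces the wraparound-aware TransactionIdPrecedes().
- All names prefixed `sketch_`/`toy_` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define SKETCH_NPROCS 4

/* Toy non-recursive lock: acquiring it twice is a bug (assert fires). */
typedef struct ToyLock
{
    bool held;
} ToyLock;

static void toy_acquire(ToyLock *l) { assert(!l->held); l->held = true; }
static void toy_release(ToyLock *l) { assert(l->held); l->held = false; }

static ToyLock sketch_array_lock;
static TransactionId sketch_proc_xmins[SKETCH_NPROCS];  /* 0 = no xmin */
static TransactionId sketch_latest_completed_xid = 999;

/* Hypothetical analogue of GetOldestXmin(..., alreadyLocked). */
static TransactionId
sketch_oldest_xmin(bool already_locked)
{
    /* Like the real code, start the MIN() at latestCompletedXid + 1. */
    TransactionId result = sketch_latest_completed_xid + 1;
    int         i;

    if (!already_locked)
        toy_acquire(&sketch_array_lock);

    for (i = 0; i < SKETCH_NPROCS; i++)
        if (sketch_proc_xmins[i] != 0 && sketch_proc_xmins[i] < result)
            result = sketch_proc_xmins[i];

    if (!already_locked)
        toy_release(&sketch_array_lock);
    return result;
}

/* Compute the horizon and publish it as our own xmin under one lock hold. */
static TransactionId
sketch_set_own_xmin(int slot)
{
    TransactionId xmin;

    toy_acquire(&sketch_array_lock);
    xmin = sketch_oldest_xmin(true);    /* must not try to re-acquire */
    sketch_proc_xmins[slot] = xmin;     /* no window for the horizon to move */
    toy_release(&sketch_array_lock);
    return xmin;
}
```

Without the flag, sketch_set_own_xmin() would have two bad options. It could call the function while holding the lock, which trips the toy lock's assert. Or it could release the lock first, which lets the computed horizon go stale before it is stored.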
Andres Freund <andres@2ndquadrant.com> wrote:
> 0007: Adjust Satisfies* interface: required, mechanical,
> Version v5-01 attached
I'm still working on a review and hope to post something more
substantive by this weekend, but when applying patches in numeric
order, this one did not compile cleanly.
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o allpaths.o allpaths.c -MMD -MP -MF .deps/allpaths.Po
vacuumlazy.c: In function ‘heap_page_is_all_visible’:
vacuumlazy.c:1725:3: warning: passing argument 1 of ‘HeapTupleSatisfiesVacuum’ from incompatible pointer type [enabled by default]
In file included from vacuumlazy.c:61:0:
../../../src/include/utils/tqual.h:84:20: note: expected ‘HeapTuple’ but argument is of type ‘HeapTupleHeader’
Could you post a new version of that?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Kevin!
On 2013-06-20 15:57:07 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > 0007: Adjust Satisfies* interface: required, mechanical,
> > Version v5-01 attached
> I'm still working on a review and hope to post something more
> substantive by this weekend
Cool!
> , but when applying patches in numeric
> order, this one did not compile cleanly.
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o allpaths.o allpaths.c -MMD -MP -MF .deps/allpaths.Po
> vacuumlazy.c: In function ‘heap_page_is_all_visible’:
> vacuumlazy.c:1725:3: warning: passing argument 1 of ‘HeapTupleSatisfiesVacuum’ from incompatible pointer type [enabled by default]
> In file included from vacuumlazy.c:61:0:
> ../../../src/include/utils/tqual.h:84:20: note: expected ‘HeapTuple’ but argument is of type ‘HeapTupleHeader’
> Could you post a new version of that?
Hrmpf. There was one hunk in 0013 instead of 0007.
I made sure that every commit applies and compiles cleanly again. git
rebase -i --exec to the rescue.
Found two other issues:
* recptr not assigned in 0010
* unsafe use of a non-volatile variable across longjmp() in 0013
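The longjmp() issue in 0013 is a general C pitfall. PostgreSQL's PG_TRY()/PG_CATCH() error handling is built on sigsetjmp()/siglongjmp(). The C standard (7.13.2.1) leaves the value of a non-volatile local variable indeterminate after a longjmp() if that variable was modified between the setjmp() and the longjmp(). The actual variable and fix in 0013 are not visible here because that attachment is binary. The generic sketch below, with hypothetical function names, shows why such a local has to be declared volatile.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf sketch_error_env;

/* Stand-in for an error path (think ereport(ERROR)) that jumps back out. */
static void
sketch_raise_error(void)
{
    longjmp(sketch_error_env, 1);
}

static int
sketch_progress_after_error(void)
{
    /*
     * Modified between setjmp() and longjmp(): without volatile, its value
     * after the jump would be indeterminate, e.g. a stale register copy.
     */
    volatile int progress = 0;

    if (setjmp(sketch_error_env) == 0)
    {
        progress = 1;
        progress = 2;
        sketch_raise_error();
        progress = 3;           /* not reached */
    }

    /* Arrived here via longjmp(); progress is reliably 2 only because it is volatile. */
    return progress;
}
```

Dropping the volatile qualifier often still appears to work at -O0. At higher optimization levels the compiler may keep the variable in a register, and the value seen after the jump can then be stale.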
Pushed and attached.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0013-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch.gz
Gk�2��GK������2y.���iF���1��5�������c���|H��y�F���}%!�"dD{�7d�=�-�@R�r^�����!V*�'�����+L������s�c�c��,��]���1�/t�����q��<�QNZ�D�D6������',�,�9���@����T�H�(K�i�E�
Z,]%���l�i�a �hy>!��L�b��D�@�!��`�p���G@����$�}��N����,xi��Ricn�����}.]_��O�#���z�%0������@J��/RU^���2��gIeTF���g��q�?���<�m���2[*�rt�P�Y�mFp���dP��d:��N��k<z: �����+�Ao��j����w��^tK^�����n�����x��<:�����������y42j����_3J9 1����'��t��lC��msO����a#&��;c� c� qb�,�$�2�����&M.\��-iD��;�5��x���������`N���y��#1��.��H~3m��aJ������1H� �$��y
����u�r�=�����R-�����|��\�}��d����t�)'��X�����)JP(�R� I�]W(�c���p.���0��%�Yg�� &/��j�`�~j(��ED�0������������p�Y���0� �\,��H@���0�TI����i�d��5��j
�5M��Wo[�w����
w~A��i�0��*��e�[��e����^h�%�D���������rR<���v�j����������b
�S�X�lC�_�?�!E`�8�8�\�0�{~Hy�{6��H�2�b��F�����0���zhr�!_���O �]_r�����x����%����D��r�;�;f�D�����a�g��>�A`(�Y�ggC8U��l�������u?�i`�j`��)��n�Q�4o��N����7�h"%�3���9��qOC����[�ox�6���k�����|��<�t��7*�����CKp�x�"0�I�E�5��i�#�K�����]#���^�H,������rd1q=%t>� V[��C����b���I�(��VA���,�o�(�q�,8�q��#g1W����Z=�/u��,c,e�x�1�����3�$H����1��Pg���6������3��2f�:�;|�m�b9�h0���C ��M���_��
3���#�������1{O<~�q&�� t����z��%�A��� G����l�Jg�G��Q1up�N8����G��N4I:�Y@��j�f\Z����tI��c���o�������V�K�\g���^�>���b�������L6�]t���f���yG�(9�ln1��+��u��9kE�o}�`�Q�k;*R�s�L7
j����&�]P1i���s����=�6G��#�$$��(!m��
����������4��p�������������+�q�EL���c�v��c�DM'��l�e���7c�L��ic�A��qE��i���f�7�{<����a��r�d��l=����;����|�����f�y������X�"@��B,���d���\������XX�GP����B�����+�������(:���������9�g5��
�d���t�~������O��:�5�L�}f�������g��z��y���������C8
@��i:Ni~�m��k��F�L#��/q�+�]S��5I���e
q=R�o6_t^>}�l�A\��
�����^��9�x���[Io���������W��r���E'�o:2Fg�B��=�����}�$F?�$mX�6����t��I�zT� ���-R������X2kD�Q?�A��LF�w���l��l2��S+��Pg������y��}��E���[&PE�B$%?~JF�I���I:oO8L���f�[�%bk��>����5��u;���@�����;�e����'�� �����R>�P�C������������z���l�,�!�?Z�zP�4�����?A(���lO����;���������'��Vl%��x2��ej� 'S����#P�+����,��c�-�V�"�]�ED����U���s'|�������n��"�8m�e��R�����W�l��n��#,�pL�F(t�I���}C���� �/`c
�w�KMn��/$�TTa�������l��3����r����3g=s�k�
5��������om�����b<�G2��6�������/E���II
!�`;�.{����������h<�N;�w=RR�A��58��H�u$0"4������Q7?)�\� T����$$�H��b�w���;@��9�"tq\�.'�BU�����l ��X���y�$��>n6��=�H�
��p-3Y���{5re~���g�@���2��������P�C���}������e%�%������[��|�yhW�S��?�@:�O�GP.�_DJaV&E���e�����W%�W�z���)�I�����A��������0�v�������9�8;W`��aU���m�����JJ����������g����/��WY=�
VV����d(����x�"��eG���/��*���d<�g����;;S���&n�m��^�ms��6�9@�W�����/��]������FX��7m��/���M��~/�����~�"�.�$X�p�.g�q��!��V������P�3L�l9g��?��]C $�P����������Cq(d �[���M�F���h���/��:�nJ^��;9�D�'�!�&lS��[W:2v!c��m����
S���j�pko�8[�<�����B
���� ���4 �u��s��e��������
���A�~���&>P3-� j��,� �������?�y�y�h�]�=t�M'�9ND
���p�>�����
�����&�;a��� �%BE�)mu����f���l>N�*���J�)X����b�@%m�Lv�����x�z���{
����z��5}�p���9�E�2��q���:���� �NMG�����u�k_�AjF���7�h�=��vo�����O=����8����pB|�3pR|s��o� E��1������|�)��gY�E�Q��oH��U�v���|��?y���^�&_�eq�����hl���H[k��?��?��������)���R��^��l����f��z�$������M�����fv��oL�|�������S��b:��7/)�]h�Ay�w����z���t�����%� ���#�
���|7�7�B;Cz������3����'/�w��!gg<}�������HDLo�#��=���
�$PlCg-��"o#�C|��!W�yh!�v|�#����L%����� Ic�%�k�Gg�W�_����@3X�%|uv�F��<�\��YQ��X2���Uu�1b�+P&
�jO�r������ ; �����E�?����v���1\�:�h���g��m>n��3v��-G{(�
G�4��x8���I��K�N�TDx@e(���������V�S�B��T.���+������$c��E?����p���fQ�Z����`A��� ��J��EtWmh�b�
�\Y��<wKJ�L����~
w\�e���������,������`���/b���9G�� CG��\�}e�����x�a���|K�G����owOn�E`�u&7j���������.G!�I�KR�7B*
�F�8MU�ZE�_dj�5����\��*���&����K�����$&��
;�{��~��M��^
wg#�f�K������?��Y��y��pH{]����`�w�<�`�0\^�I��.��"F�<��?D����BOo��9���(�&��'7X)�����<