logical changeset generation v3

Started by Andres Freund, over 13 years ago, 133 messages, hackers
#1Andres Freund
andres@anarazel.de

Hi,

In response to this you will soon find the 14 patches that currently
implement $subject. I'll go over each one after showing off for a bit:

Start a postgres instance (with pg_hba.conf allowing replication connections):

$ postgres -D ~/tmp/pgdev-lcr \
-c wal_level=logical \
-c max_wal_senders=10 \
-c max_logical_slots=10 \
-c wal_keep_segments=100 \
-c log_line_prefix="[%p %x] "

Start the changelog receiver:
$ pg_receivellog -h /tmp -f /dev/stderr -d postgres -v

Generate changes:
$ psql -h /tmp postgres <<EOF

DROP TABLE IF EXISTS replication_example;

CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

-- plain insert
INSERT INTO replication_example(somedata, text) VALUES (1, 1);

-- plain update
UPDATE replication_example SET somedata = - somedata WHERE id = (SELECT currval('replication_example_id_seq'));

-- plain delete
DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq'));

-- wrapped in a transaction
BEGIN;
INSERT INTO replication_example(somedata, text) VALUES (1, 1);
UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq'));
DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq'));
COMMIT;

-- dont write out aborted data
BEGIN;
INSERT INTO replication_example(somedata, text) VALUES (2, 1);
UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq'));
DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq'));
ROLLBACK;

-- add a column
BEGIN;
INSERT INTO replication_example(somedata, text) VALUES (3, 1);
ALTER TABLE replication_example ADD COLUMN bar int;
INSERT INTO replication_example(somedata, text, bar) VALUES (3, 1, 1);
COMMIT;

-- once more outside
INSERT INTO replication_example(somedata, text, bar) VALUES (4, 1, 1);

-- DDL with table rewrite
BEGIN;
INSERT INTO replication_example(somedata, text) VALUES (5, 1);
ALTER TABLE replication_example RENAME COLUMN text TO somenum;
INSERT INTO replication_example(somedata, somenum) VALUES (5, 2);
ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4);
INSERT INTO replication_example(somedata, somenum) VALUES (5, 3);
COMMIT;

EOF

And the results printed by llog:

BEGIN 16556826
COMMIT 16556826
BEGIN 16556827
table "replication_example_id_seq": INSERT: sequence_name[name]:replication_example_id_seq last_value[int8]:1 start_value[int8]:1 increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0 is_cycled[bool]:f is_called[bool]:f
COMMIT 16556827
BEGIN 16556828
table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1
COMMIT 16556828
BEGIN 16556829
table "replication_example": UPDATE: id[int4]:1 somedata[int4]:-1 text[varchar]:1
COMMIT 16556829
BEGIN 16556830
table "replication_example": DELETE (pkey): id[int4]:1
COMMIT 16556830
BEGIN 16556833
table "replication_example": INSERT: id[int4]:4 somedata[int4]:3 text[varchar]:1
table "replication_example": INSERT: id[int4]:5 somedata[int4]:3 text[varchar]:1 bar[int4]:1
COMMIT 16556833
BEGIN 16556834
table "replication_example": INSERT: id[int4]:6 somedata[int4]:4 text[varchar]:1 bar[int4]:1
COMMIT 16556834
BEGIN 16556835
table "replication_example": INSERT: id[int4]:7 somedata[int4]:5 text[varchar]:1 bar[int4]:(null)
table "replication_example": INSERT: id[int4]:8 somedata[int4]:5 somenum[varchar]:2 bar[int4]:(null)
table "pg_temp_74943": INSERT: id[int4]:4 somedata[int4]:3 somenum[int4]:1 bar[int4]:(null)
table "pg_temp_74943": INSERT: id[int4]:5 somedata[int4]:3 somenum[int4]:1 bar[int4]:1
table "pg_temp_74943": INSERT: id[int4]:6 somedata[int4]:4 somenum[int4]:1 bar[int4]:1
table "pg_temp_74943": INSERT: id[int4]:7 somedata[int4]:5 somenum[int4]:1 bar[int4]:(null)
table "pg_temp_74943": INSERT: id[int4]:8 somedata[int4]:5 somenum[int4]:2 bar[int4]:(null)
table "replication_example": INSERT: id[int4]:9 somedata[int4]:5 somenum[int4]:3 bar[int4]:(null)
COMMIT 16556835

As you can see above, we can decode WAL in the presence of nearly all
forms of DDL. The plugin that output these changes is supposed to be
added to contrib and is fairly small and uncomplicated.
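
The textual format shown above is easy to post-process. Below is a minimal Python sketch that groups the change lines by transaction. It assumes the line layout is exactly what the sample output shows (the plugin's format may well change), it does not handle column values that contain whitespace, and `parse_change` and `group_by_xact` are hypothetical helper names, not part of the patch set.

```python
import re

# Matches e.g.: table "replication_example": DELETE (pkey): id[int4]:1
CHANGE_RE = re.compile(
    r'^table "(?P<table>[^"]+)": (?P<action>[A-Z]+)(?: \(pkey\))?: (?P<cols>.*)$'
)
# Matches one column, e.g.: somedata[int4]:-1
COLUMN_RE = re.compile(r'(\w+)\[(\w+)\]:(\S+)')


def parse_change(line):
    """Split one change line into (table, action, {column: (type, value)})."""
    m = CHANGE_RE.match(line)
    if m is None:
        raise ValueError(f"not a change line: {line!r}")
    cols = {
        name: (typ, None if val == "(null)" else val)
        for name, typ, val in COLUMN_RE.findall(m.group("cols"))
    }
    return m.group("table"), m.group("action"), cols


def group_by_xact(lines):
    """Group change lines by the BEGIN/COMMIT pair that encloses them."""
    xacts, current = {}, None
    for line in lines:
        if line.startswith("BEGIN "):
            current = int(line.split()[1])
            xacts[current] = []
        elif line.startswith("COMMIT "):
            current = None
        elif current is not None:
            xacts[current].append(parse_change(line))
    return xacts


sample = [
    "BEGIN 16556829",
    'table "replication_example": UPDATE: id[int4]:1 somedata[int4]:-1 text[varchar]:1',
    "COMMIT 16556829",
    "BEGIN 16556830",
    'table "replication_example": DELETE (pkey): id[int4]:1',
    "COMMIT 16556830",
]
for xid, changes in group_by_xact(sample).items():
    print(xid, changes)
```

Values are kept as strings on purpose; converting them according to the bracketed type name would be the consumer's job.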

An interesting piece of information might be that, in the very
preliminary benchmarking I have done on this, even the textual decoding
could keep up with a full-tilt pgbench -c16 -j16 -M prepared on my
(somewhat larger) workstation. The WAL space overhead was less than 1%
between two freshly initdb'ed clusters, comparing
wal_level=hot_standby with wal_level=logical.
With a custom pgbench script I can saturate the decoding to the effect
that it lags a second or so, but once I write out the data in a binary
format it can keep up again.
The biggest overhead is currently the more slowly increasing
Global/RecentXmin, but that can be greatly improved by logging
xl_running_xacts records more often than just at every checkpoint.

A short overview of the patches in this series:

* Add minimal binary heap implementation
Abhijit submitted a nicer version of this; the plan is to rebase on top
of that once people are happy with the interface.
(unchanged)

* Add support for a generic wal reading facility dubbed XLogReader
There's some discussion about what's the best way to implement this in a
separate CF topic.
(unchanged)

* Add simple xlogdump tool
Very nice for debugging, couldn't have developed this without it. Obviously
not a prerequisite for committing this feature but still pretty worthy.
(quite a bit updated, still bad build infrastructure)

* Add a new RELFILENODE syscache to fetch a pg_class entry via
(reltablespace, relfilenode)
Relatively simple, somewhat contentious due to some uniqueness
issues. Would very much welcome input from somebody with syscache
experience on this. It was previously suggested to write something like
attoptcache.c for this, but to me that seems to be code-duplication. We
can go that route though.
(unchanged)

* Add a new relmapper.c function RelationMapFilenodeToOid that acts as a
reverse of RelationMapOidToFilenode
Simple. I don't even think it's contentious... Just wasn't needed before.
(unchanged)

* Add a new function pg_relation_by_filenode to look up a relation
given the tablespace and the filenode OIDs
Just a nice to have thing for debugging, not a prerequisite for the
feature.
(unchanged)

* Introduce InvalidCommandId and declare that to be the new maximum for
CommandCounterIncrement
Uncomplicated and I hope uncontentious.
(new)

* Store the number of subtransactions in xl_running_xacts separately from
toplevel xids
Increases the size of xl_running_xacts by 4 bytes in the worst case,
decreases it in some others. Improves the efficiency of some HS
operations.
Should be ok?
(new)

* Adjust all *Satisfies routines to take a HeapTuple instead of a
HeapTupleHeader
Not sure if people will complain about this? It's rather simple due to
the fact that the HeapTupleSatisfiesVisibility wrapper already took a
HeapTuple as parameter.
(new)

* Allow walsender's to connect to a specific database
This has been requested by others. I think we need to work on the
external interface a bit, should be ok otherwise.
(new)

* Introduce wal decoding via catalog timetravel
This is the meat of the feature. I think this is going in a good
direction, still needs some work, but architectural review can really
start now. (more later)
(heavily changed)

* Add a simple decoding module in contrib named 'test_decoding'
The much requested example contrib module.
(new)

* Introduce pg_receivellog, the pg_receivexlog equivalent for logical
changes
Debugging tool to receive changes and write them to a file. Needs some
more options and probably shouldn't live inside pg_basebackup's
directory.
(new)

* design document v2.3 and snapshot building design doc v0.2
(unchanged)

There remains quite a bit to be done but I think the state of the patch
has improved quite a bit. The biggest thing now is to get input about
the user-facing parts so we can get some agreement there.

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning
transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after
INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the
outside.
* more frequent xl_running_xid's so xmin can be upped more frequently

Please comment!

Happy and tired,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 01/14] Add minimal binary heap implementation

Will be replaced by the "binaryheap.[ch]" from Abhijit once it's been reviewed.
---
src/backend/lib/Makefile | 3 +-
src/backend/lib/simpleheap.c | 255 +++++++++++++++++++++++++++++++++++++++++++
src/include/lib/simpleheap.h | 91 +++++++++++++++
3 files changed, 348 insertions(+), 1 deletion(-)
create mode 100644 src/backend/lib/simpleheap.c
create mode 100644 src/include/lib/simpleheap.h

Attachments:

0001-Add-minimal-binary-heap-implementation.patch (text/x-patch, +348 -1)
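
For readers unfamiliar with the data structure, an array-backed binary min-heap looks roughly like the following. This is a Python sketch of the general technique only, not a transcription of simpleheap.c; the class and method names are made up for illustration.

```python
class MinHeap:
    """Array-backed binary min-heap; children of slot i live at 2i+1 and 2i+2."""

    def __init__(self):
        self.items = []

    def push(self, value):
        # Append at the end, then sift up until the parent is not larger.
        self.items.append(value)
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.items[parent] <= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def pop(self):
        # Remove the root, move the last element there, then sift down.
        if not self.items:
            raise IndexError("pop from empty heap")
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i, n = 0, len(self.items)
            while True:
                smallest = i
                for child in (2 * i + 1, 2 * i + 2):
                    if child < n and self.items[child] < self.items[smallest]:
                        smallest = child
                if smallest == i:
                    break
                self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
                i = smallest
        return top


heap = MinHeap()
for v in (5, 1, 4, 2, 3):
    heap.push(v)
print([heap.pop() for _ in range(5)])
```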
#3Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants
to do so is not tightly integrated into xlog.c is rather hard and would require
changes to rather integral parts of the recovery code which doesn't seem to be
a good idea.

Missing:
- "compressing" the stream when removing uninteresting records
- writing out correct CRCs
- separating reader/writer
---
src/backend/access/transam/Makefile | 2 +-
src/backend/access/transam/xlogreader.c | 1032 +++++++++++++++++++++++++++++++
src/include/access/xlogreader.h | 264 ++++++++
3 files changed, 1297 insertions(+), 1 deletion(-)
create mode 100644 src/backend/access/transam/xlogreader.c
create mode 100644 src/include/access/xlogreader.h

Attachments:

0002-Add-support-for-a-generic-wal-reading-facility-dubbe.patch (text/x-patch, +1297 -1)
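
"Reassembly of records" and "filtering" can be illustrated with a toy reader. The framing used here (a 4-byte big-endian length followed by the payload) is an assumption made purely for the sketch and is not the WAL record format; `ToyReader` and `frame` are hypothetical names and do not mirror the XLogReader interface.

```python
import struct


class ToyReader:
    """Reassembles length-prefixed records from arbitrarily split chunks."""

    def __init__(self, keep=lambda payload: True):
        self.buf = b""
        self.keep = keep  # filter callback: return False to drop a record

    def feed(self, chunk):
        """Add a chunk of the stream and return all records completed by it."""
        self.buf += chunk
        out = []
        while len(self.buf) >= 4:
            (length,) = struct.unpack(">I", self.buf[:4])
            if len(self.buf) < 4 + length:
                break  # record continues in a later chunk
            payload = self.buf[4:4 + length]
            self.buf = self.buf[4 + length:]
            if self.keep(payload):
                out.append(payload)
        return out


def frame(payload):
    """Prefix a payload with its length (toy framing, see above)."""
    return struct.pack(">I", len(payload)) + payload


stream = frame(b"heap insert") + frame(b"noop") + frame(b"heap delete")
reader = ToyReader(keep=lambda p: p != b"noop")
records = []
# Deliver the stream in 7-byte pieces so records straddle chunk boundaries.
for i in range(0, len(stream), 7):
    records.extend(reader.feed(stream[i:i + 7]))
print(records)
```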
#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 03/14] Add simple xlogdump tool

---
src/bin/Makefile | 2 +-
src/bin/xlogdump/Makefile | 25 +++
src/bin/xlogdump/xlogdump.c | 468 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 494 insertions(+), 1 deletion(-)
create mode 100644 src/bin/xlogdump/Makefile
create mode 100644 src/bin/xlogdump/xlogdump.c

Attachments:

0003-Add-simple-xlogdump-tool.patch (text/x-patch, +494 -1)
#5Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 04/14] Add a new RELFILENODE syscache to fetch a pg_class entry via (reltablespace, relfilenode)

This cache is somewhat problematic because formally indexes used by syscaches
need to be unique, and this one is not. This is "just" because of the 0/InvalidOid
values stored in pg_class.relfilenode for nailed/shared catalog relations. The
syscache will never be queried for InvalidOid relfilenodes, however, so it seems
to be safe even if it violates the rules somewhat.

It might be nicer to add infrastructure to do this properly, like using a
partial index; it's not clear what the best way to do this is, though.

Needs a CATVERSION bump.
---
src/backend/utils/cache/syscache.c | 11 +++++++++++
src/include/catalog/indexing.h | 2 ++
src/include/catalog/pg_proc.h | 1 +
src/include/utils/syscache.h | 1 +
4 files changed, 15 insertions(+)

Attachments:

0004-Add-a-new-RELFILENODE-syscache-to-fetch-a-pg_class-e.patch (text/x-patch, +15 -0)
#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 05/14] Add a new relmapper.c function RelationMapFilenodeToOid that acts as a reverse of RelationMapOidToFilenode

---
src/backend/utils/cache/relmapper.c | 53 +++++++++++++++++++++++++++++++++++++
src/include/catalog/indexing.h | 4 +--
src/include/utils/relmapper.h | 2 ++
3 files changed, 57 insertions(+), 2 deletions(-)

Attachments:

0005-Add-a-new-relmapper.c-function-RelationMapFilenodeTo.patch (text/x-patch, +57 -2)
#7Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 06/14] Add a new function pg_relation_by_filenode to look up a relation given the tablespace and the filenode OIDs

This requires the RELFILENODE syscache and the RelationMapFilenodeToOid
function added in previous commits.
---
doc/src/sgml/func.sgml | 23 +++++++++++-
src/backend/utils/adt/dbsize.c | 79 ++++++++++++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.h | 2 ++
src/include/utils/builtins.h | 1 +
4 files changed, 104 insertions(+), 1 deletion(-)

Attachments:

0006-Add-a-new-function-pg_relation_by_filenode-to-lookup.patch (text/x-patch, +104 -1)
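
How the two pieces might fit together can be sketched as below. This is a simplified Python model, not the patch's code: the sample filenodes are invented, the lookup order (cache first, relation map second) is an assumption, the reverse map ignores the shared/non-shared distinction, and all function names are hypothetical.

```python
# Sample data for illustration only; the filenode values are invented.
# pg_class rows: oid -> (reltablespace, relfilenode). Mapped catalogs store 0.
PG_CLASS = {
    16384: (0, 16384),   # ordinary table
    16390: (0, 24576),   # ordinary table whose filenode differs from its OID
    1259: (0, 0),        # mapped catalog: filenode lives in the relation map
    1262: (1664, 0),     # mapped shared catalog
}
# Relation map: oid -> filenode, for the relations with relfilenode = 0.
RELMAP = {1259: 12345, 1262: 12350}


def build_filenode_cache(pg_class):
    """Index pg_class by (reltablespace, relfilenode), skipping filenode 0."""
    cache = {}
    for oid, (tablespace, filenode) in pg_class.items():
        if filenode != 0:  # 0 entries are not unique, so never index them
            cache[(tablespace, filenode)] = oid
    return cache


def map_filenode_to_oid(relmap, filenode):
    """Reverse of the oid -> filenode mapping; None if there is no match."""
    for oid, mapped in relmap.items():
        if mapped == filenode:
            return oid
    return None


def relation_by_filenode(cache, relmap, tablespace, filenode):
    """Resolve (tablespace, filenode) to a relation OID, or None."""
    if filenode == 0:
        return None  # the cache must never be queried for an invalid filenode
    oid = cache.get((tablespace, filenode))
    if oid is None:
        oid = map_filenode_to_oid(relmap, filenode)
    return oid


cache = build_filenode_cache(PG_CLASS)
print(relation_by_filenode(cache, RELMAP, 0, 24576))
print(relation_by_filenode(cache, RELMAP, 0, 12345))
```

Skipping the zero entries when building the index is the sketch's stand-in for the "never queried for InvalidOid relfilenodes" argument made for the syscache.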
#8Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 07/14] Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement

This is useful to be able to represent a CommandId that's invalid. There was no
such value before.

This decreases the possible number of commands per transaction by one, which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction (except for timetravel
access, but that's new anyway).
---
src/backend/access/transam/xact.c | 4 ++--
src/include/c.h | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)

Attachments:

0007-Introduce-InvalidCommandId-and-declare-that-to-be-th.patch (text/x-patch, +3 -2)
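
The idea of reserving one counter value as "invalid" can be sketched in Python as follows. The sketch assumes CommandId is an unsigned 32-bit counter and that the reserved value is the all-ones bit pattern; the patch description above does not spell out the value, and the function name is only modelled on CommandCounterIncrement.

```python
# Assumption: the largest 32-bit value is the one reserved as "invalid".
INVALID_COMMAND_ID = 0xFFFFFFFF
FIRST_COMMAND_ID = 0


def command_counter_increment(current):
    """Return the next CommandId, refusing to hand out the invalid value."""
    nxt = (current + 1) & 0xFFFFFFFF
    if nxt == INVALID_COMMAND_ID:
        raise OverflowError("too many commands in one transaction")
    return nxt


cid = command_counter_increment(FIRST_COMMAND_ID)
print(cid)
```

Because the reserved value is never handed out, any field holding it can be read as "no command id" without a separate flag.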
#9Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 08/14] Store the number of subtransactions in xl_running_xacts separately from toplevel xids

To avoid complicating the logic we store both the toplevel xids and the subxids in
->xip: first the ->xcnt toplevel ones, then the ->subxcnt subxids.
Also skip logging any subxids if the snapshot is suboverflowed; they aren't
useful in that case anyway.

This makes some operations cheaper and allows faster startup for
the future logical decoding feature, because that doesn't care about
subtransactions/suboverflow'edness.
---
src/backend/access/transam/xlog.c | 2 ++
src/backend/storage/ipc/procarray.c | 65 ++++++++++++++++++++++++-------------
src/backend/storage/ipc/standby.c | 8 +++--
src/include/storage/standby.h | 2 ++
4 files changed, 52 insertions(+), 25 deletions(-)

Attachments:

0008-Store-the-number-of-subtransactions-in-xl_running_xa.patch (text/x-patch, +52 -25)
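
The array layout described in this message can be modelled in a few lines of Python. Only xip, xcnt and subxcnt are taken from the description above; the class, the overflow flag's name and the helper functions are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningXacts:
    xcnt: int              # number of toplevel xids at the start of xip
    subxcnt: int           # number of subxids stored after them
    subxid_overflow: bool  # True if the snapshot was suboverflowed
    xip: List[int]         # toplevel xids first, then subxids


def pack_running_xacts(toplevel, subxids, overflowed):
    """Build the combined array; drop subxids when the snapshot overflowed."""
    subs = [] if overflowed else list(subxids)
    return RunningXacts(len(toplevel), len(subs), overflowed, list(toplevel) + subs)


def toplevel_xids(rec):
    """The first xcnt entries are the toplevel xids."""
    return rec.xip[:rec.xcnt]


def sub_xids(rec):
    """The next subxcnt entries are the subxids."""
    return rec.xip[rec.xcnt:rec.xcnt + rec.subxcnt]


rec = pack_running_xacts([100, 103], [101, 102], overflowed=False)
print(toplevel_xids(rec), sub_xids(rec))
```

A consumer that only cares about toplevel transactions, as the message says logical decoding does, can stop reading after the first xcnt entries.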
#10Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 09/14] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

For the regular satisfies routines this is needed in preparation for logical
decoding. I changed the non-regular ones for consistency as well.

The naming between htup, tuple and similar is rather confused, I could not find
any consistent naming anywhere.

This is preparatory work for the logical decoding feature, which needs to be
able to get to a valid relfilenode when checking the visibility of a
tuple.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/heap/heapam.c | 13 ++++++----
src/backend/access/heap/pruneheap.c | 16 ++++++++++--
src/backend/catalog/index.c | 2 +-
src/backend/commands/analyze.c | 3 ++-
src/backend/commands/cluster.c | 2 +-
src/backend/commands/vacuumlazy.c | 3 ++-
src/backend/storage/lmgr/predicate.c | 2 +-
src/backend/utils/time/tqual.c | 50 +++++++++++++++++++++++++++++-------
src/include/utils/snapshot.h | 4 +--
src/include/utils/tqual.h | 20 +++++++--------
11 files changed, 83 insertions(+), 34 deletions(-)

Attachments:

0009-Adjust-all-Satisfies-routines-to-take-a-HeapTuple-in.patch (text/x-patch, +83 -34)
#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 10/14] Allow walsender's to connect to a specific database

Currently the decision whether to connect to a database or not is made by
checking whether the passed "dbname" parameter is "replication". Unfortunately
this makes it impossible to connect to a database named replication...

This is useful for future walsender commands which need database interaction.
---
src/backend/postmaster/postmaster.c | 7 ++++--
.../libpqwalreceiver/libpqwalreceiver.c | 4 ++--
src/backend/replication/walsender.c | 27 ++++++++++++++++++----
src/backend/utils/init/postinit.c | 5 ++++
src/bin/pg_basebackup/pg_basebackup.c | 4 ++--
src/bin/pg_basebackup/pg_receivexlog.c | 4 ++--
src/bin/pg_basebackup/receivelog.c | 4 ++--
7 files changed, 41 insertions(+), 14 deletions(-)

Attachments:

0010-Allow-walsender-s-to-connect-to-a-specific-database.patch (text/x-patch, +41 -14)
#12Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 11/14] Introduce wal decoding via catalog timetravel

This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
  * init
  * begin
  * change
  * commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
  * INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
    * initial LSN
    * replication slot identifier
    * id of a pg_export() style snapshot
  * START_LOGICAL_REPLICATION $id $lsn; streams out changes
  * uses named output plugins for output specification

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning
transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after
INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the
outside.
* more frequent xl_running_xid's so xmin can be upped more frequently
* add STOP_LOGICAL_REPLICATION $id
---
src/backend/access/heap/heapam.c | 280 +++++-
src/backend/access/transam/xlog.c | 1 +
src/backend/catalog/index.c | 74 ++
src/backend/replication/Makefile | 2 +
src/backend/replication/logical/Makefile | 19 +
src/backend/replication/logical/decode.c | 496 ++++++++++
src/backend/replication/logical/logicalfuncs.c | 247 +++++
src/backend/replication/logical/reorderbuffer.c | 1156 +++++++++++++++++++++++
src/backend/replication/logical/snapbuild.c | 1144 ++++++++++++++++++++++
src/backend/replication/repl_gram.y | 32 +-
src/backend/replication/repl_scanner.l | 2 +
src/backend/replication/walsender.c | 566 ++++++++++-
src/backend/storage/ipc/procarray.c | 23 +
src/backend/storage/ipc/standby.c | 8 +-
src/backend/utils/cache/inval.c | 2 +-
src/backend/utils/cache/relcache.c | 3 +-
src/backend/utils/misc/guc.c | 11 +
src/backend/utils/time/tqual.c | 249 +++++
src/bin/pg_controldata/pg_controldata.c | 2 +
src/include/access/heapam_xlog.h | 23 +
src/include/access/transam.h | 5 +
src/include/access/xlog.h | 3 +-
src/include/catalog/index.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/replnodes.h | 22 +
src/include/replication/decode.h | 21 +
src/include/replication/logicalfuncs.h | 44 +
src/include/replication/output_plugin.h | 76 ++
src/include/replication/reorderbuffer.h | 284 ++++++
src/include/replication/snapbuild.h | 128 +++
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 34 +-
src/include/storage/itemptr.h | 3 +
src/include/storage/sinval.h | 2 +
src/include/utils/tqual.h | 31 +-
35 files changed, 4966 insertions(+), 34 deletions(-)
create mode 100644 src/backend/replication/logical/Makefile
create mode 100644 src/backend/replication/logical/decode.c
create mode 100644 src/backend/replication/logical/logicalfuncs.c
create mode 100644 src/backend/replication/logical/reorderbuffer.c
create mode 100644 src/backend/replication/logical/snapbuild.c
create mode 100644 src/include/replication/decode.h
create mode 100644 src/include/replication/logicalfuncs.h
create mode 100644 src/include/replication/output_plugin.h
create mode 100644 src/include/replication/reorderbuffer.h
create mode 100644 src/include/replication/snapbuild.h

Attachments:

0011-Introduce-wal-decoding-via-catalog-timetravel.patch (text/x-patch, +4966 -34)
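
The reassembly idea behind the 'reorderbuffer' and the begin/change/commit callbacks can be sketched in Python as below. This is a toy model under stated assumptions: it keeps everything in memory, ignores subtransactions, snapshots and spilling to disk, and the class, method names and record tuples are all hypothetical rather than taken from reorderbuffer.c.

```python
from collections import defaultdict


class ToyReorderBuffer:
    """Collects changes per transaction and replays them at commit time."""

    def __init__(self, begin, change, commit):
        self.begin, self.change, self.commit = begin, change, commit
        self.pending = defaultdict(list)  # xid -> changes seen so far

    def add_change(self, xid, data):
        self.pending[xid].append(data)

    def on_commit(self, xid):
        # Replay the whole transaction through the output callbacks.
        changes = self.pending.pop(xid, [])
        self.begin(xid)
        for data in changes:
            self.change(xid, data)
        self.commit(xid)

    def on_abort(self, xid):
        # Aborted transactions are dropped without calling any callback.
        self.pending.pop(xid, None)


out = []
rb = ToyReorderBuffer(
    begin=lambda xid: out.append(f"BEGIN {xid}"),
    change=lambda xid, data: out.append(data),
    commit=lambda xid: out.append(f"COMMIT {xid}"),
)
# Interleaved stream of (record type, xid, payload) tuples.
wal = [
    ("change", 10, "INSERT a"),
    ("change", 11, "INSERT b"),
    ("change", 10, "UPDATE a"),
    ("abort", 11, None),
    ("change", 12, "INSERT c"),
    ("commit", 12, None),
    ("commit", 10, None),
]
for kind, xid, payload in wal:
    if kind == "change":
        rb.add_change(xid, payload)
    elif kind == "commit":
        rb.on_commit(xid)
    else:
        rb.on_abort(xid)
print("\n".join(out))
```

Transactions come out in commit order, each as one contiguous BEGIN ... COMMIT block, and the aborted transaction never reaches the callbacks.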
#13Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 12/14] Add a simple decoding module in contrib named 'test_decoding'

---
contrib/Makefile | 1 +
contrib/test_decoding/Makefile | 16 +++
contrib/test_decoding/test_decoding.c | 192 ++++++++++++++++++++++++++++++++++
3 files changed, 209 insertions(+)
create mode 100644 contrib/test_decoding/Makefile
create mode 100644 contrib/test_decoding/test_decoding.c

Attachments:

0012-Add-a-simple-decoding-module-in-contrib-named-test_d.patch (text/x-patch, +209 -0)
#14Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 13/14] Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes

---
src/bin/pg_basebackup/Makefile | 7 +-
src/bin/pg_basebackup/pg_receivellog.c | 717 +++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/streamutil.c | 3 +-
src/bin/pg_basebackup/streamutil.h | 1 +
4 files changed, 725 insertions(+), 3 deletions(-)
create mode 100644 src/bin/pg_basebackup/pg_receivellog.c

Attachments:

0013-Introduce-pg_receivellog-the-pg_receivexlog-equivale.patch (text/x-patch, +725 -3)
#15Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
[PATCH 14/14] design document v2.3 and snapshot building design doc v0.2

---
src/backend/replication/logical/DESIGN.txt | 603 +++++++++++++++++++++
src/backend/replication/logical/Makefile | 6 +
.../replication/logical/README.SNAPBUILD.txt | 298 ++++++++++
3 files changed, 907 insertions(+)
create mode 100644 src/backend/replication/logical/DESIGN.txt
create mode 100644 src/backend/replication/logical/README.SNAPBUILD.txt

Attachments:

0014-design-document-v2.3-and-snapshot-building-design-do.patch (text/x-patch, +907 -0)
#16Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: logical changeset generation v3 - git repository

On 2012-11-15 01:27:46 +0100, Andres Freund wrote:

In response to this you will soon find the 14 patches that currently
implement $subject.

As it's rather unwieldy to send around that many/big patches all the
time, until the next "major" version I will just update the git tree at:

Web:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf3

Git:
git clone -b xlog-decoding-rebasing-cf3 git://git.postgresql.org/git/users/andresfreund/postgres.git

Greetings,

Andres


#17Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: lcr - walsender integration

Hi,

The current logical walsender integration looks like the following:

=# INIT_LOGICAL_REPLICATION 'text';
WARNING: Initiating logical rep
WARNING: reached consistent point, stopping!
replication_id | consistent_point | snapshot_name | plugin
----------------+------------------+---------------+--------
id-2 | 3/CACBDF98 | 0xDEADBEEF | text
(1 row)

=# START_LOGICAL_REPLICATION 'id-2' 3/CACBDF98;
...

So the current protocol is:
INIT_LOGICAL_REPLICATION '$plugin';
returns
* slot
* first consistent point
* snapshot id

START_LOGICAL_REPLICATION '$slot' $last_received_lsn;

streams changes, each wrapped in a 'w' message with (start, end) set to
the same value. The content of the data is completely free-format and
only depends on the output plugin.

Feedback is provided from the client via the normal 'r' messages.
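
The LSNs in the commands above use the usual "high/low" hexadecimal notation, i.e. the upper and lower 32 bits of a 64-bit WAL position. A client that tracks its last received position might convert back and forth roughly like this; the helper names are hypothetical and the command syntax is simply the one shown in this message, which is still under discussion.

```python
def parse_lsn(text):
    """Convert 'hi/lo' hexadecimal notation into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Inverse of parse_lsn."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def start_command(slot, last_received_lsn):
    """Build the START_LOGICAL_REPLICATION command text shown above."""
    return f"START_LOGICAL_REPLICATION '{slot}' {format_lsn(last_received_lsn)};"


lsn = parse_lsn("3/CACBDF98")
print(hex(lsn))
print(start_command("id-2", lsn))
```

Keeping the position as an integer makes it straightforward to compare positions and to remember the highest one received so far.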

I think that's not a bad start, but we can probably improve it a bit:

INIT_LOGICAL_REPLICATION '$slot' '$plugin' ($value = $key, ...);
START_LOGICAL_REPLICATION '$slot' $last_received_lsn;
STOP_LOGICAL_REPLICATION '$slot';

The options to INIT_LOGICAL_REPLICATION would then get passed to the
'pg_decode_init' output plugin function (i.e. a function of that name
would get dlsym()'ed using the pg infrastructure for that).

Does that look good to you? Any suggestions?

Greetings,

Andres


#18Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#1)
Re: logical changeset generation v3

On 11/14/12 4:27 PM, Andres Freund wrote:

Hi,

In response to this you will soon find the 14 patches that currently
implement $subject. I'll go over each one after showing off for a bit:

Lemme be the first to say, "wow". Impressive work.

Now the debugging starts ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#19Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#1)
Re: logical changeset generation v3

Looks like cool stuff @-@
I might be interested in looking at that a bit, as I think I will hopefully
be able to grab some time in the next couple of weeks.
Are some of those patches already submitted to a CF?
--
Michael Paquier
http://michael.otacoo.com

#20Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#19)
Re: logical changeset generation v3

Hi,

On Thursday, November 15, 2012 05:08:26 AM Michael Paquier wrote:

Looks like cool stuff @-@
I might be interested in looking at that a bit, as I think I will hopefully
be able to grab some time in the next couple of weeks.
Are some of those patches already submitted to a CF?

I added the patchset as one entry to the CF this time, it seems to me they are
too hard to judge individually to make them really separately reviewable.

I can split it off there, but really all the complicated stuff is in one patch
anyway...

Greetings,

Andres
