Logical Replication WIP
Hi,
as promised, here is a WIP version of the logical replication patch.
This is by no means anywhere close to being committable, but it should be
enough for discussion of the approaches chosen. I plan to give this
some more time before the September CF as well as during the CF itself.
You've seen a preview of some of these ideas in the doc Simon posted [1],
though not all of them are implemented in this patch yet.
I'll start with an overview of the state of things.
What works:
- Replication of INSERT/UPDATE/DELETE operations on tables in
publication.
- Initial copy of data in publication.
- Automatic management of things like slots and origin tracking.
- Some psql support (\drp, \drs, and additional info in \d for
tables); it's mainly missing ACLs, as those are not implemented
yet (see below), and tab completion.
What's missing:
- Sequences. I'd like to have them in 10.0 but I don't have a good
way to implement it. PGLogical uses periodic syncing with some
buffer value, but that's suboptimal. I would like to decode them
instead, but that has proven to be complicated due to their
sometimes transactional, sometimes nontransactional nature, so I
probably won't have time to do it for 10.0 by myself.
- ACLs. I still expect to have them the way they are documented in the
logical replication docs, but currently the code just assumes a
superuser/REPLICATION role. This can probably be discussed further in
the design thread [1].
- pg_dump. Same as above; I want publications and membership in them
dumped unconditionally, and potentially also dump subscription
definitions if the user asks for it via a command-line option. I
don't think subscriptions should be dumped by default, as
automatically starting replication when somebody dumps and restores
the db goes against POLA.
- DDL. I see several approaches we could take here for 10.0: a) don't
deal with DDL at all yet, b) provide a function which pushes the DDL
into the replication queue and then executes it on the downstream
(like londiste, slony, and pglogical do), c) capture the DDL query as
text and allow a user-defined function to be called with that DDL
text on the subscriber (that's what Oracle did with CDC).
- FDW support on the downstream; currently only INSERTs should work
there, but that should be easy to fix.
- Monitoring. I'd like to add a pg_stat_subscription view on the
downstream (the rest of the monitoring is very similar to physical
streaming, so that mostly needs docs).
- TRUNCATE. This is handled using triggers in BDR and pglogical, but
I am not convinced that's the right way to do it in core, as it
brings limitations (e.g. the inability to use RESTART IDENTITY).
The parts I am not overly happy with:
- The fact that the subscription handles slot creation/drop means we do
some automagic that might fail, and the user might need to fix that up
manually. I am not saying this is necessarily a problem, as that's how
most publish/subscribe replication systems work, but I wonder
if there is a better way of doing this that I missed.
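As an illustration of the manual fixup that might be needed: if the automatic slot drop fails, the leftover slot on the publisher would presumably have to be removed by hand using the existing replication-slot functions (the slot name here is hypothetical):

```sql
-- On the publisher: list logical slots to find the orphan.
SELECT slot_name, plugin, active FROM pg_replication_slots;

-- Drop the leftover slot manually (hypothetical slot name).
SELECT pg_drop_replication_slot('mysub_slot');
```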
- The initial copy patch adds some interfaces for getting the table list
and data into the DecodingContext, and I wonder if that's a good place
for them or if we should instead create some TableSync API that
would load the plugin as well, have these two new interfaces, and live
in the tablesync module. One reason why I didn't do it is that
the interface would be almost the same, and the plugin would then
have to do separate init for the DecodingContext and TableSync.
- The initial copy uses the snapshot from slot creation in the
walsender. I currently just push it as the active snapshot inside the
snapbuilder, which is probably not the right thing to do (tm). That
is mostly because I don't really know what the right thing is there.
About the individual patches:
0001-Add-PUBLICATION-catalogs-and-DDL.patch: This patch defines a
Publication, which is basically the same thing as a replication set. It
adds a database-local catalog pg_publication, which stores the
publications and DML filters, and a pg_publication_rel catalog for
storing the membership of relations in publications. It adds the DDL,
dependency handling, and all the necessary boilerplate around that,
including some basic regression tests for the DDL.
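Using the syntax documented in the attached SGML, a minimal round trip with the new DDL looks like:

```sql
-- Create a publication that replicates all DML by default.
CREATE PUBLICATION mypublication;

-- Restrict it to inserts only and add some tables to it.
ALTER PUBLICATION mypublication WITH NOREPLICATE_UPDATE NOREPLICATE_DELETE;
ALTER PUBLICATION mypublication ADD TABLE users, departments;

-- Clean up.
DROP PUBLICATION mypublication;
```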
0002-Add-SUBSCRIPTION-catalog-and-DDL.patch: Adds Subscriptions with a
shared, nailed (!) catalog pg_subscription, which stores the individual
subscriptions for each database. The reason this is nailed is that
it needs to be accessible without a connection to a database, so that
the logical replication launcher can read it and start/stop workers as
necessary. This does not include regression tests, as I am unsure how to
test this within the regression testing framework given that it is
supposed to start workers (those are added in later patches).
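The subscription DDL grammar itself is in the 0002 patch rather than quoted in this mail; as a rough, hypothetical sketch of how a subscription pointing at a publication might be defined (the exact option names are not settled here):

```sql
-- Hypothetical syntax sketch; see the 0002 patch for the actual grammar.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypublication;
```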
0003-Define-logical-replication-protocol-and-output-plugi.patch:
Adds the logical replication protocol (API and docs) and a "standard"
output plugin for logical decoding that produces output based on that
protocol and the publication definitions.
0004-Make-libpqwalreceiver-reentrant.patch: Redesigns
libpqwalreceiver to be reusable outside of the walreceiver by exporting
the API as a struct and an opaque connection handle. It also adds a
couple of additional functions for logical replication.
0005-Add-logical-replication-workers.patch: This patch adds the actual
logical replication workers that use all of the above to implement the
data change replication from publisher to subscriber. It adds two
different background workers. The first is the Launcher, which works
like the autovacuum launcher in that it gets the list of subscriptions
and starts/stops the apply workers for those subscriptions as needed.
Apply workers connect to the output plugin via the streaming protocol
and handle the actual data replication. I exported the
ExecUpdate/ExecInsert/ExecDelete functions from nodeModifyTable to
handle the actual database updates, so that things like triggers, etc.
are handled automatically without special code. This also adds a couple
of TAP tests that cover basic replication setup and a wide variety of
type support. The overview doc for logical replication that Simon
previously posted to the list is also part of this one.
0006-Logical-replication-support-for-initial-data-copy.patch: PoC of
the initial sync. It adds another mode into the apply worker which just
applies updates for a single table, plus some handover logic for when
the table is considered synchronized and can be replicated normally. It
also adds a new catalog pg_subscription_rel which keeps information
about the synchronization status of individual tables. Note that tables
added to publications at a later time are not yet synchronized; there
is also no resynchronization UI yet.
On the upstream side it adds two new commands to the replication
protocol, one for getting the list of tables and one for streaming
existing table data. I discussed above why this part is suboptimal, so
I won't repeat it here.
Feedback is welcome.
[1]: /messages/by-id/CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Add-PUBLICATION-catalogs-and-DDL.patch (application/x-patch)
From 21bfbbe60ec81eb028ede812216c3ca2f035499c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 13 Jul 2016 18:11:54 +0200
Subject: [PATCH 1/6] Add PUBLICATION catalogs and DDL
---
doc/src/sgml/catalogs.sgml | 128 ++++++
doc/src/sgml/ref/allfiles.sgml | 3 +
doc/src/sgml/ref/alter_publication.sgml | 159 ++++++++
doc/src/sgml/ref/create_publication.sgml | 164 ++++++++
doc/src/sgml/ref/drop_publication.sgml | 86 ++++
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 19 +
src/backend/catalog/objectaddress.c | 162 ++++++++
src/backend/commands/Makefile | 5 +-
src/backend/commands/event_trigger.c | 5 +
src/backend/commands/publicationcmds.c | 550 ++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 31 ++
src/backend/nodes/equalfuncs.c | 29 ++
src/backend/parser/gram.y | 117 +++++-
src/backend/replication/logical/Makefile | 4 +-
src/backend/replication/logical/publication.c | 343 ++++++++++++++++
src/backend/tcop/utility.c | 38 ++
src/backend/utils/cache/relcache.c | 9 +
src/backend/utils/cache/syscache.c | 46 +++
src/bin/psql/command.c | 27 +-
src/bin/psql/describe.c | 96 +++++
src/bin/psql/describe.h | 3 +
src/bin/psql/help.c | 1 +
src/include/catalog/dependency.h | 2 +
src/include/catalog/indexing.h | 12 +
src/include/catalog/pg_publication.h | 64 +++
src/include/catalog/pg_publication_rel.h | 53 +++
src/include/commands/replicationcmds.h | 26 ++
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 24 ++
src/include/parser/kwlist.h | 1 +
src/include/replication/publication.h | 47 +++
src/include/utils/rel.h | 4 +
src/include/utils/syscache.h | 4 +
src/test/regress/expected/publication.out | 76 ++++
src/test/regress/expected/sanity_check.out | 2 +
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/publication.sql | 40 ++
38 files changed, 2370 insertions(+), 15 deletions(-)
create mode 100644 doc/src/sgml/ref/alter_publication.sgml
create mode 100644 doc/src/sgml/ref/create_publication.sgml
create mode 100644 doc/src/sgml/ref/drop_publication.sgml
create mode 100644 src/backend/commands/publicationcmds.c
create mode 100644 src/backend/replication/logical/publication.c
create mode 100644 src/include/catalog/pg_publication.h
create mode 100644 src/include/catalog/pg_publication_rel.h
create mode 100644 src/include/commands/replicationcmds.h
create mode 100644 src/include/replication/publication.h
create mode 100644 src/test/regress/expected/publication.out
create mode 100644 src/test/regress/sql/publication.sql
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 8f5332a..6d505ae 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -236,6 +236,16 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-publication"><structname>pg_publication</structname></link></entry>
+ <entry>publications for logical replication</entry>
+ </row>
+
+ <row>
+ <entry><link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link></entry>
+ <entry>relation to publication mapping</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-range"><structname>pg_range</structname></link></entry>
<entry>information about range types</entry>
</row>
@@ -5110,6 +5120,124 @@
</sect1>
+ <sect1 id="catalog-pg-publication">
+ <title><structname>pg_publication</structname></title>
+
+ <indexterm zone="catalog-pg-publication">
+ <primary>pg_publication</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_publication</structname> catalog contains
+ all publications created in the database.
+ </para>
+
+ <table>
+
+ <title><structname>pg_publication</structname> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>pubname</structfield></entry>
+ <entry><type>Name</type></entry>
+ <entry></entry>
+ <entry>A unique, database-wide identifier for the publication.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>pubreplins</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, INSERT operations are replicated for tables in the
+ publication.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>pubreplupd</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, UPDATE operations are replicated for tables in the
+ publication.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>pubrepldel</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, DELETE operations are replicated for tables in the
+ publication.</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </sect1>
+
+ <sect1 id="catalog-pg-publication-rel">
+ <title><structname>pg_publication_rel</structname></title>
+
+ <indexterm zone="catalog-pg-publication-rel">
+ <primary>pg_publication_rel</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_publication_rel</structname> catalog contains
+ the mapping between tables and publications in the database. This is a
+ many-to-many mapping.
+ </para>
+
+ <table>
+
+ <title><structname>pg_publication_rel</structname> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+
+ <row>
+ <entry><structfield>pubid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal><link linkend="catalog-pg-publication"><structname>pg_publication</structname></link>.oid</literal></entry>
+ <entry>Publication reference.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>relid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
+ <entry>Relation reference.</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </sect1>
+
<sect1 id="catalog-pg-range">
<title><structname>pg_range</structname></title>
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 77667bd..371a7b7 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -26,6 +26,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY alterOperatorClass SYSTEM "alter_opclass.sgml">
<!ENTITY alterOperatorFamily SYSTEM "alter_opfamily.sgml">
<!ENTITY alterPolicy SYSTEM "alter_policy.sgml">
+<!ENTITY alterPublication SYSTEM "alter_publication.sgml">
<!ENTITY alterRole SYSTEM "alter_role.sgml">
<!ENTITY alterRule SYSTEM "alter_rule.sgml">
<!ENTITY alterSchema SYSTEM "alter_schema.sgml">
@@ -72,6 +73,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createOperatorClass SYSTEM "create_opclass.sgml">
<!ENTITY createOperatorFamily SYSTEM "create_opfamily.sgml">
<!ENTITY createPolicy SYSTEM "create_policy.sgml">
+<!ENTITY createPublication SYSTEM "create_publication.sgml">
<!ENTITY createRole SYSTEM "create_role.sgml">
<!ENTITY createRule SYSTEM "create_rule.sgml">
<!ENTITY createSchema SYSTEM "create_schema.sgml">
@@ -116,6 +118,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropOperatorFamily SYSTEM "drop_opfamily.sgml">
<!ENTITY dropOwned SYSTEM "drop_owned.sgml">
<!ENTITY dropPolicy SYSTEM "drop_policy.sgml">
+<!ENTITY dropPublication SYSTEM "drop_publication.sgml">
<!ENTITY dropRole SYSTEM "drop_role.sgml">
<!ENTITY dropRule SYSTEM "drop_rule.sgml">
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
new file mode 100644
index 0000000..246b39b
--- /dev/null
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -0,0 +1,159 @@
+<!--
+doc/src/sgml/ref/alter_publication.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-ALTERPUBLICATION">
+ <indexterm zone="sql-alterpublication">
+ <primary>ALTER PUBLICATION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>ALTER PUBLICATION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ALTER PUBLICATION</refname>
+ <refpurpose>change the definition of a publication</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+ REPLICATE_INSERT | NOREPLICATE_INSERT
+ | REPLICATE_UPDATE | NOREPLICATE_UPDATE
+ | REPLICATE_DELETE | NOREPLICATE_DELETE
+
+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> ADD TABLE <replaceable class="PARAMETER">table_name</replaceable> [, ...]
+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> ADD TABLE ALL IN SCHEMA <replaceable class="PARAMETER">schema_name</replaceable>
+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> DROP TABLE <replaceable class="PARAMETER">table_name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ The first variant of this command listed in the synopsis can change
+ all of the publication attributes that can be specified in
+ <xref linkend="sql-createpublication">.
+ Attributes not mentioned in the command retain their previous settings.
+ Database superusers can change any of these settings for any publication.
+ </para>
+
+ <para>
+ The other variants of this command deal with table membership in the
+ publication. The <literal>ADD TABLE</literal> subcommand will add one
+ or more tables to the publication. If the optional
+ <literal>ALL IN SCHEMA</literal> is specified, all tables in that schema
+ will be added. The <literal>ALL IN SCHEMA</literal> variant of this
+ command will not complain about tables that are already present in the
+ publication; it only adds the missing ones.
+ The <literal>DROP TABLE</literal> subcommand will remove one or more
+ tables from the publication.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing publication whose attributes are to be
+ altered.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>REPLICATE_INSERT</literal></term>
+ <term><literal>NOREPLICATE_INSERT</literal></term>
+ <term><literal>REPLICATE_UPDATE</literal></term>
+ <term><literal>NOREPLICATE_UPDATE</literal></term>
+ <term><literal>REPLICATE_DELETE</literal></term>
+ <term><literal>NOREPLICATE_DELETE</literal></term>
+ <listitem>
+ <para>
+ These clauses alter attributes originally set by
+ <xref linkend="SQL-CREATEPUBLICATION">. For more information, see the
+ <command>CREATE PUBLICATION</command> reference page.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">table_name</replaceable></term>
+ <listitem>
+ <para>
+ Name of an existing table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">schema_name</replaceable></term>
+ <listitem>
+ <para>
+ Name of a schema.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Change the publication to not replicate inserts:
+<programlisting>
+ALTER PUBLICATION noinsert NOREPLICATE_INSERT;
+</programlisting>
+ </para>
+
+ <para>
+ Add some tables to the publication:
+<programlisting>
+ALTER PUBLICATION mypublication ADD TABLE users, departments;
+</programlisting>
+ </para>
+
+ <para>
+ Add all tables from public schema to the publication:
+<programlisting>
+ALTER PUBLICATION mypublication ADD TABLE ALL IN SCHEMA public;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>ALTER PUBLICATION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createpublication"></member>
+ <member><xref linkend="sql-droppublication"></member>
+ <member><xref linkend="sql-createsubscription"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
new file mode 100644
index 0000000..ac8b71e
--- /dev/null
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -0,0 +1,164 @@
+<!--
+doc/src/sgml/ref/create_publication.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATEPUBLICATION">
+ <indexterm zone="sql-createpublication">
+ <primary>CREATE PUBLICATION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE PUBLICATION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE PUBLICATION</refname>
+ <refpurpose>define a new publication</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+ REPLICATE_INSERT | NOREPLICATE_INSERT
+ | REPLICATE_UPDATE | NOREPLICATE_UPDATE
+ | REPLICATE_DELETE | NOREPLICATE_DELETE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE PUBLICATION</command> adds a new publication
+ into the current database. The publication name must be distinct from
+ the name of any existing publication in the current database.
+ </para>
+
+ <para>
+ A publication is essentially a group of tables intended for managing
+ logical replication. See
+ <xref linkend="logical-replication-publication"> for details about how
+ publications fit into logical replication setup.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the new publication.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>REPLICATE_INSERT</literal></term>
+ <term><literal>NOREPLICATE_INSERT</literal></term>
+ <listitem>
+ <para>
+ These clauses determine whether the new publication will send
+ the <command>INSERT</command> operations to the subscribers.
+ <literal>REPLICATE_INSERT</literal> is the default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>REPLICATE_UPDATE</literal></term>
+ <term><literal>NOREPLICATE_UPDATE</literal></term>
+ <listitem>
+ <para>
+ These clauses determine whether the new publication will send
+ the <command>UPDATE</command> operations to the subscribers.
+ <literal>REPLICATE_UPDATE</literal> is the default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>REPLICATE_DELETE</literal></term>
+ <term><literal>NOREPLICATE_DELETE</literal></term>
+ <listitem>
+ <para>
+ These clauses determine whether the new publication will send
+ the <command>DELETE</command> operations to the subscribers.
+ <literal>REPLICATE_DELETE</literal> is the default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ This operation does not reserve any resources on the server. It only
+ defines grouping and filtering logic for future subscribers.
+ </para>
+
+ <para>
+ To create a publication, the invoking user must have the
+ <literal>CREATE</> privilege for the current database and the
+ <literal>SUBSCRIPTION</> role.
+ (Of course, superusers bypass this check.)
+ </para>
+
+ <para>
+ Replication of <command>UPDATE</command> and <command>DELETE</command>
+ operations requires the tables added to the publication to have a
+ <literal>REPLICA IDENTITY</> index specified.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Create a simple publication that just replicates all DML for tables in it:
+<programlisting>
+CREATE PUBLICATION mypublication;
+</programlisting>
+ </para>
+
+ <para>
+ Create an insert-only publication (for example, for tables without a
+ primary key):
+<programlisting>
+CREATE PUBLICATION insert_only NOREPLICATE_UPDATE NOREPLICATE_DELETE;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>CREATE PUBLICATION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-alterpublication"></member>
+ <member><xref linkend="sql-droppublication"></member>
+ <member><xref linkend="sql-createsubscription"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/drop_publication.sgml b/doc/src/sgml/ref/drop_publication.sgml
new file mode 100644
index 0000000..c5b0c78
--- /dev/null
+++ b/doc/src/sgml/ref/drop_publication.sgml
@@ -0,0 +1,86 @@
+<!--
+doc/src/sgml/ref/drop_publication.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPPUBLICATION">
+ <indexterm zone="sql-droppublication">
+ <primary>DROP PUBLICATION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP PUBLICATION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP PUBLICATION</refname>
+ <refpurpose>remove an existing publication</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP PUBLICATION <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP PUBLICATION</command> removes publications from the database.
+ </para>
+
+ <para>
+ A publication can only be dropped by its owner or a superuser.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing publication.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Drop a publication:
+<programlisting>
+DROP PUBLICATION mypublication;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>DROP PUBLICATION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createpublication"></member>
+ <member><xref linkend="sql-alterpublication"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 1ce7610..6bb3683 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -42,6 +42,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_foreign_table.h pg_policy.h pg_replication_origin.h \
pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
pg_collation.h pg_range.h pg_transform.h \
+ pg_publication.h pg_publication_rel.h \
toasting.h indexing.h \
)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 04d7840..359e7bb 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -48,6 +48,8 @@
#include "catalog/pg_opfamily.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
#include "catalog/pg_rewrite.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
@@ -64,6 +66,7 @@
#include "commands/extension.h"
#include "commands/policy.h"
#include "commands/proclang.h"
+#include "commands/replicationcmds.h"
#include "commands/schemacmds.h"
#include "commands/seclabel.h"
#include "commands/trigger.h"
@@ -163,6 +166,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
+ PublicationRelationId, /* OCLASS_PUBLICATION */
+ PublicationRelRelationId, /* OCLASS_PUBLICATION_REL */
TransformRelationId /* OCLASS_TRANSFORM */
};
@@ -1279,6 +1284,14 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_PUBLICATION:
+ DropPublicationById(object->objectId);
+ break;
+
+ case OCLASS_PUBLICATION_REL:
+ RemovePublicationRelById(object->objectId);
+ break;
+
case OCLASS_TRANSFORM:
DropTransformById(object->objectId);
break;
@@ -2436,6 +2449,12 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+ case PublicationRelationId:
+ return OCLASS_PUBLICATION;
+
+ case PublicationRelRelationId:
+ return OCLASS_PUBLICATION_REL;
+
case TransformRelationId:
return OCLASS_TRANSFORM;
}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 8068b82..375f4b0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -45,6 +45,8 @@
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
+#include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
#include "catalog/pg_rewrite.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
@@ -71,6 +73,7 @@
#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_type.h"
+#include "replication/publication.h"
#include "rewrite/rewriteSupport.h"
#include "storage/lmgr.h"
#include "storage/sinval.h"
@@ -450,6 +453,18 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ PublicationRelationId,
+ PublicationObjectIndexId,
+ PUBLICATIONOID,
+ PUBLICATIONNAME,
+ Anum_pg_publication_pubname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
}
};
@@ -653,6 +668,14 @@ static const struct object_type_map
{
"policy", OBJECT_POLICY
},
+ /* OCLASS_PUBLICATION */
+ {
+ "publication", OBJECT_PUBLICATION
+ },
+ /* OCLASS_PUBLICATION_REL */
+ {
+ "publication relation", OBJECT_PUBLICATION_REL
+ },
/* OCLASS_TRANSFORM */
{
"transform", OBJECT_TRANSFORM
@@ -688,6 +711,9 @@ static ObjectAddress get_object_address_opf_member(ObjectType objtype,
static ObjectAddress get_object_address_usermapping(List *objname,
List *objargs, bool missing_ok);
+static ObjectAddress get_object_address_publication_rel(List *objname,
+ List *objargs, Relation *relation,
+ bool missing_ok);
static ObjectAddress get_object_address_defacl(List *objname, List *objargs,
bool missing_ok);
static const ObjectPropertyType *get_object_property_data(Oid class_id);
@@ -812,6 +838,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
case OBJECT_ACCESS_METHOD:
+ case OBJECT_PUBLICATION:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -926,6 +953,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_usermapping(objname, objargs,
missing_ok);
break;
+ case OBJECT_PUBLICATION_REL:
+ address = get_object_address_publication_rel(objname, objargs,
+ &relation,
+ missing_ok);
+ break;
case OBJECT_DEFACL:
address = get_object_address_defacl(objname, objargs,
missing_ok);
@@ -1091,6 +1122,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_PUBLICATION:
+ msg = gettext_noop("publication name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -1156,6 +1190,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_PUBLICATION:
+ address.classId = PublicationRelationId;
+ address.objectId = get_publication_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1743,6 +1782,50 @@ get_object_address_usermapping(List *objname, List *objargs, bool missing_ok)
}
/*
+ * Find the ObjectAddress for a publication relation.
+ */
+static ObjectAddress
+get_object_address_publication_rel(List *objname, List *objargs,
+ Relation *relation, bool missing_ok)
+{
+ ObjectAddress address;
+ char *pubname;
+ Publication *pub;
+
+ ObjectAddressSet(address, PublicationRelRelationId, InvalidOid);
+
+ *relation = relation_openrv_extended(makeRangeVarFromNameList(objname),
+ AccessShareLock, missing_ok);
+ if (*relation == NULL)
+ return address;
+
+ /* fetch publication name from input list */
+ pubname = strVal(linitial(objargs));
+
+ /* Now look up the pg_publication tuple */
+ pub = GetPublicationByName(pubname, missing_ok);
+ if (!pub)
+ return address;
+
+ /* Find the publication relation mapping in syscache. */
+ address.objectId =
+ GetSysCacheOid2(PUBLICATIONRELMAP,
+ ObjectIdGetDatum(RelationGetRelid(*relation)),
+ ObjectIdGetDatum(pub->oid));
+ if (!OidIsValid(address.objectId))
+ {
+ if (!missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("publication relation \"%s\" in publication \"%s\" does not exist",
+ RelationGetRelationName(*relation), pubname)));
+ return address;
+ }
+
+ return address;
+}
+
+/*
* Find the ObjectAddress for a default ACL.
*/
static ObjectAddress
@@ -2001,6 +2084,7 @@ pg_get_object_address(PG_FUNCTION_ARGS)
case OBJECT_DOMCONSTRAINT:
case OBJECT_CAST:
case OBJECT_USER_MAPPING:
+ case OBJECT_PUBLICATION_REL:
case OBJECT_DEFACL:
case OBJECT_TRANSFORM:
if (list_length(args) != 1)
@@ -2230,6 +2314,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
case OBJECT_ACCESS_METHOD:
+ case OBJECT_PUBLICATION:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -3195,6 +3280,42 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_PUBLICATION:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(PUBLICATIONOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("publication %s"),
+ NameStr(((Form_pg_publication) GETSTRUCT(tup))->pubname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
+ case OCLASS_PUBLICATION_REL:
+ {
+ HeapTuple tup;
+ Publication *pub;
+ Form_pg_publication_rel prform;
+
+ tup = SearchSysCache1(PUBLICATIONREL,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication table %u",
+ object->objectId);
+
+ prform = (Form_pg_publication_rel) GETSTRUCT(tup);
+ pub = GetPublication(prform->pubid);
+
+ appendStringInfo(&buffer, _("publication table %s in publication %s"),
+ get_rel_name(prform->relid), pub->name);
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3680,6 +3801,14 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "access method");
break;
+ case OCLASS_PUBLICATION:
+ appendStringInfoString(&buffer, "publication");
+ break;
+
+ case OCLASS_PUBLICATION_REL:
+ appendStringInfoString(&buffer, "publication table");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4650,6 +4779,39 @@ getObjectIdentityParts(const ObjectAddress *object,
}
break;
+ case OCLASS_PUBLICATION:
+ {
+ Publication *pub;
+
+ pub = GetPublication(object->objectId);
+ appendStringInfoString(&buffer,
+ quote_identifier(pub->name));
+ if (objname)
+ *objname = list_make1(pstrdup(pub->name));
+ break;
+ }
+
+ case OCLASS_PUBLICATION_REL:
+ {
+ HeapTuple tup;
+ Publication *pub;
+ Form_pg_publication_rel prform;
+
+ tup = SearchSysCache1(PUBLICATIONREL,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication table %u",
+ object->objectId);
+
+ prform = (Form_pg_publication_rel) GETSTRUCT(tup);
+ pub = GetPublication(prform->pubid);
+
+ appendStringInfo(&buffer, "%s in publication %s",
+ get_rel_name(prform->relid), pub->name);
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 6b3742c..cb580377 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -17,9 +17,8 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
dbcommands.o define.o discard.o dropcmds.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
- policy.o portalcmds.o prepare.o proclang.o \
+ policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 50c89b8..8aaf1a7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -122,6 +122,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"TYPE", true},
{"USER MAPPING", true},
{"VIEW", true},
+ {"PUBLICATION", true},
{NULL, false}
};
@@ -1119,6 +1120,8 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_TYPE:
case OBJECT_USER_MAPPING:
case OBJECT_VIEW:
+ case OBJECT_PUBLICATION:
+ case OBJECT_PUBLICATION_REL:
return true;
}
return true;
@@ -1170,6 +1173,8 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_EXTENSION:
case OCLASS_POLICY:
case OCLASS_AM:
+ case OCLASS_PUBLICATION:
+ case OCLASS_PUBLICATION_REL:
return true;
}
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
new file mode 100644
index 0000000..19c6a97
--- /dev/null
+++ b/src/backend/commands/publicationcmds.c
@@ -0,0 +1,550 @@
+/*-------------------------------------------------------------------------
+ *
+ * publicationcmds.c
+ * publication manipulation
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/commands/publicationcmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/genam.h"
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaddress.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
+
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/replicationcmds.h"
+
+#include "executor/spi.h"
+
+#include "nodes/makefuncs.h"
+
+#include "parser/parse_clause.h"
+
+#include "replication/publication.h"
+#include "replication/reorderbuffer.h"
+
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+static void
+check_replication_permissions(void)
+{
+ if (!superuser() && !has_rolreplication(GetUserId()))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser or replication role to manipulate publications"))));
+}
+
+static void
+parse_publication_options(List *options,
+ bool *replicate_insert_given,
+ bool *replicate_insert,
+ bool *replicate_update_given,
+ bool *replicate_update,
+ bool *replicate_delete_given,
+ bool *replicate_delete)
+{
+ ListCell *lc;
+
+ *replicate_insert_given = false;
+ *replicate_update_given = false;
+ *replicate_delete_given = false;
+
+ /* Defaults are true */
+ *replicate_insert = true;
+ *replicate_update = true;
+ *replicate_delete = true;
+
+ /* Parse options */
+ foreach (lc, options)
+ {
+ DefElem *defel = (DefElem *) lfirst(lc);
+
+ if (strcmp(defel->defname, "replicate_insert") == 0)
+ {
+ if (*replicate_insert_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ *replicate_insert_given = true;
+ *replicate_insert = defGetBoolean(defel);
+ }
+ else if (strcmp(defel->defname, "replicate_update") == 0)
+ {
+ if (*replicate_update_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ *replicate_update_given = true;
+ *replicate_update = defGetBoolean(defel);
+ }
+ else if (strcmp(defel->defname, "replicate_delete") == 0)
+ {
+ if (*replicate_delete_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ *replicate_delete_given = true;
+ *replicate_delete = defGetBoolean(defel);
+ }
+ else
+ elog(ERROR, "unrecognized option: %s", defel->defname);
+ }
+}
+
+/*
+ * Create new publication.
+ * TODO ACL check
+ */
+ObjectAddress
+CreatePublication(CreatePublicationStmt *stmt)
+{
+ Relation rel;
+ ObjectAddress myself;
+ Oid puboid;
+ bool nulls[Natts_pg_publication];
+ Datum values[Natts_pg_publication];
+ HeapTuple tup;
+ bool replicate_insert_given;
+ bool replicate_update_given;
+ bool replicate_delete_given;
+ bool replicate_insert;
+ bool replicate_update;
+ bool replicate_delete;
+
+ check_replication_permissions();
+
+ rel = heap_open(PublicationRelationId, RowExclusiveLock);
+
+ /* Check if name is used */
+ puboid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(stmt->pubname));
+ if (OidIsValid(puboid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("publication \"%s\" already exists",
+ stmt->pubname)));
+ }
+
+ /* Form a tuple. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_publication_pubname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(stmt->pubname));
+
+ parse_publication_options(stmt->options,
+ &replicate_insert_given, &replicate_insert,
+ &replicate_update_given, &replicate_update,
+ &replicate_delete_given, &replicate_delete);
+
+ values[Anum_pg_publication_pubreplins - 1] =
+ BoolGetDatum(replicate_insert);
+ values[Anum_pg_publication_pubreplupd - 1] =
+ BoolGetDatum(replicate_update);
+ values[Anum_pg_publication_pubrepldel - 1] =
+ BoolGetDatum(replicate_delete);
+
+ tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
+
+ /* Insert tuple into catalog. */
+ puboid = simple_heap_insert(rel, tup);
+ CatalogUpdateIndexes(rel, tup);
+ heap_freetuple(tup);
+
+ ObjectAddressSet(myself, PublicationRelationId, puboid);
+
+ heap_close(rel, RowExclusiveLock);
+
+ /* Make the changes visible. */
+ CommandCounterIncrement();
+
+ return myself;
+}
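For reviewers' convenience, a usage sketch of the syntax this implements (publication names are hypothetical; per parse_publication_options above, all three options default to true):

```sql
-- Replicate only INSERTs on this publication (sketch)
CREATE PUBLICATION mypub WITH noreplicate_update noreplicate_delete;

-- With no options given, INSERT/UPDATE/DELETE are all replicated
CREATE PUBLICATION mypub2;
```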
+
+/*
+ * Change options of a publication.
+ */
+static void
+AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel,
+ HeapTuple tup)
+{
+ bool nulls[Natts_pg_publication];
+ bool replaces[Natts_pg_publication];
+ Datum values[Natts_pg_publication];
+ bool replicate_insert_given;
+ bool replicate_update_given;
+ bool replicate_delete_given;
+ bool replicate_insert;
+ bool replicate_update;
+ bool replicate_delete;
+ ObjectAddress obj;
+ Form_pg_publication pub = (Form_pg_publication) GETSTRUCT(tup);
+
+ parse_publication_options(stmt->options,
+ &replicate_insert_given, &replicate_insert,
+ &replicate_update_given, &replicate_update,
+ &replicate_delete_given, &replicate_delete);
+
+ /*
+ * Validate that replication is not being changed to replicate UPDATEs
+ * and DELETEs if it contains any tables without replication identity.
+ */
+ if ((replicate_update_given && replicate_update) ||
+ (replicate_delete_given && replicate_delete))
+ {
+ Relation pubrelsrel;
+ ScanKeyData scankey;
+ SysScanDesc scan;
+ HeapTuple reltup;
+
+ pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
+
+ /* Loop over all relations in the publication. */
+ ScanKeyInit(&scankey,
+ Anum_pg_publication_rel_pubid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(HeapTupleGetOid(tup)));
+
+ scan = systable_beginscan(pubrelsrel, 0, true, NULL, 1, &scankey);
+
+ /* Process every individual table in the publication. */
+ while (HeapTupleIsValid(reltup = systable_getnext(scan)))
+ {
+ Form_pg_publication_rel t;
+ Relation pubrel;
+
+ t = (Form_pg_publication_rel) GETSTRUCT(reltup);
+
+ pubrel = heap_open(t->relid, AccessShareLock);
+
+ /* Check if the relation has a replica identity index. */
+ if (RelationGetForm(pubrel)->relkind == RELKIND_RELATION)
+ {
+ if (pubrel->rd_indexvalid == 0)
+ RelationGetIndexList(pubrel);
+ if (!OidIsValid(pubrel->rd_replidindex))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("publication \"%s\" cannot be altered to "
+ "replicate UPDATEs or DELETEs because it "
+ "contains tables without a PRIMARY KEY",
+ NameStr(pub->pubname))));
+ }
+
+ heap_close(pubrel, NoLock);
+ }
+
+ systable_endscan(scan);
+ heap_close(pubrelsrel, NoLock);
+ }
+
+ /* Everything ok, form a new tuple. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));
+
+ if (replicate_insert_given)
+ {
+ values[Anum_pg_publication_pubreplins - 1] =
+ BoolGetDatum(replicate_insert);
+ replaces[Anum_pg_publication_pubreplins - 1] = true;
+ }
+ if (replicate_update_given)
+ {
+ values[Anum_pg_publication_pubreplupd - 1] =
+ BoolGetDatum(replicate_update);
+ replaces[Anum_pg_publication_pubreplupd - 1] = true;
+ }
+ if (replicate_delete_given)
+ {
+ values[Anum_pg_publication_pubrepldel - 1] =
+ BoolGetDatum(replicate_delete);
+ replaces[Anum_pg_publication_pubrepldel - 1] = true;
+ }
+
+ tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+ replaces);
+
+ /* Update the catalog. */
+ simple_heap_update(rel, &tup->t_self, tup);
+ CatalogUpdateIndexes(rel, tup);
+
+ ObjectAddressSet(obj, PublicationRelationId, HeapTupleGetOid(tup));
+ EventTriggerCollectSimpleCommand(obj, InvalidObjectAddress,
+ (Node *) stmt);
+}
+
+/*
+ * Add or remove tables to/from a publication.
+ */
+static void
+AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
+ HeapTuple tup)
+{
+ Oid pubid = HeapTupleGetOid(tup);
+ Oid prid;
+ bool missing_ok = false;
+ bool if_not_exists = false;
+ List *rels = NIL;
+ ListCell *lc;
+ ObjectAddress obj;
+
+ if (stmt->schema)
+ {
+ Oid nspoid;
+ Relation rel;
+ SysScanDesc scan;
+ ScanKeyData key[2];
+ HeapTuple tup;
+
+ Assert(list_length(stmt->tables) == 0);
+
+ /*
+ * Open, share-lock, and check all relations in the specified schema
+ */
+
+ nspoid = LookupExplicitNamespace(stmt->schema, false);
+ rel = heap_open(RelationRelationId, AccessShareLock);
+
+ ScanKeyInit(&key[0],
+ Anum_pg_class_relnamespace,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(nspoid));
+ ScanKeyInit(&key[1],
+ Anum_pg_class_relkind,
+ BTEqualStrategyNumber, F_CHAREQ,
+ CharGetDatum(RELKIND_RELATION));
+
+ scan = systable_beginscan(rel, InvalidOid, false,
+ NULL, 2, key);
+
+ while ((tup = systable_getnext(scan)) != NULL)
+ {
+ Oid schemarelid = HeapTupleGetOid(tup);
+ Relation schemarel;
+
+ schemarel = heap_open(schemarelid, AccessShareLock);
+ rels = lappend(rels, schemarel);
+ }
+
+ systable_endscan(scan);
+ heap_close(rel, AccessShareLock);
+
+ /* Don't error on missing tables on DROP */
+ missing_ok = true;
+
+ /* Don't error on already existing tables on ADD. */
+ if_not_exists = true;
+ }
+ else
+ {
+ List *relids = NIL;
+
+ Assert(list_length(stmt->tables) > 0);
+
+ /*
+ * Open, share-lock, and check all the explicitly-specified relations
+ */
+
+ foreach(lc, stmt->tables)
+ {
+ RangeVar *rv = lfirst(lc);
+ Relation rel;
+ bool recurse = interpretInhOption(rv->inhOpt);
+ Oid myrelid;
+
+ rel = heap_openrv(rv, AccessShareLock);
+ myrelid = RelationGetRelid(rel);
+ /* don't throw error for "foo, foo" */
+ if (list_member_oid(relids, myrelid))
+ {
+ heap_close(rel, AccessShareLock);
+ continue;
+ }
+ rels = lappend(rels, rel);
+ relids = lappend_oid(relids, myrelid);
+
+ if (recurse)
+ {
+ ListCell *child;
+ List *children;
+
+ children = find_all_inheritors(myrelid, AccessShareLock,
+ NULL);
+
+ foreach(child, children)
+ {
+ Oid childrelid = lfirst_oid(child);
+
+ if (list_member_oid(relids, childrelid))
+ continue;
+
+ /* find_all_inheritors already got lock */
+ rel = heap_open(childrelid, NoLock);
+ rels = lappend(rels, rel);
+ relids = lappend_oid(relids, childrelid);
+ }
+ }
+ }
+ }
+
+ /* Do the operation which was requested on all found relations. */
+ if (stmt->isDrop)
+ {
+ foreach(lc, rels)
+ {
+ Relation rel = (Relation) lfirst(lc);
+ Oid relid = RelationGetRelid(rel);
+
+ prid = GetSysCacheOid2(PUBLICATIONRELMAP,
+ ObjectIdGetDatum(relid),
+ ObjectIdGetDatum(pubid));
+ if (!OidIsValid(prid))
+ {
+ if (missing_ok)
+ continue;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("relation \"%s\" is not part of the publication",
+ RelationGetRelationName(rel))));
+ }
+
+ ObjectAddressSet(obj, PublicationRelRelationId, prid);
+ performDeletion(&obj, DROP_CASCADE, 0);
+ }
+ }
+ else
+ {
+ foreach(lc, rels)
+ {
+ Relation rel = (Relation) lfirst(lc);
+
+ prid = publication_add_relation(pubid, rel, if_not_exists);
+ ObjectAddressSet(obj, PublicationRelRelationId, prid);
+ EventTriggerCollectSimpleCommand(obj, InvalidObjectAddress,
+ (Node *) stmt);
+ }
+ }
+
+ /* And close the rels */
+ foreach(lc, rels)
+ {
+ Relation rel = (Relation) lfirst(lc);
+
+ heap_close(rel, NoLock);
+ }
+}
+
+/*
+ * Alter the existing publication.
+ *
+ * This is the dispatcher function for AlterPublicationOptions and
+ * AlterPublicationTables.
+ */
+void
+AlterPublication(AlterPublicationStmt *stmt)
+{
+ Relation rel;
+ HeapTuple tup;
+
+ check_replication_permissions();
+
+ rel = heap_open(PublicationRelationId, RowExclusiveLock);
+
+ tup = SearchSysCacheCopy1(PUBLICATIONNAME,
+ CStringGetDatum(stmt->pubname));
+
+ if (!HeapTupleIsValid(tup))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("publication \"%s\" does not exist",
+ stmt->pubname)));
+
+ if (stmt->options)
+ AlterPublicationOptions(stmt, rel, tup);
+ else
+ AlterPublicationTables(stmt, rel, tup);
+
+ /* Cleanup. */
+ heap_freetuple(tup);
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * Drop publication by OID
+ */
+void
+DropPublicationById(Oid pubid)
+{
+ Relation rel;
+ HeapTuple tup;
+
+ check_replication_permissions();
+
+ rel = heap_open(PublicationRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication %u", pubid);
+
+ simple_heap_delete(rel, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * Remove relation from publication by mapping OID.
+ */
+void
+RemovePublicationRelById(Oid prid)
+{
+ Relation rel;
+ HeapTuple tup;
+
+ rel = heap_open(PublicationRelRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(PUBLICATIONREL, ObjectIdGetDatum(prid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication table %u",
+ prid);
+
+ simple_heap_delete(rel, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(rel, RowExclusiveLock);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3244c76..bf76742 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4154,6 +4154,31 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreatePublicationStmt *
+_copyCreatePublicationStmt(const CreatePublicationStmt *from)
+{
+ CreatePublicationStmt *newnode = makeNode(CreatePublicationStmt);
+
+ COPY_STRING_FIELD(pubname);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
+static AlterPublicationStmt *
+_copyAlterPublicationStmt(const AlterPublicationStmt *from)
+{
+ AlterPublicationStmt *newnode = makeNode(AlterPublicationStmt);
+
+ COPY_STRING_FIELD(pubname);
+ COPY_NODE_FIELD(options);
+ COPY_SCALAR_FIELD(isDrop);
+ COPY_STRING_FIELD(schema);
+ COPY_NODE_FIELD(tables);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4945,6 +4970,12 @@ copyObject(const void *from)
case T_AlterPolicyStmt:
retval = _copyAlterPolicyStmt(from);
break;
+ case T_CreatePublicationStmt:
+ retval = _copyCreatePublicationStmt(from);
+ break;
+ case T_AlterPublicationStmt:
+ retval = _copyAlterPublicationStmt(from);
+ break;
case T_A_Expr:
retval = _copyAExpr(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 1eb6799..0c5f1d0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2106,6 +2106,29 @@ _equalAlterTSConfigurationStmt(const AlterTSConfigurationStmt *a,
}
static bool
+_equalCreatePublicationStmt(const CreatePublicationStmt *a,
+ const CreatePublicationStmt *b)
+{
+ COMPARE_STRING_FIELD(pubname);
+ COMPARE_NODE_FIELD(options);
+
+ return true;
+}
+
+static bool
+_equalAlterPublicationStmt(const AlterPublicationStmt *a,
+ const AlterPublicationStmt *b)
+{
+ COMPARE_STRING_FIELD(pubname);
+ COMPARE_NODE_FIELD(options);
+ COMPARE_SCALAR_FIELD(isDrop);
+ COMPARE_STRING_FIELD(schema);
+ COMPARE_NODE_FIELD(tables);
+
+ return true;
+}
+
+static bool
_equalCreatePolicyStmt(const CreatePolicyStmt *a, const CreatePolicyStmt *b)
{
COMPARE_STRING_FIELD(policy_name);
@@ -3249,6 +3272,12 @@ equal(const void *a, const void *b)
case T_AlterPolicyStmt:
retval = _equalAlterPolicyStmt(a, b);
break;
+ case T_CreatePublicationStmt:
+ retval = _equalCreatePublicationStmt(a, b);
+ break;
+ case T_AlterPublicationStmt:
+ retval = _equalAlterPublicationStmt(a, b);
+ break;
case T_A_Expr:
retval = _equalAExpr(a, b);
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0cae446..b91e75a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -266,6 +266,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
DropOwnedStmt ReassignOwnedStmt
AlterTSConfigurationStmt AlterTSDictionaryStmt
CreateMatViewStmt RefreshMatViewStmt CreateAmStmt
+ CreatePublicationStmt AlterPublicationStmt
%type <node> select_no_parens select_with_parens select_clause
simple_select values_clause
@@ -372,13 +373,14 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
transform_element_list transform_type_list
+ publication_opt_list publication_opt_items
%type <list> group_by_list
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
%type <node> grouping_sets_clause
%type <list> opt_fdw_options fdw_options
-%type <defelt> fdw_option
+%type <defelt> fdw_option publication_opt_item
%type <range> OptTempTableName
%type <into> into_clause create_as_target create_mv_target
@@ -617,7 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY
POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
- PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM
+ PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION
QUOTE
@@ -778,6 +780,7 @@ stmt :
| AlterTableStmt
| AlterTblSpcStmt
| AlterCompositeTypeStmt
+ | AlterPublicationStmt
| AlterRoleSetStmt
| AlterRoleStmt
| AlterTSConfigurationStmt
@@ -807,6 +810,7 @@ stmt :
| CreateMatViewStmt
| CreateOpClassStmt
| CreateOpFamilyStmt
+ | CreatePublicationStmt
| AlterOpFamilyStmt
| CreatePolicyStmt
| CreatePLangStmt
@@ -5643,6 +5647,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | PUBLICATION { $$ = OBJECT_PUBLICATION; }
;
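With PUBLICATION wired into drop_type above, the generic DROP path should accept something like (sketch, hypothetical name):

```sql
DROP PUBLICATION mypub;
```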
any_name_list:
@@ -8498,6 +8503,113 @@ AlterOwnerStmt: ALTER AGGREGATE func_name aggr_args OWNER TO RoleSpec
/*****************************************************************************
*
+ * CREATE PUBLICATION name [ WITH options ]
+ *
+ *****************************************************************************/
+
+CreatePublicationStmt:
+ CREATE PUBLICATION name opt_with publication_opt_list
+ {
+ CreatePublicationStmt *n = makeNode(CreatePublicationStmt);
+ n->pubname = $3;
+ n->options = $5;
+ $$ = (Node *)n;
+ }
+ ;
+
+publication_opt_list:
+ publication_opt_items { $$ = $1; }
+ | /* EMPTY */ { $$ = NIL; }
+ ;
+
+publication_opt_items:
+ publication_opt_item { $$ = list_make1($1); }
+ | publication_opt_items publication_opt_item { $$ = lappend($1, $2); }
+ ;
+
+publication_opt_item:
+ IDENT
+ {
+ /*
+ * We handle identifiers that aren't parser keywords with
+ * the following special-case codes, to avoid bloating the
+ * size of the main parser.
+ */
+ if (strcmp($1, "replicate_insert") == 0)
+ $$ = makeDefElem("replicate_insert", (Node *)makeInteger(TRUE));
+ else if (strcmp($1, "noreplicate_insert") == 0)
+ $$ = makeDefElem("replicate_insert", (Node *)makeInteger(FALSE));
+ else if (strcmp($1, "replicate_update") == 0)
+ $$ = makeDefElem("replicate_update", (Node *)makeInteger(TRUE));
+ else if (strcmp($1, "noreplicate_update") == 0)
+ $$ = makeDefElem("replicate_update", (Node *)makeInteger(FALSE));
+ else if (strcmp($1, "replicate_delete") == 0)
+ $$ = makeDefElem("replicate_delete", (Node *)makeInteger(TRUE));
+ else if (strcmp($1, "noreplicate_delete") == 0)
+ $$ = makeDefElem("replicate_delete", (Node *)makeInteger(FALSE));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized publication option \"%s\"", $1),
+ parser_errposition(@1)));
+ }
+ ;
+
+/*****************************************************************************
+ *
+ * ALTER PUBLICATION name [ WITH ] options
+ *
+ * ALTER PUBLICATION name ADD TABLE table [, table2]
+ *
+ * ALTER PUBLICATION name DROP TABLE table [, table2]
+ *
+ *****************************************************************************/
+
+AlterPublicationStmt:
+ ALTER PUBLICATION name opt_with publication_opt_items
+ {
+ AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
+ n->pubname = $3;
+ n->options = $5;
+ n->isDrop = FALSE;
+ n->schema = NULL;
+ n->tables = NIL;
+ $$ = (Node *)n;
+ }
+ | ALTER PUBLICATION name ADD_P TABLE relation_expr_list
+ {
+ AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
+ n->pubname = $3;
+ n->options = NIL;
+ n->isDrop = FALSE;
+ n->schema = NULL;
+ n->tables = $6;
+ $$ = (Node *)n;
+ }
+ | ALTER PUBLICATION name ADD_P TABLE ALL IN_P SCHEMA name
+ {
+ AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
+ n->pubname = $3;
+ n->options = NIL;
+ n->isDrop = FALSE;
+ n->schema = $9;
+ n->tables = NIL;
+ $$ = (Node *)n;
+ }
+ | ALTER PUBLICATION name DROP TABLE relation_expr_list
+ {
+ AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
+ n->pubname = $3;
+ n->options = NIL;
+ n->isDrop = TRUE;
+ n->schema = NULL;
+ n->tables = $6;
+ $$ = (Node *)n;
+ }
+ ;
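A sketch of the four ALTER PUBLICATION forms accepted by the rules above (publication, table, and schema names are hypothetical):

```sql
ALTER PUBLICATION mypub WITH replicate_delete;
ALTER PUBLICATION mypub ADD TABLE public.orders, public.order_lines;
ALTER PUBLICATION mypub ADD TABLE ALL IN SCHEMA public;
ALTER PUBLICATION mypub DROP TABLE public.order_lines;
```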
+
+/*****************************************************************************
+ *
* QUERY: Define Rewrite Rule
*
*****************************************************************************/
@@ -13899,6 +14011,7 @@ unreserved_keyword:
| PROCEDURAL
| PROCEDURE
| PROGRAM
+ | PUBLICATION
| QUOTE
| RANGE
| READ
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 1d7ca06..3b3e90c 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
-OBJS = decode.o logical.o logicalfuncs.o message.o origin.o reorderbuffer.o \
- snapbuild.o
+OBJS = decode.o logical.o logicalfuncs.o message.o origin.o publication.o \
+ reorderbuffer.o snapbuild.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/publication.c b/src/backend/replication/logical/publication.c
new file mode 100644
index 0000000..b86611e
--- /dev/null
+++ b/src/backend/replication/logical/publication.c
@@ -0,0 +1,343 @@
+/*-------------------------------------------------------------------------
+ *
+ * publication.c
+ * publication C API manipulation
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/publication.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/genam.h"
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaddress.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
+
+#include "executor/spi.h"
+
+#include "nodes/makefuncs.h"
+
+#include "replication/publication.h"
+#include "replication/reorderbuffer.h"
+
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+/*
+ * Check if an action on a specified relation should be replicated for
+ * the publication list
+ */
+bool
+publication_change_is_replicated(Relation rel,
+ PublicationChangeType change_type,
+ List *pubnames)
+{
+ List *relpublications = GetRelationPublications(rel);
+ ListCell *rellc;
+ bool result = false;
+
+ /* TODO: optimize */
+ foreach (rellc, relpublications)
+ {
+ Oid relpuboid = lfirst_oid(rellc);
+ ListCell *namelc;
+
+ foreach (namelc, pubnames)
+ {
+ char *pubname = (char *) lfirst(namelc);
+ HeapTuple tup;
+ Form_pg_publication pub;
+
+ tup = SearchSysCache1(PUBLICATIONNAME, CStringGetDatum(pubname));
+ if (!HeapTupleIsValid(tup))
+ continue; /* publication may have been dropped concurrently */
+ pub = (Form_pg_publication) GETSTRUCT(tup);
+
+ if (HeapTupleGetOid(tup) == relpuboid)
+ {
+ switch (change_type)
+ {
+ case PublicationChangeInsert:
+ result = pub->pubreplins;
+ break;
+ case PublicationChangeUpdate:
+ result = pub->pubreplupd;
+ break;
+ case PublicationChangeDelete:
+ result = pub->pubrepldel;
+ break;
+ default:
+ elog(ERROR, "unknown change_type %d", change_type);
+ }
+
+ /*
+ * We don't need to search further once we have found a
+ * publication that replicates this change type.
+ */
+ if (result)
+ {
+ ReleaseSysCache(tup);
+ list_free(relpublications);
+ return result;
+ }
+ }
+
+ ReleaseSysCache(tup);
+ }
+ }
+
+ list_free(relpublications);
+ return result;
+}
+
+
+/*
+ * Insert new publication / relation mapping.
+ */
+Oid
+publication_add_relation(Oid pubid, Relation targetrel,
+ bool if_not_exists)
+{
+ Relation rel;
+ HeapTuple tup;
+ Datum values[Natts_pg_publication_rel];
+ bool nulls[Natts_pg_publication_rel];
+ Oid relid = RelationGetRelid(targetrel);
+ Oid prid;
+ Publication *pub = GetPublication(pubid);
+ ObjectAddress myself,
+ referenced;
+
+ rel = heap_open(PublicationRelRelationId, RowExclusiveLock);
+
+ /* Check for duplicates */
+ if (SearchSysCacheExists2(PUBLICATIONRELMAP,
+ ObjectIdGetDatum(relid),
+ ObjectIdGetDatum(pubid)))
+ {
+ heap_close(rel, RowExclusiveLock);
+
+ if (if_not_exists)
+ return InvalidOid;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("relation \"%s\" is already a member of publication \"%s\"",
+ RelationGetRelationName(targetrel), pub->name)));
+ }
+
+ /* Must be a table. */
+ if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("only tables can be added to publications"),
+ errdetail("\"%s\" is not a table.",
+ RelationGetRelationName(targetrel))));
+
+ /* UNLOGGED and TEMP relations cannot be part of publication. */
+ if (!RelationNeedsWAL(targetrel))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("UNLOGGED and TEMP relations cannot be replicated")));
+
+ /* Must have replica identity index. */
+ if (targetrel->rd_indexvalid == 0)
+ RelationGetIndexList(targetrel);
+ if (!OidIsValid(targetrel->rd_replidindex) &&
+ (pub->replicate_update || pub->replicate_delete))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("table %s cannot be added to publication %s",
+ RelationGetRelationName(targetrel), pub->name),
+ errdetail("table does not have REPLICA IDENTITY index "
+ "and given publication is configured to "
+ "replicate UPDATEs and/or DELETEs"),
+ errhint("Add a PRIMARY KEY to the table")));
+
+ /* Form a tuple. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_publication_rel_pubid - 1] =
+ ObjectIdGetDatum(pubid);
+ values[Anum_pg_publication_rel_relid - 1] =
+ ObjectIdGetDatum(RelationGetRelid(targetrel));
+
+ tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
+
+ /* Insert tuple into catalog. */
+ prid = simple_heap_insert(rel, tup);
+ CatalogUpdateIndexes(rel, tup);
+ heap_freetuple(tup);
+
+ /* Add dependency on the publication */
+ ObjectAddressSet(myself, PublicationRelRelationId, prid);
+ ObjectAddressSet(referenced, PublicationRelationId, pubid);
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+
+ /* Add dependency on the relation */
+ ObjectAddressSet(referenced, RelationRelationId, relid);
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ /*
+ * If the publication replicates updates and deletes we also need
+ * to record a dependency on the replica identity index.
+ *
+ * XXX: this has the unpleasant side effect that replacing the
+ * replica index (the primary key in most cases) removes the table
+ * from the publication. We could handle this better by checking
+ * specifically for this case during execution of the related
+ * commands.
+ */
+ if (pub->replicate_update || pub->replicate_delete)
+ {
+ ObjectAddressSet(referenced, IndexRelationId,
+ targetrel->rd_replidindex);
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ /* Close the table. */
+ heap_close(rel, RowExclusiveLock);
+
+ return prid;
+}
+
+
+/*
+ * Gets list of publication oids for a relation.
+ */
+List *
+GetRelationPublications(Relation rel)
+{
+ List *result;
+ List *oldlist;
+ Relation pubrelsrel;
+ ScanKeyData scankey;
+ SysScanDesc scan;
+ HeapTuple tup;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (rel->rd_publicationsvalid)
+ return list_copy(rel->rd_publications);
+
+ /* Find all publications associated with the relation. */
+ pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
+
+ ScanKeyInit(&scankey,
+ Anum_pg_publication_rel_relid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(rel)));
+
+ scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
+ NULL, 1, &scankey);
+
+ result = NIL;
+ while (HeapTupleIsValid(tup = systable_getnext(scan)))
+ {
+ Form_pg_publication_rel pubrel;
+
+ pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
+
+ result = lappend_oid(result, pubrel->pubid);
+ }
+
+ systable_endscan(scan);
+ heap_close(pubrelsrel, NoLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = rel->rd_publications;
+ rel->rd_publications = list_copy(result);
+ rel->rd_publicationsvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
+Publication *
+GetPublication(Oid pubid)
+{
+ HeapTuple tup;
+ Publication *pub;
+ Form_pg_publication pubform;
+
+ tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for publication %u", pubid);
+
+ pubform = (Form_pg_publication) GETSTRUCT(tup);
+
+ pub = (Publication *) palloc(sizeof(Publication));
+ pub->oid = pubid;
+ pub->name = NameStr(pubform->pubname);
+ pub->replicate_insert = pubform->pubreplins;
+ pub->replicate_update = pubform->pubreplupd;
+ pub->replicate_delete = pubform->pubrepldel;
+
+ ReleaseSysCache(tup);
+
+ return pub;
+}
+
+
+/*
+ * Get Publication using name.
+ */
+Publication *
+GetPublicationByName(const char *pubname, bool missing_ok)
+{
+ Oid oid;
+
+ oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
+ if (!OidIsValid(oid))
+ {
+ if (missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("publication \"%s\" does not exist", pubname)));
+ }
+
+ return GetPublication(oid);
+}
+
+/*
+ * get_publication_oid - given a publication name, look up the OID
+ *
+ * If missing_ok is false, throw an error if name not found. If true, just
+ * return InvalidOid.
+ */
+Oid
+get_publication_oid(const char *pubname, bool missing_ok)
+{
+ Oid oid;
+
+ oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
+ if (!OidIsValid(oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("publication \"%s\" does not exist", pubname)));
+ return oid;
+}
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ac50c2a..886d2ff 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -44,6 +44,7 @@
#include "commands/portalcmds.h"
#include "commands/prepare.h"
#include "commands/proclang.h"
+#include "commands/replicationcmds.h"
#include "commands/schemacmds.h"
#include "commands/seclabel.h"
#include "commands/sequence.h"
@@ -210,6 +211,8 @@ check_xact_readonly(Node *parsetree)
case T_CreateForeignTableStmt:
case T_ImportForeignSchemaStmt:
case T_SecLabelStmt:
+ case T_CreatePublicationStmt:
+ case T_AlterPublicationStmt:
PreventCommandIfReadOnly(CreateCommandTag(parsetree));
PreventCommandIfParallelMode(CreateCommandTag(parsetree));
break;
@@ -1544,6 +1547,19 @@ ProcessUtilitySlow(Node *parsetree,
address = CreateAccessMethod((CreateAmStmt *) parsetree);
break;
+ case T_CreatePublicationStmt:
+ address = CreatePublication((CreatePublicationStmt *) parsetree);
+ break;
+
+ case T_AlterPublicationStmt:
+ AlterPublication((AlterPublicationStmt *) parsetree);
+ /*
+ * AlterPublication calls EventTriggerCollectSimpleCommand
+ * directly
+ */
+ commandCollected = true;
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -1902,6 +1918,9 @@ AlterObjectTypeCommandTag(ObjectType objtype)
case OBJECT_MATVIEW:
tag = "ALTER MATERIALIZED VIEW";
break;
+ case OBJECT_PUBLICATION:
+ tag = "PUBLICATION";
+ break;
default:
tag = "???";
break;
@@ -2187,6 +2206,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_ACCESS_METHOD:
tag = "DROP ACCESS METHOD";
break;
+ case OBJECT_PUBLICATION:
+ tag = "DROP PUBLICATION";
+ break;
default:
tag = "???";
}
@@ -2557,6 +2579,14 @@ CreateCommandTag(Node *parsetree)
tag = "CREATE ACCESS METHOD";
break;
+ case T_CreatePublicationStmt:
+ tag = "CREATE PUBLICATION";
+ break;
+
+ case T_AlterPublicationStmt:
+ tag = "ALTER PUBLICATION";
+ break;
+
case T_PrepareStmt:
tag = "PREPARE";
break;
@@ -3122,6 +3152,14 @@ GetCommandLogLevel(Node *parsetree)
lev = LOGSTMT_DDL;
break;
+ case T_CreatePublicationStmt:
+ lev = LOGSTMT_DDL;
+ break;
+
+ case T_AlterPublicationStmt:
+ lev = LOGSTMT_DDL;
+ break;
+
/* already-planned queries */
case T_PlannedStmt:
{
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8d2ad01..3f7027f 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2052,6 +2052,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
MemoryContextDelete(relation->rd_rsdesc->rscxt);
if (relation->rd_fdwroutine)
pfree(relation->rd_fdwroutine);
+ if (relation->rd_publications)
+ list_free(relation->rd_publications);
pfree(relation);
}
@@ -5053,6 +5055,13 @@ load_relcache_init_file(bool shared)
rel->rd_fdwroutine = NULL;
/*
+ * Publications are not needed by most backends, so we load them on
+ * demand.
+ */
+ rel->rd_publicationsvalid = false;
+ rel->rd_publications = NIL;
+
+ /*
* Reset transient-state fields in the relcache entry
*/
rel->rd_smgr = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..d575b51 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -50,6 +50,8 @@
#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
#include "catalog/pg_range.h"
#include "catalog/pg_rewrite.h"
#include "catalog/pg_seclabel.h"
#include "catalog/pg_shdepend.h"
@@ -645,6 +647,50 @@ static const struct cachedesc cacheinfo[] = {
},
16
},
+ {PublicationRelationId, /* PUBLICATIONOID */
+ PublicationObjectIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 8
+ },
+ {PublicationRelationId, /* PUBLICATIONNAME */
+ PublicationNameIndexId,
+ 1,
+ {
+ Anum_pg_publication_pubname,
+ 0,
+ 0,
+ 0
+ },
+ 8
+ },
+ {PublicationRelRelationId, /* PUBLICATIONREL */
+ PublicationRelObjectIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 64
+ },
+ {PublicationRelRelationId, /* PUBLICATIONRELMAP */
+ PublicationRelMapIndexId,
+ 2,
+ {
+ Anum_pg_publication_rel_relid,
+ Anum_pg_publication_rel_pubid,
+ 0,
+ 0
+ },
+ 64
+ },
{RewriteRelationId, /* RULERELNAME */
RewriteRelRulenameIndexId,
2,
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 3f2cebf..a379c19 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -477,17 +477,30 @@ exec_command(const char *cmd,
success = listTables(&cmd[1], pattern, show_verbose, show_system);
break;
case 'r':
- if (cmd[2] == 'd' && cmd[3] == 's')
+ switch (cmd[2])
{
- char *pattern2 = NULL;
+ case 'd':
+ {
+ if (cmd[3] == 's')
+ {
+ char *pattern2 = NULL;
- if (pattern)
- pattern2 = psql_scan_slash_option(scan_state,
+ if (pattern)
+ pattern2 = psql_scan_slash_option(scan_state,
OT_NORMAL, NULL, true);
- success = listDbRoleSettings(pattern, pattern2);
+ success = listDbRoleSettings(pattern, pattern2);
+ }
+ else
+ status = PSQL_CMD_UNKNOWN;
+ break;
+ }
+ case 'p':
+ success = describePublications(pattern, show_verbose);
+ break;
+ default:
+ status = PSQL_CMD_UNKNOWN;
+ break;
}
- else
- success = PSQL_CMD_UNKNOWN;
break;
case 'u':
success = describeRoles(pattern, show_verbose, show_system);
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 27be102..573d980 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2290,6 +2290,38 @@ describeOneTableDetails(const char *schemaname,
}
PQclear(result);
}
+
+ /* print any publications */
+ /* TODO: bump version */
+ if (pset.sversion >= 90600)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT pub.pubname\n"
+ "FROM pg_catalog.pg_publication pub,\n"
+ " pg_catalog.pg_publication_rel pr\n"
+ "WHERE pr.relid = '%s' AND pr.pubid = pub.oid\n"
+ "ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ printTableAddFooter(&cont, _("Publications:"));
+
+ /* Might be an empty set - that's ok */
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " \"%s\"",
+ PQgetvalue(result, i, 0));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ PQclear(result);
+ }
}
if (view_def)
@@ -4688,6 +4720,70 @@ listOneExtensionContents(const char *extname, const char *oid)
return true;
}
+/* \drp
+ * Describes publications.
+ *
+ * Takes an optional regexp to select particular publications
+ */
+bool
+describePublications(const char *pattern, bool verbose)
+{
+ PQExpBufferData buf;
+ PGresult *res;
+ printQueryOpt myopt = pset.popt;
+ static const bool translate_columns[] = {false, false, false, false};
+
+ /* TODO bump */
+ if (pset.sversion < 90600)
+ {
+ psql_error("The server (version %d.%d) does not support publications.\n",
+ pset.sversion / 10000, (pset.sversion / 100) % 100);
+ return true;
+ }
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf,
+ "SELECT pubname AS \"%s\",\n"
+ " pubreplins AS \"%s\",\n"
+ " pubreplupd AS \"%s\",\n"
+ " pubrepldel AS \"%s\"\n",
+ gettext_noop("Name"),
+ gettext_noop("Inserts"),
+ gettext_noop("Updates"),
+ gettext_noop("Deletes"));
+
+ /* TODO Show owner and ACL */
+ if (verbose)
+ {
+ }
+
+ appendPQExpBufferStr(&buf,
+ "\nFROM pg_catalog.pg_publication\n");
+
+ processSQLNamePattern(pset.db, &buf, pattern, false, false,
+ NULL, "pubname", NULL,
+ NULL);
+
+ appendPQExpBufferStr(&buf, "ORDER BY 1;");
+
+ res = PSQLexec(buf.data);
+ termPQExpBuffer(&buf);
+ if (!res)
+ return false;
+
+ myopt.nullPrint = NULL;
+ myopt.title = _("List of publications");
+ myopt.translate_header = true;
+ myopt.translate_columns = translate_columns;
+ myopt.n_translate_columns = lengthof(translate_columns);
+
+ printQuery(res, &myopt, pset.queryFout, false, pset.logfile);
+
+ PQclear(res);
+ return true;
+}
+
/*
* printACLColumn
*
diff --git a/src/bin/psql/describe.h b/src/bin/psql/describe.h
index 20a6508..c4457da 100644
--- a/src/bin/psql/describe.h
+++ b/src/bin/psql/describe.h
@@ -102,4 +102,7 @@ extern bool listExtensionContents(const char *pattern);
/* \dy */
extern bool listEventTriggers(const char *pattern, bool verbose);
+/* \drp */
extern bool describePublications(const char *pattern, bool verbose);
+
#endif /* DESCRIBE_H */
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index efc8454..5cc5bce 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -241,6 +241,7 @@ slashUsage(unsigned short int pager)
fprintf(output, _(" \\dO[S+] [PATTERN] list collations\n"));
fprintf(output, _(" \\dp [PATTERN] list table, view, and sequence access privileges\n"));
fprintf(output, _(" \\drds [PATRN1 [PATRN2]] list per-database role settings\n"));
+ fprintf(output, _(" \\drp[+] [PATTERN] list replication publications\n"));
fprintf(output, _(" \\ds[S+] [PATTERN] list sequences\n"));
fprintf(output, _(" \\dt[S+] [PATTERN] list tables\n"));
fprintf(output, _(" \\dT[S+] [PATTERN] list data types\n"));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 09b36c5..01fd16a 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -161,6 +161,8 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_PUBLICATION, /* pg_publication */
+ OCLASS_PUBLICATION_REL, /* pg_publication_rel */
OCLASS_TRANSFORM /* pg_transform */
} ObjectClass;
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ca5eb3d..ae46ed6 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -319,6 +319,18 @@ DECLARE_UNIQUE_INDEX(pg_replication_origin_roiident_index, 6001, on pg_replicati
DECLARE_UNIQUE_INDEX(pg_replication_origin_roname_index, 6002, on pg_replication_origin using btree(roname text_pattern_ops));
#define ReplicationOriginNameIndex 6002
+DECLARE_UNIQUE_INDEX(pg_publication_oid_index, 6110, on pg_publication using btree(oid oid_ops));
+#define PublicationObjectIndexId 6110
+
+DECLARE_UNIQUE_INDEX(pg_publication_pubname_index, 6111, on pg_publication using btree(pubname name_ops));
+#define PublicationNameIndexId 6111
+
+DECLARE_UNIQUE_INDEX(pg_publication_rel_object_index, 6112, on pg_publication_rel using btree(oid oid_ops));
+#define PublicationRelObjectIndexId 6112
+
+DECLARE_UNIQUE_INDEX(pg_publication_rel_map_index, 6113, on pg_publication_rel using btree(relid oid_ops, pubid oid_ops));
+#define PublicationRelMapIndexId 6113
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
new file mode 100644
index 0000000..6b5263a
--- /dev/null
+++ b/src/include/catalog/pg_publication.h
@@ -0,0 +1,64 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_publication.h
+ * definition of the publication relation (pg_publication)
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_publication.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_PUBLICATION_H
+#define PG_PUBLICATION_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_publication definition. cpp turns this into
+ * typedef struct FormData_pg_publication
+ *
+ * ----------------
+ */
+#define PublicationRelationId 6104
+#define PublicationRelation_Rowtype_Id 6105
+
+CATALOG(pg_publication,6104) BKI_ROWTYPE_OID(6105)
+{
+ NameData pubname; /* name of the publication */
+
+ /* true if inserts are replicated */
+ bool pubreplins;
+
+ /* true if updates are replicated */
+ bool pubreplupd;
+
+ /* true if deletes are replicated */
+ bool pubrepldel;
+
+} FormData_pg_publication;
+
+/* ----------------
+ * Form_pg_publication corresponds to a pointer to a tuple with
+ * the format of pg_publication relation.
+ * ----------------
+ */
+typedef FormData_pg_publication *Form_pg_publication;
+
+/* ----------------
+ * compiler constants for pg_publication
+ * ----------------
+ */
+
+#define Natts_pg_publication 4
+#define Anum_pg_publication_pubname 1
+#define Anum_pg_publication_pubreplins 2
+#define Anum_pg_publication_pubreplupd 3
+#define Anum_pg_publication_pubrepldel 4
+
+#endif /* PG_PUBLICATION_H */
diff --git a/src/include/catalog/pg_publication_rel.h b/src/include/catalog/pg_publication_rel.h
new file mode 100644
index 0000000..976c1a9
--- /dev/null
+++ b/src/include/catalog/pg_publication_rel.h
@@ -0,0 +1,53 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_publication_rel.h
+ * definition of the publication to relation map (pg_publication_rel)
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_publication_rel.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_PUBLICATION_REL_H
+#define PG_PUBLICATION_REL_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_publication_rel definition. cpp turns this into
+ * typedef struct FormData_pg_publication_rel
+ *
+ * ----------------
+ */
+#define PublicationRelRelationId 6106
+#define PublicationRelRelation_Rowtype_Id 6107
+
+CATALOG(pg_publication_rel,6106) BKI_ROWTYPE_OID(6107)
+{
+ Oid pubid; /* Oid of the publication */
+ Oid relid; /* Oid of the relation */
+} FormData_pg_publication_rel;
+
+/* ----------------
+ * Form_pg_publication_rel corresponds to a pointer to a tuple with
+ * the format of pg_publication_rel relation.
+ * ----------------
+ */
+typedef FormData_pg_publication_rel *Form_pg_publication_rel;
+
+/* ----------------
+ * compiler constants for pg_publication_rel
+ * ----------------
+ */
+
+#define Natts_pg_publication_rel 2
+#define Anum_pg_publication_rel_pubid 1
+#define Anum_pg_publication_rel_relid 2
+
+#endif /* PG_PUBLICATION_REL_H */
diff --git a/src/include/commands/replicationcmds.h b/src/include/commands/replicationcmds.h
new file mode 100644
index 0000000..717485f
--- /dev/null
+++ b/src/include/commands/replicationcmds.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * replicationcmds.h
+ * prototypes for publicationcmds.c.
+ *
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/replicationcmds.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REPLICATIONCMDS_H
+#define REPLICATIONCMDS_H
+
+#include "catalog/objectaddress.h"
+#include "nodes/parsenodes.h"
+
+extern ObjectAddress CreatePublication(CreatePublicationStmt *stmt);
+extern void AlterPublication(AlterPublicationStmt *stmt);
+extern void DropPublicationById(Oid pubid);
+extern void RemovePublicationRelById(Oid prid);
+
+#endif /* REPLICATIONCMDS_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6b850e4..3cce3d9 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -405,6 +405,8 @@ typedef enum NodeTag
T_AlterPolicyStmt,
T_CreateTransformStmt,
T_CreateAmStmt,
+ T_CreatePublicationStmt,
+ T_AlterPublicationStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1481fff..c10e6e7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1408,6 +1408,8 @@ typedef enum ObjectType
OBJECT_OPERATOR,
OBJECT_OPFAMILY,
OBJECT_POLICY,
+ OBJECT_PUBLICATION,
+ OBJECT_PUBLICATION_REL,
OBJECT_ROLE,
OBJECT_RULE,
OBJECT_SCHEMA,
@@ -3101,4 +3103,26 @@ typedef struct AlterTSConfigurationStmt
bool missing_ok; /* for DROP - skip error if missing? */
} AlterTSConfigurationStmt;
+
+typedef struct CreatePublicationStmt
+{
+ NodeTag type;
+ char *pubname; /* Name of the publication */
+ List *options; /* List of DefElem nodes */
+} CreatePublicationStmt;
+
+typedef struct AlterPublicationStmt
+{
+ NodeTag type;
+ char *pubname; /* Name of the publication */
+
+ /* parameters used for ALTER PUBLICATION ... WITH */
+ List *options; /* List of DefElem nodes */
+
+ /* parameters used for ALTER PUBLICATION ... ADD/DROP TABLE */
+ bool isDrop; /* true to drop the tables, false to add them */
+ char *schema; /* ALL IN SCHEMA ... */
+ List *tables; /* List of tables to add/drop */
+} AlterPublicationStmt;
+
#endif /* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 17ffef5..9430ff0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -304,6 +304,7 @@ PG_KEYWORD("privileges", PRIVILEGES, UNRESERVED_KEYWORD)
PG_KEYWORD("procedural", PROCEDURAL, UNRESERVED_KEYWORD)
PG_KEYWORD("procedure", PROCEDURE, UNRESERVED_KEYWORD)
PG_KEYWORD("program", PROGRAM, UNRESERVED_KEYWORD)
+PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD)
diff --git a/src/include/replication/publication.h b/src/include/replication/publication.h
new file mode 100644
index 0000000..08245ee
--- /dev/null
+++ b/src/include/replication/publication.h
@@ -0,0 +1,47 @@
+/*-------------------------------------------------------------------------
+ *
+ * publication.h
+ * publication support structures and interfaces
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/replication/publication.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PUBLICATION_H
+#define PUBLICATION_H
+
+#include "postgres.h"
+
+
+typedef enum PublicationChangeType
+{
+ PublicationChangeInsert,
+ PublicationChangeUpdate,
+ PublicationChangeDelete
+} PublicationChangeType;
+
+typedef struct Publication
+{
+ Oid oid;
+ char *name;
+ bool replicate_insert;
+ bool replicate_update;
+ bool replicate_delete;
+} Publication;
+
+extern Publication *GetPublication(Oid pubid);
+extern Publication *GetPublicationByName(const char *pubname, bool missing_ok);
+extern List *GetRelationPublications(Relation rel);
+
+extern bool publication_change_is_replicated(Relation rel,
+ PublicationChangeType change_type,
+ List *inpublications);
+extern Oid publication_add_relation(Oid pubid, Relation targetrel,
+ bool if_not_exists);
+
+extern Oid get_publication_oid(const char *pubname, bool missing_ok);
+
+#endif /* PUBLICATION_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index ed14442..ca2283a 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -159,6 +159,10 @@ typedef struct RelationData
/* use "struct" here to avoid needing to include fdwapi.h: */
struct FdwRoutine *rd_fdwroutine; /* cached function pointers, or NULL */
+ /* Publication support */
+ bool rd_publicationsvalid; /* rd_publications list is valid */
+ List *rd_publications;
+
/*
* Hack for CLUSTER, rewriting ALTER TABLE, etc: when writing a new
* version of a table, we need to make any toast pointers inserted into it
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..632fcbc 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,10 @@ enum SysCacheIdentifier
RELOID,
REPLORIGIDENT,
REPLORIGNAME,
+ PUBLICATIONOID,
+ PUBLICATIONNAME,
+ PUBLICATIONREL,
+ PUBLICATIONRELMAP,
RULERELNAME,
STATRELATTINH,
TABLESPACEOID,
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
new file mode 100644
index 0000000..917408e
--- /dev/null
+++ b/src/test/regress/expected/publication.out
@@ -0,0 +1,76 @@
+--
+-- PUBLICATION
+--
+CREATE PUBLICATION testpub_default;
+CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;
+ALTER PUBLICATION testpub_default WITH noreplicate_insert noreplicate_delete;
+\drp
+ List of publications
+ Name | Inserts | Updates | Deletes
+--------------------+---------+---------+---------
+ testpib_ins_trunct | t | f | f
+ testpub_default | f | t | f
+(2 rows)
+
+ALTER PUBLICATION testpub_default WITH replicate_insert replicate_delete;
+\drp
+ List of publications
+ Name | Inserts | Updates | Deletes
+--------------------+---------+---------+---------
+ testpib_ins_trunct | t | f | f
+ testpub_default | t | t | t
+(2 rows)
+
+--- adding tables
+CREATE TABLE testpub_tbl1 (id serial primary key, data text);
+CREATE TABLE testpub_nopk (foo int, bar int);
+ALTER PUBLICATION testpub_default ADD TABLE testpub_tbl1;
+ALTER PUBLICATION testpub_default ADD TABLE testpub_nopk;
+ERROR: table testpub_nopk cannot be added to publication testpub_default
+DETAIL: table does not have REPLICA IDENTITY index and given publication is configured to replicate UPDATEs and/or DELETEs
+HINT: Add a PRIMARY KEY to the table
+ALTER PUBLICATION testpib_ins_trunct ADD TABLE testpub_nopk, testpub_tbl1;
+\d+ testpub_nopk
+ Table "public.testpub_nopk"
+ Column | Type | Modifiers | Storage | Stats target | Description
+--------+---------+-----------+---------+--------------+-------------
+ foo | integer | | plain | |
+ bar | integer | | plain | |
+Publications:
+ "testpib_ins_trunct"
+
+\d+ testpub_tbl1
+ Table "public.testpub_tbl1"
+ Column | Type | Modifiers | Storage | Stats target | Description
+--------+---------+-----------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('testpub_tbl1_id_seq'::regclass) | plain | |
+ data | text | | extended | |
+Indexes:
+ "testpub_tbl1_pkey" PRIMARY KEY, btree (id)
+Publications:
+ "testpib_ins_trunct"
+ "testpub_default"
+
+ALTER PUBLICATION testpub_default DROP TABLE testpub_tbl1, testpub_nopk;
+ERROR: relation "testpub_nopk" is not part of the publication
+ALTER PUBLICATION testpub_default DROP TABLE testpub_tbl1;
+\d+ testpub_tbl1
+ Table "public.testpub_tbl1"
+ Column | Type | Modifiers | Storage | Stats target | Description
+--------+---------+-----------------------------------------------------------+----------+--------------+-------------
+ id | integer | not null default nextval('testpub_tbl1_id_seq'::regclass) | plain | |
+ data | text | | extended | |
+Indexes:
+ "testpub_tbl1_pkey" PRIMARY KEY, btree (id)
+Publications:
+ "testpib_ins_trunct"
+
+DROP TABLE testpub_tbl1;
+ERROR: cannot drop table testpub_tbl1 because other objects depend on it
+DETAIL: publication table testpub_tbl1 in publication testpib_ins_trunct depends on table testpub_tbl1
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP TABLE testpub_tbl1 CASCADE;
+NOTICE: drop cascades to publication table testpub_tbl1 in publication testpib_ins_trunct
+DROP PUBLICATION testpub_default;
+DROP PUBLICATION testpib_ins_trunct;
+DROP TABLE testpub_nopk;
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 1c087a3..5ab04ae 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -121,6 +121,8 @@ pg_opfamily|t
pg_pltemplate|t
pg_policy|t
pg_proc|t
+pg_publication|t
+pg_publication_rel|t
pg_range|t
pg_replication_origin|t
pg_rewrite|t
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4ebad04..871a93d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -84,7 +84,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator
+test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator publication
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
new file mode 100644
index 0000000..9f85081
--- /dev/null
+++ b/src/test/regress/sql/publication.sql
@@ -0,0 +1,40 @@
+--
+-- PUBLICATION
+--
+
+CREATE PUBLICATION testpub_default;
+
+CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;
+
+ALTER PUBLICATION testpub_default WITH noreplicate_insert noreplicate_delete;
+
+\drp
+
+ALTER PUBLICATION testpub_default WITH replicate_insert replicate_delete;
+
+\drp
+
+--- adding tables
+CREATE TABLE testpub_tbl1 (id serial primary key, data text);
+CREATE TABLE testpub_nopk (foo int, bar int);
+
+ALTER PUBLICATION testpub_default ADD TABLE testpub_tbl1;
+ALTER PUBLICATION testpub_default ADD TABLE testpub_nopk;
+
+ALTER PUBLICATION testpib_ins_trunct ADD TABLE testpub_nopk, testpub_tbl1;
+
+\d+ testpub_nopk
+\d+ testpub_tbl1
+
+ALTER PUBLICATION testpub_default DROP TABLE testpub_tbl1, testpub_nopk;
+ALTER PUBLICATION testpub_default DROP TABLE testpub_tbl1;
+
+\d+ testpub_tbl1
+
+DROP TABLE testpub_tbl1;
+DROP TABLE testpub_tbl1 CASCADE;
+
+DROP PUBLICATION testpub_default;
+DROP PUBLICATION testpib_ins_trunct;
+
+DROP TABLE testpub_nopk;
--
2.7.4
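For anyone who wants to try the first patch without digging through the
regression tests, the DDL it adds can be exercised roughly as below (the
option spellings are the current WIP grammar and may well change before
the CF):

```sql
-- A publication replicating all change types (the default), and one
-- replicating only INSERTs.
CREATE PUBLICATION testpub_default;
CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;

-- Tables are added explicitly.  A table without a replica identity
-- index can only be added to a publication that does not replicate
-- UPDATEs or DELETEs.
CREATE TABLE testpub_tbl1 (id serial primary key, data text);
ALTER PUBLICATION testpub_default ADD TABLE testpub_tbl1;

-- \drp in psql then lists the publications and which operations
-- each one replicates.
```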
0002-Add-SUBSCRIPTION-catalog-and-DDL.patchapplication/x-patch; name=0002-Add-SUBSCRIPTION-catalog-and-DDL.patchDownload
From ef8ed0bed60f2cf5123a2cdc1335d0d02891d82c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 13 Jul 2016 18:12:05 +0200
Subject: [PATCH 2/6] Add SUBSCRIPTION catalog and DDL
---
doc/src/sgml/catalogs.sgml | 90 +++++++
doc/src/sgml/ref/allfiles.sgml | 3 +
doc/src/sgml/ref/alter_subscription.sgml | 135 ++++++++++
doc/src/sgml/ref/create_subscription.sgml | 159 ++++++++++++
doc/src/sgml/ref/drop_subscription.sgml | 101 ++++++++
src/backend/catalog/Makefile | 2 +-
src/backend/catalog/catalog.c | 8 +-
src/backend/catalog/dependency.c | 9 +
src/backend/catalog/objectaddress.c | 59 +++++
src/backend/commands/Makefile | 5 +-
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/subscriptioncmds.c | 331 +++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 28 +++
src/backend/nodes/equalfuncs.c | 26 ++
src/backend/parser/gram.y | 127 +++++++++-
src/backend/replication/logical/Makefile | 2 +-
src/backend/replication/logical/subscription.c | 146 +++++++++++
src/backend/tcop/utility.c | 32 +++
src/backend/utils/cache/relcache.c | 6 +-
src/backend/utils/cache/syscache.c | 23 ++
src/bin/psql/command.c | 3 +
src/bin/psql/describe.c | 66 +++++
src/bin/psql/describe.h | 3 +
src/bin/psql/help.c | 1 +
src/include/catalog/dependency.h | 1 +
src/include/catalog/indexing.h | 6 +
src/include/catalog/pg_subscription.h | 52 ++++
src/include/commands/replicationcmds.h | 6 +-
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 15 ++
src/include/parser/kwlist.h | 1 +
src/include/replication/subscription.h | 33 +++
src/include/utils/rel.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/sanity_check.out | 1 +
35 files changed, 1478 insertions(+), 10 deletions(-)
create mode 100644 doc/src/sgml/ref/alter_subscription.sgml
create mode 100644 doc/src/sgml/ref/create_subscription.sgml
create mode 100644 doc/src/sgml/ref/drop_subscription.sgml
create mode 100644 src/backend/commands/subscriptioncmds.c
create mode 100644 src/backend/replication/logical/subscription.c
create mode 100644 src/include/catalog/pg_subscription.h
create mode 100644 src/include/replication/subscription.h
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6d505ae..84211c1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -286,6 +286,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link></entry>
+ <entry>logical replication subscriptions</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -6037,6 +6042,91 @@
</sect1>
+ <sect1 id="catalog-pg-subscription">
+ <title><structname>pg_subscription</structname></title>
+
+ <indexterm zone="catalog-pg-subscription">
+ <primary>pg_subscription</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_subscription</structname> catalog contains
+ all existing logical replication subscriptions.
+ </para>
+
+ <para>
+ Unlike most system catalogs, <structname>pg_subscription</structname>
+ is shared across all databases of a cluster: there is only one
+ copy of <structname>pg_subscription</structname> per cluster, not
+ one per database.
+ </para>
+
+ <table>
+
+ <title><structname>pg_subscription</structname> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>subname</structfield></entry>
+      <entry><type>name</type></entry>
+ <entry></entry>
+      <entry>A unique, cluster-wide identifier for the replication
+ subscription.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>subenabled</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, the subscription is enabled and should be replicating.
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>subconninfo</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry></entry>
+ <entry>Connection string to the upstream database.
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>subslotname</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry></entry>
+ <entry>Name of the replication slot in the upstream database. Also used
+      as the name of the local replication origin.</entry>
+ </row>
+
+ <row>
+ <entry><structfield>subpublications</structfield></entry>
+ <entry><type>text[]</type></entry>
+ <entry></entry>
+ <entry>Array of subscribed publication names. For more on publications
+ see <xref linkend="publications">.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect1>
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 371a7b7..0d09f81 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -32,6 +32,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY alterSchema SYSTEM "alter_schema.sgml">
<!ENTITY alterServer SYSTEM "alter_server.sgml">
<!ENTITY alterSequence SYSTEM "alter_sequence.sgml">
+<!ENTITY alterSubscription SYSTEM "alter_subscription.sgml">
<!ENTITY alterSystem SYSTEM "alter_system.sgml">
<!ENTITY alterTable SYSTEM "alter_table.sgml">
<!ENTITY alterTableSpace SYSTEM "alter_tablespace.sgml">
@@ -79,6 +80,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createSubscription SYSTEM "create_subscription.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -124,6 +126,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropSubscription SYSTEM "drop_subscription.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
new file mode 100644
index 0000000..467e71f
--- /dev/null
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -0,0 +1,135 @@
+<!--
+doc/src/sgml/ref/alter_subscription.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-ALTERSUBSCRIPTION">
+ <indexterm zone="sql-altersubscription">
+ <primary>ALTER SUBSCRIPTION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>ALTER SUBSCRIPTION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ALTER SUBSCRIPTION</refname>
+ <refpurpose>change the definition of a subscription</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+ CONNECTION 'conninfo'
+ | PUBLICATION publication_name [, ...]
+
+ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> ENABLE
+ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> DISABLE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>ALTER SUBSCRIPTION</command> can change most of the subscription
+ attributes that can be specified in
+ <xref linkend="sql-createsubscription">.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a subscription whose attributes are to be altered.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>CONNECTION '<replaceable class="parameter">conninfo</replaceable>'</term>
+ <term>PUBLICATION <replaceable class="parameter">publication_name</replaceable></term>
+ <listitem>
+ <para>
+ These clauses alter attributes originally set by
+ <xref linkend="SQL-CREATESUBSCRIPTION">. For more information, see the
+ <command>CREATE SUBSCRIPTION</command> reference page.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>ENABLE</term>
+ <listitem>
+ <para>
+ Enables the previously disabled subscription, starting the logical
+      replication worker at the end of the transaction.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>DISABLE</term>
+ <listitem>
+ <para>
+ Disables the running subscription, stopping the logical replication
+      worker at the end of the transaction.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+   Change the set of publications subscribed by the subscription to
+   contain only <literal>insert_only</literal>:
+<programlisting>
+ALTER SUBSCRIPTION mysub
+ PUBLICATION insert_only;
+</programlisting>
+ </para>
+
+ <para>
+ Disable (stop) the subscription:
+<programlisting>
+ALTER SUBSCRIPTION mysub DISABLE;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>ALTER SUBSCRIPTION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createsubscription"></member>
+ <member><xref linkend="sql-dropsubscription"></member>
+ <member><xref linkend="sql-createpublication"></member>
+ <member><xref linkend="sql-alterpublication"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
new file mode 100644
index 0000000..a2cb459
--- /dev/null
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -0,0 +1,159 @@
+<!--
+doc/src/sgml/ref/create_subscription.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESUBSCRIPTION">
+ <indexterm zone="sql-createsubscription">
+ <primary>CREATE SUBSCRIPTION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE SUBSCRIPTION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE SUBSCRIPTION</refname>
+  <refpurpose>define a new subscription</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE SUBSCRIPTION <replaceable class="PARAMETER">subscription_name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+ CONNECTION 'conninfo'
+ | PUBLICATION publication_name [, ...]
+ | INITIALLY ( ENABLED | DISABLED )
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE SUBSCRIPTION</command> adds a new subscription for
+   the current database. The subscription name must be distinct from
+ the name of any existing subscription in the database cluster.
+ </para>
+
+ <para>
+ The subscription represents a replication connection to the provider.
+   As such, this command not only adds a definition in the local catalogs
+ but also creates a replication slot on the provider.
+ </para>
+
+ <para>
+ A logical replication worker will be started to replicate data for the
+ new subscription at the commit of the transaction where this command
+ was run.
+ </para>
+
+ <para>
+   More information about subscriptions and logical replication as a whole
+   is available in <xref linkend="logical-replication-subscription"> and
+ <xref linkend="logical-replication">.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">subscription_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the new subscription.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>CONNECTION '<replaceable class="parameter">conninfo</replaceable>'</term>
+ <listitem>
+ <para>
+ The connection string to the provider.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>PUBLICATION <replaceable class="parameter">publication_name</replaceable></term>
+ <listitem>
+ <para>
+ Name(s) of the publications on the provider to subscribe to.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>INITIALLY ENABLED</term>
+ <term>INITIALLY DISABLED</term>
+ <listitem>
+ <para>
+      Specifies whether the subscription should start replicating
+      immediately or should only be set up and not yet started. Note that
+      the replication slot described above is created in either case.
+ <literal>INITIALLY ENABLED</literal> is the default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Create a subscription to a different server which replicates tables in
+    the publications <literal>mypublication</literal> and
+ <literal>insert_only</literal> and starts replicating immediately on
+ commit:
+<programlisting>
+CREATE SUBSCRIPTION mysub
+ CONNECTION 'host=192.168.1.50 port=5432 user=foo dbname=foodb password=foopass'
+ PUBLICATION mypublication, insert_only;
+</programlisting>
+ </para>
+
+ <para>
+ Create a subscription to a different server which replicates tables in
+ the <literal>insert_only</literal> publication and does not replicate
+    until enabled at a later time:
+<programlisting>
+CREATE SUBSCRIPTION mysub
+ CONNECTION 'host=192.168.1.50 port=5432 user=foo dbname=foodb password=foopass'
+ PUBLICATION insert_only
+ INITIALLY DISABLED;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>CREATE SUBSCRIPTION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-altersubscription"></member>
+ <member><xref linkend="sql-dropsubscription"></member>
+ <member><xref linkend="sql-createpublication"></member>
+ <member><xref linkend="sql-alterpublication"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
new file mode 100644
index 0000000..38fb4b0
--- /dev/null
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -0,0 +1,101 @@
+<!--
+doc/src/sgml/ref/drop_subscription.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSUBSCRIPTION">
+ <indexterm zone="sql-dropsubscription">
+ <primary>DROP SUBSCRIPTION</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP SUBSCRIPTION</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP SUBSCRIPTION</refname>
+ <refpurpose>remove an existing subscription</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP SUBSCRIPTION</command> removes subscriptions from the
+ cluster.
+ </para>
+
+ <para>
+ This command cannot be performed inside a transaction block.
+ </para>
+
+ <para>
+ A subscription can only be dropped by its owner or a superuser.
+ </para>
+
+ <warning>
+ <para>
+    While <command>DROP SUBSCRIPTION</command> will try to remove the
+    replication slot on the provider, it will not fail if that attempt is
+    unsuccessful. In that case the slot may need to be dropped manually,
+    as otherwise it will hold back removal of WAL on the provider.
+ </para>
+ </warning>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a subscription to be dropped.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Drop a subscription:
+<programlisting>
+DROP SUBSCRIPTION mysub;
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ <command>DROP SUBSCRIPTION</command> is a <productname>PostgreSQL</>
+ extension.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createsubscription"></member>
+ <member><xref linkend="sql-altersubscription"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
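Taken together, the three reference pages above cover the whole subscription lifecycle. A minimal sketch of how they combine; the connection string and object names here are placeholders, not part of the patch:

```sql
-- Create the subscription disabled: the replication slot is created on the
-- provider, but no logical replication worker is started yet.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=provider.example port=5432 dbname=foodb'
    PUBLICATION mypublication
    INITIALLY DISABLED;

-- Start the logical replication worker at the end of this transaction.
ALTER SUBSCRIPTION mysub ENABLE;

-- Remove the subscription; if the provider was unreachable, drop the
-- replication slot there manually so it does not retain WAL.
DROP SUBSCRIPTION mysub;
```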
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 6bb3683..60737d4 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -42,7 +42,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_foreign_table.h pg_policy.h pg_replication_origin.h \
pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
pg_collation.h pg_range.h pg_transform.h \
- pg_publication.h pg_publication_rel.h \
+ pg_publication.h pg_publication_rel.h pg_subscription.h \
toasting.h indexing.h \
)
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 1baaa0b..ba758bb 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -36,6 +36,7 @@
#include "catalog/pg_shdepend.h"
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
+#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/toasting.h"
#include "miscadmin.h"
@@ -227,7 +228,8 @@ IsSharedRelation(Oid relationId)
relationId == SharedSecLabelRelationId ||
relationId == TableSpaceRelationId ||
relationId == DbRoleSettingRelationId ||
- relationId == ReplicationOriginRelationId)
+ relationId == ReplicationOriginRelationId ||
+ relationId == SubscriptionRelationId)
return true;
/* These are their indexes (see indexing.h) */
if (relationId == AuthIdRolnameIndexId ||
@@ -245,7 +247,9 @@ IsSharedRelation(Oid relationId)
relationId == TablespaceNameIndexId ||
relationId == DbRoleSettingDatidRolidIndexId ||
relationId == ReplicationOriginIdentIndex ||
- relationId == ReplicationOriginNameIndex)
+ relationId == ReplicationOriginNameIndex ||
+ relationId == SubscriptionObjectIndexId ||
+ relationId == SubscriptionNameIndexId)
return true;
/* These are their toast tables and toast indexes (see toasting.h) */
if (relationId == PgShdescriptionToastTable ||
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 359e7bb..91564b2 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -51,6 +51,7 @@
#include "catalog/pg_publication.h"
#include "catalog/pg_publication_rel.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
#include "catalog/pg_trigger.h"
@@ -168,6 +169,7 @@ static const Oid object_classes[] = {
PolicyRelationId, /* OCLASS_POLICY */
PublicationRelationId, /* OCLASS_PUBCLICATION */
PublicationRelRelationId, /* OCLASS_PUBCLICATION_REL */
+ SubscriptionRelationId, /* OCLASS_SUBSCRIPTION */
TransformRelationId /* OCLASS_TRANSFORM */
};
@@ -1292,6 +1294,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePublicationRelById(object->objectId);
break;
+ case OCLASS_SUBSCRIPTION:
+ DropSubscriptionById(object->objectId);
+ break;
+
case OCLASS_TRANSFORM:
DropTransformById(object->objectId);
break;
@@ -2455,6 +2461,9 @@ getObjectClass(const ObjectAddress *object)
case PublicationRelRelationId:
return OCLASS_PUBLICATION_REL;
+ case SubscriptionRelationId:
+ return OCLASS_SUBSCRIPTION;
+
case TransformRelationId:
return OCLASS_TRANSFORM;
}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 375f4b0..f25c69b 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -48,6 +48,7 @@
#include "catalog/pg_publication.h"
#include "catalog/pg_publication_rel.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
#include "catalog/pg_trigger.h"
@@ -74,6 +75,7 @@
#include "parser/parse_oper.h"
#include "parser/parse_type.h"
#include "replication/publication.h"
+#include "replication/subscription.h"
#include "rewrite/rewriteSupport.h"
#include "storage/lmgr.h"
#include "storage/sinval.h"
@@ -465,6 +467,18 @@ static const ObjectPropertyType ObjectProperty[] =
InvalidAttrNumber,
-1,
true
+ },
+ {
+ SubscriptionRelationId,
+ SubscriptionObjectIndexId,
+ SUBSCRIPTIONOID,
+ SUBSCRIPTIONNAME,
+ Anum_pg_subscription_subname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
}
};
@@ -676,6 +690,10 @@ static const struct object_type_map
{
"publication relation", OBJECT_PUBLICATION_REL
},
+ /* OCLASS_SUBSCRIPTION */
+ {
+ "subscription", OBJECT_SUBSCRIPTION
+ },
/* OCLASS_TRANSFORM */
{
"transform", OBJECT_TRANSFORM
@@ -839,6 +857,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_EVENT_TRIGGER:
case OBJECT_ACCESS_METHOD:
case OBJECT_PUBLICATION:
+ case OBJECT_SUBSCRIPTION:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -1125,6 +1144,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_PUBLICATION:
msg = gettext_noop("publication name cannot be qualified");
break;
+ case OBJECT_SUBSCRIPTION:
+ msg = gettext_noop("subscription name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -1195,6 +1217,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_publication_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_SUBSCRIPTION:
+ address.classId = SubscriptionRelationId;
+ address.objectId = get_subscription_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -2315,6 +2342,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
case OBJECT_TSTEMPLATE:
case OBJECT_ACCESS_METHOD:
case OBJECT_PUBLICATION:
+ case OBJECT_SUBSCRIPTION:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -3316,6 +3344,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_SUBSCRIPTION:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for subscription %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("subscription %s"),
+ NameStr(((Form_pg_subscription) GETSTRUCT(tup))->subname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3809,6 +3852,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "publication table");
break;
+ case OCLASS_SUBSCRIPTION:
+ appendStringInfoString(&buffer, "subscription");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4812,6 +4859,18 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_SUBSCRIPTION:
+ {
+ Subscription *sub;
+
+ sub = GetSubscription(object->objectId);
+ appendStringInfoString(&buffer,
+ quote_identifier(sub->name));
+ if (objname)
+ *objname = list_make1(pstrdup(sub->name));
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb580377..e0fab38 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,7 +18,8 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o variable.o view.o
+ schemacmds.o seclabel.o sequence.o subscriptioncmds.o tablecmds.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 8aaf1a7..2a7951d 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -123,6 +123,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"USER MAPPING", true},
{"VIEW", true},
{"PUBLICATION", true},
+ {"SUBSCRIPTION", true},
{NULL, false}
};
@@ -1122,6 +1123,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_VIEW:
case OBJECT_PUBLICATION:
case OBJECT_PUBLICATION_REL:
+ case OBJECT_SUBSCRIPTION:
return true;
}
return true;
@@ -1175,6 +1177,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_AM:
case OCLASS_PUBLICATION:
case OCLASS_PUBLICATION_REL:
+ case OCLASS_SUBSCRIPTION:
return true;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
new file mode 100644
index 0000000..54d66d5
--- /dev/null
+++ b/src/backend/commands/subscriptioncmds.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * subscriptioncmds.c
+ * subscription catalog manipulation functions
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/commands/subscriptioncmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/genam.h"
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaddress.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_subscription.h"
+
+#include "commands/defrem.h"
+
+#include "executor/spi.h"
+
+#include "nodes/makefuncs.h"
+
+#include "replication/reorderbuffer.h"
+#include "commands/replicationcmds.h"
+
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+static void
+check_replication_permissions(void)
+{
+ if (!superuser() && !has_rolreplication(GetUserId()))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser or replication role to manipulate subscriptions"))));
+}
+
+static void
+parse_subscription_options(List *options,
+ bool *enabled_given, bool *enabled,
+ char **conninfo, List **publications)
+{
+ ListCell *lc;
+
+ *enabled_given = false;
+ *enabled = true;
+ *conninfo = NULL;
+ *publications = NIL;
+
+ /* Parse options */
+ foreach (lc, options)
+ {
+ DefElem *defel = (DefElem *) lfirst(lc);
+
+ if (strcmp(defel->defname, "enabled") == 0)
+ {
+ if (*enabled_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ *enabled_given = true;
+ *enabled = defGetBoolean(defel);
+ }
+ else if (strcmp(defel->defname, "conninfo") == 0)
+ {
+ if (*conninfo)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ *conninfo = defGetString(defel);
+ }
+ else if (strcmp(defel->defname, "publication") == 0)
+ {
+ if (*publications)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ if (defel->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s requires a parameter", defel->defname)));
+
+ *publications = (List *) defel->arg;
+ }
+ else
+ elog(ERROR, "unrecognized option: %s", defel->defname);
+ }
+}
+
+/*
+ * Auxiliary function to return a TEXT array out of a list of String nodes.
+ */
+static Datum
+publicationListToArray(List *list)
+{
+ ArrayType *arr;
+ Datum *datums;
+ int j = 0;
+ ListCell *cell;
+ MemoryContext memcxt;
+ MemoryContext oldcxt;
+
+ memcxt = AllocSetContextCreate(CurrentMemoryContext,
+								   "publicationListToArray",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldcxt = MemoryContextSwitchTo(memcxt);
+
+ datums = palloc(sizeof(text *) * list_length(list));
+ foreach(cell, list)
+ {
+ char *name = strVal(lfirst(cell));
+
+ datums[j++] = CStringGetTextDatum(name);
+ }
+
+ MemoryContextSwitchTo(oldcxt);
+
+ arr = construct_array(datums, list_length(list),
+ TEXTOID, -1, false, 'i');
+ MemoryContextDelete(memcxt);
+
+ return PointerGetDatum(arr);
+}
+
+/*
+ * Create new subscription.
+ */
+ObjectAddress
+CreateSubscription(CreateSubscriptionStmt *stmt)
+{
+ Relation rel;
+ ObjectAddress myself;
+ Oid subid;
+ bool nulls[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+ HeapTuple tup;
+ bool enabled_given;
+ bool enabled;
+ char *conninfo;
+ List *publications;
+
+ check_replication_permissions();
+
+ rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
+
+ /* Check if name is used */
+ subid = GetSysCacheOid1(SUBSCRIPTIONNAME,
+ CStringGetDatum(stmt->subname));
+ if (OidIsValid(subid))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("subscription \"%s\" already exists",
+ stmt->subname)));
+ }
+
+ /* Parse and check options. */
+ parse_subscription_options(stmt->options, &enabled_given, &enabled,
+ &conninfo, &publications);
+
+ /* TODO: improve error messages here. */
+ if (conninfo == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("connection not specified")));
+
+ if (list_length(publications) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("publication not specified")));
+
+ /* Everything ok, form a new tuple. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_subscription_dbid - 1] = ObjectIdGetDatum(MyDatabaseId);
+ values[Anum_pg_subscription_subname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(stmt->subname));
+ values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(enabled);
+ values[Anum_pg_subscription_subconninfo - 1] =
+ CStringGetTextDatum(conninfo);
+ values[Anum_pg_subscription_subslotname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(stmt->subname));
+ values[Anum_pg_subscription_subpublications - 1] =
+ publicationListToArray(publications);
+
+ tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
+
+ /* Insert tuple into catalog. */
+ subid = simple_heap_insert(rel, tup);
+ CatalogUpdateIndexes(rel, tup);
+ heap_freetuple(tup);
+
+	ObjectAddressSet(myself, SubscriptionRelationId, subid);
+
+ heap_close(rel, RowExclusiveLock);
+
+ /* Make the changes visible. */
+ CommandCounterIncrement();
+
+ return myself;
+}
+
+/*
+ * Alter the existing subscription.
+ */
+ObjectAddress
+AlterSubscription(AlterSubscriptionStmt *stmt)
+{
+ Relation rel;
+ ObjectAddress myself;
+ bool nulls[Natts_pg_subscription];
+ bool replaces[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+ HeapTuple tup;
+ Oid subid;
+ bool enabled_given;
+ bool enabled;
+ char *conninfo;
+ List *publications;
+
+ check_replication_permissions();
+
+ rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
+
+ /* Fetch the existing tuple. */
+ tup = SearchSysCacheCopy1(SUBSCRIPTIONNAME,
+ CStringGetDatum(stmt->subname));
+
+ if (!HeapTupleIsValid(tup))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("subscription \"%s\" does not exist",
+ stmt->subname)));
+
+ subid = HeapTupleGetOid(tup);
+
+ /* Parse options. */
+ parse_subscription_options(stmt->options, &enabled_given, &enabled,
+ &conninfo, &publications);
+
+ /* Form a new tuple. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));
+
+ if (enabled_given)
+ {
+ values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(enabled);
+ replaces[Anum_pg_subscription_subenabled - 1] = true;
+ }
+ if (conninfo)
+ {
+ values[Anum_pg_subscription_subconninfo - 1] =
+ CStringGetTextDatum(conninfo);
+		replaces[Anum_pg_subscription_subconninfo - 1] = true;
+ }
+ if (publications != NIL)
+ {
+ values[Anum_pg_subscription_subpublications - 1] =
+ publicationListToArray(publications);
+ replaces[Anum_pg_subscription_subpublications - 1] = true;
+ }
+
+ tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+ replaces);
+
+ /* Update the catalog. */
+ simple_heap_update(rel, &tup->t_self, tup);
+ CatalogUpdateIndexes(rel, tup);
+
+ ObjectAddressSet(myself, SubscriptionRelationId, subid);
+
+ /* Cleanup. */
+ heap_freetuple(tup);
+ heap_close(rel, RowExclusiveLock);
+
+ return myself;
+}
+
+/*
+ * Drop subscription by OID
+ */
+void
+DropSubscriptionById(Oid subid)
+{
+ Relation rel;
+ HeapTuple tup;
+
+ check_replication_permissions();
+
+ rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for subscription %u", subid);
+
+ simple_heap_delete(rel, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(rel, RowExclusiveLock);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bf76742..3c39b7e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4179,6 +4179,28 @@ _copyAlterPublicationStmt(const AlterPublicationStmt *from)
return newnode;
}
+static CreateSubscriptionStmt *
+_copyCreateSubscriptionStmt(const CreateSubscriptionStmt *from)
+{
+ CreateSubscriptionStmt *newnode = makeNode(CreateSubscriptionStmt);
+
+ COPY_STRING_FIELD(subname);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
+static AlterSubscriptionStmt *
+_copyAlterSubscriptionStmt(const AlterSubscriptionStmt *from)
+{
+ AlterSubscriptionStmt *newnode = makeNode(AlterSubscriptionStmt);
+
+ COPY_STRING_FIELD(subname);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4976,6 +4998,12 @@ copyObject(const void *from)
case T_AlterPublicationStmt:
retval = _copyAlterPublicationStmt(from);
break;
+ case T_CreateSubscriptionStmt:
+ retval = _copyCreateSubscriptionStmt(from);
+ break;
+ case T_AlterSubscriptionStmt:
+ retval = _copyAlterSubscriptionStmt(from);
+ break;
case T_A_Expr:
retval = _copyAExpr(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 0c5f1d0..2a5ed58 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2129,6 +2129,26 @@ _equalAlterPublicationStmt(const AlterPublicationStmt *a,
}
static bool
+_equalCreateSubscriptionStmt(const CreateSubscriptionStmt *a,
+ const CreateSubscriptionStmt *b)
+{
+ COMPARE_STRING_FIELD(subname);
+ COMPARE_NODE_FIELD(options);
+
+ return true;
+}
+
+static bool
+_equalAlterSubscriptionStmt(const AlterSubscriptionStmt *a,
+ const AlterSubscriptionStmt *b)
+{
+ COMPARE_STRING_FIELD(subname);
+ COMPARE_NODE_FIELD(options);
+
+ return true;
+}
+
+static bool
_equalCreatePolicyStmt(const CreatePolicyStmt *a, const CreatePolicyStmt *b)
{
COMPARE_STRING_FIELD(policy_name);
@@ -3278,6 +3298,12 @@ equal(const void *a, const void *b)
case T_AlterPublicationStmt:
retval = _equalAlterPublicationStmt(a, b);
break;
+ case T_CreateSubscriptionStmt:
+ retval = _equalCreateSubscriptionStmt(a, b);
+ break;
+ case T_AlterSubscriptionStmt:
+ retval = _equalAlterSubscriptionStmt(a, b);
+ break;
case T_A_Expr:
retval = _equalAExpr(a, b);
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b91e75a..03640eb 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -267,6 +267,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
AlterTSConfigurationStmt AlterTSDictionaryStmt
CreateMatViewStmt RefreshMatViewStmt CreateAmStmt
CreatePublicationStmt AlterPublicationStmt
+ CreateSubscriptionStmt AlterSubscriptionStmt
%type <node> select_no_parens select_with_parens select_clause
simple_select values_clause
@@ -374,13 +375,17 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
relation_expr_list dostmt_opt_list
transform_element_list transform_type_list
publication_opt_list publication_opt_items
+ subscription_create_opt_items subscription_opt_items
%type <list> group_by_list
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
%type <node> grouping_sets_clause
%type <list> opt_fdw_options fdw_options
+ publication_list
%type <defelt> fdw_option publication_opt_item
+ subscription_opt_item subscription_create_opt_item
+%type <value> publication_item
%type <range> OptTempTableName
%type <into> into_clause create_as_target create_mv_target
@@ -631,8 +636,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P START
- STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
- SYMMETRIC SYSID SYSTEM_P
+ STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSCRIPTION
+ SUBSTRING SYMMETRIC SYSID SYSTEM_P
TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
@@ -783,6 +788,7 @@ stmt :
| AlterPublicationStmt
| AlterRoleSetStmt
| AlterRoleStmt
+ | AlterSubscriptionStmt
| AlterTSConfigurationStmt
| AlterTSDictionaryStmt
| AlterUserMappingStmt
@@ -817,6 +823,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateSubscriptionStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -5648,6 +5655,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
| PUBLICATION { $$ = OBJECT_PUBLICATION; }
+ | SUBSCRIPTION { $$ = OBJECT_SUBSCRIPTION; }
;
any_name_list:
@@ -8610,6 +8618,120 @@ AlterPublicationStmt:
/*****************************************************************************
*
+ * CREATE SUBSCRIPTION name [ WITH ] options
+ *
+ *****************************************************************************/
+
+CreateSubscriptionStmt:
+ CREATE SUBSCRIPTION name opt_with subscription_create_opt_items
+ {
+ CreateSubscriptionStmt *n =
+ makeNode(CreateSubscriptionStmt);
+ n->subname = $3;
+ n->options = $5;
+ $$ = (Node *)n;
+ }
+ ;
+
+subscription_create_opt_items:
+ subscription_create_opt_item
+ {
+ $$ = list_make1($1);
+ }
+ | subscription_create_opt_items subscription_create_opt_item
+ {
+ $$ = lappend($1, $2);
+ }
+ ;
+
+subscription_create_opt_item:
+ subscription_opt_item
+ | INITIALLY IDENT
+ {
+ if (strcmp($2, "enabled") == 0)
+ $$ = makeDefElem("enabled", (Node *)makeInteger(TRUE));
+ else if (strcmp($2, "disabled") == 0)
+ $$ = makeDefElem("enabled", (Node *)makeInteger(FALSE));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized subscription option \"%s\"", $2),
+ parser_errposition(@2)));
+ }
+ ;
+
+subscription_opt_item:
+ CONNECTION Sconst
+ {
+ $$ = makeDefElem("conninfo", (Node *)makeString($2));
+ }
+ | PUBLICATION publication_list
+ {
+ $$ = makeDefElem("publication", (Node *)$2);
+ }
+ ;
+
+publication_list:
+ publication_item
+ {
+ $$ = list_make1($1);
+ }
+ | publication_list ',' publication_item
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+publication_item:
+ ColLabel { $$ = makeString($1); };
+
+/*****************************************************************************
+ *
+ * ALTER SUBSCRIPTION name [ WITH ] options
+ *
+ *****************************************************************************/
+
+AlterSubscriptionStmt:
+ ALTER SUBSCRIPTION name opt_with subscription_opt_items
+ {
+ AlterSubscriptionStmt *n =
+ makeNode(AlterSubscriptionStmt);
+ n->subname = $3;
+ n->options = $5;
+ $$ = (Node *)n;
+ }
+ | ALTER SUBSCRIPTION name ENABLE_P
+ {
+ AlterSubscriptionStmt *n =
+ makeNode(AlterSubscriptionStmt);
+ n->subname = $3;
+ n->options = list_make1(makeDefElem("enabled",
+ (Node *)makeInteger(TRUE)));
+ $$ = (Node *)n;
+ }
+ | ALTER SUBSCRIPTION name DISABLE_P
+ {
+ AlterSubscriptionStmt *n =
+ makeNode(AlterSubscriptionStmt);
+ n->subname = $3;
+ n->options = list_make1(makeDefElem("enabled",
+ (Node *)makeInteger(FALSE)));
+ $$ = (Node *)n;
+ } ;
+
+subscription_opt_items:
+ subscription_opt_item
+ {
+ $$ = list_make1($1);
+ }
+ | subscription_opt_items subscription_opt_item
+ {
+ $$ = lappend($1, $2);
+ }
+ ;
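
For orientation, here is a sketch of the statement forms the grammar rules above
accept. The object names and connection string are hypothetical and the WITH
keyword is optional (via opt_with); this illustrates the syntax only, it is not
output from a running server:

```sql
-- Hypothetical names; syntax follows the grammar rules above.
CREATE SUBSCRIPTION mysub
       WITH CONNECTION 'host=provider dbname=postgres'
            PUBLICATION mypub, insert_only
            INITIALLY enabled;

ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub WITH PUBLICATION mypub;
DROP SUBSCRIPTION mysub;
```

Note that the create-time options are simply juxtaposed (no commas), while the
publication list itself is comma-separated.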
+
+/*****************************************************************************
+ *
* QUERY: Define Rewrite Rule
*
*****************************************************************************/
@@ -14066,6 +14188,7 @@ unreserved_keyword:
| STORAGE
| STRICT_P
| STRIP_P
+ | SUBSCRIPTION
| SYSID
| SYSTEM_P
| TABLES
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 3b3e90c..e4b093b 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -15,6 +15,6 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = decode.o logical.o logicalfuncs.o message.o origin.o publication.o \
- reorderbuffer.o snapbuild.o
+ reorderbuffer.o snapbuild.o subscription.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/subscription.c b/src/backend/replication/logical/subscription.c
new file mode 100644
index 0000000..7d1de2c
--- /dev/null
+++ b/src/backend/replication/logical/subscription.c
@@ -0,0 +1,146 @@
+/*-------------------------------------------------------------------------
+ *
+ * subscription.c
+ * replication subscriptions
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/subscription.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/genam.h"
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaddress.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_subscription.h"
+
+#include "executor/spi.h"
+
+#include "nodes/makefuncs.h"
+
+#include "replication/reorderbuffer.h"
+#include "replication/subscription.h"
+
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+static List *textarray_to_stringlist(ArrayType *textarray);
+
+/*
+ * Fetch the subscription from the syscache.
+ */
+Subscription *
+GetSubscription(Oid subid)
+{
+ HeapTuple tup;
+ Subscription *sub;
+ Form_pg_subscription subform;
+ Datum datum;
+ bool isnull;
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for subscription %u", subid);
+
+ subform = (Form_pg_subscription) GETSTRUCT(tup);
+
+ sub = (Subscription *) palloc(sizeof(Subscription));
+ sub->oid = subid;
+ sub->dbid = subform->dbid;
+ sub->name = NameStr(subform->subname);
+ sub->enabled = subform->subenabled;
+
+ /* Get conninfo */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subconninfo,
+ &isnull);
+ Assert(!isnull);
+ sub->conninfo = pstrdup(TextDatumGetCString(datum));
+
+ /* Get slotname */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subslotname,
+ &isnull);
+ Assert(!isnull);
+ sub->slotname = pstrdup(NameStr(*DatumGetName(datum)));
+
+ /* Get publications */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subpublications,
+ &isnull);
+ Assert(!isnull);
+ sub->publications = textarray_to_stringlist(DatumGetArrayTypeP(datum));
+
+ ReleaseSysCache(tup);
+
+ return sub;
+}
+
+/*
+ * get_subscription_oid - given a subscription name, look up the OID
+ *
+ * If missing_ok is false, throw an error if name not found. If true, just
+ * return InvalidOid.
+ */
+Oid
+get_subscription_oid(const char *subname, bool missing_ok)
+{
+ Oid oid;
+
+ oid = GetSysCacheOid1(SUBSCRIPTIONNAME, CStringGetDatum(subname));
+ if (!OidIsValid(oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("subscription \"%s\" does not exist", subname)));
+ return oid;
+}
+
+/*
+ * Convert text array to list of strings.
+ *
+ * Note: the resulting list of strings is pallocated here.
+ */
+static List *
+textarray_to_stringlist(ArrayType *textarray)
+{
+ Datum *elems;
+ int nelems, i;
+ List *res = NIL;
+
+ deconstruct_array(textarray,
+ TEXTOID, -1, false, 'i',
+ &elems, NULL, &nelems);
+
+ if (nelems == 0)
+ return NIL;
+
+ for (i = 0; i < nelems; i++)
+ res = lappend(res, makeString(pstrdup(TextDatumGetCString(elems[i]))));
+
+ return res;
+}
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 886d2ff..4cb2366 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -213,6 +213,8 @@ check_xact_readonly(Node *parsetree)
case T_SecLabelStmt:
case T_CreatePublicationStmt:
case T_AlterPublicationStmt:
+ case T_CreateSubscriptionStmt:
+ case T_AlterSubscriptionStmt:
PreventCommandIfReadOnly(CreateCommandTag(parsetree));
PreventCommandIfParallelMode(CreateCommandTag(parsetree));
break;
@@ -1560,6 +1562,14 @@ ProcessUtilitySlow(Node *parsetree,
commandCollected = true;
break;
+ case T_CreateSubscriptionStmt:
+ address = CreateSubscription((CreateSubscriptionStmt *) parsetree);
+ break;
+
+ case T_AlterSubscriptionStmt:
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -1921,6 +1931,9 @@ AlterObjectTypeCommandTag(ObjectType objtype)
case OBJECT_PUBLICATION:
tag = "PUBLICATION";
break;
+ case OBJECT_SUBSCRIPTION:
+ tag = "SUBSCRIPTION";
+ break;
default:
tag = "???";
break;
@@ -2209,6 +2222,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_PUBLICATION:
tag = "DROP PUBLICATION";
break;
+ case OBJECT_SUBSCRIPTION:
+ tag = "DROP SUBSCRIPTION";
+ break;
default:
tag = "???";
}
@@ -2587,6 +2603,14 @@ CreateCommandTag(Node *parsetree)
tag = "ALTER PUBLICATION";
break;
+ case T_CreateSubscriptionStmt:
+ tag = "CREATE SUBSCRIPTION";
+ break;
+
+ case T_AlterSubscriptionStmt:
+ tag = "ALTER SUBSCRIPTION";
+ break;
+
case T_PrepareStmt:
tag = "PREPARE";
break;
@@ -3160,6 +3184,14 @@ GetCommandLogLevel(Node *parsetree)
lev = LOGSTMT_DDL;
break;
+ case T_CreateSubscriptionStmt:
+ lev = LOGSTMT_DDL;
+ break;
+
+ case T_AlterSubscriptionStmt:
+ lev = LOGSTMT_DDL;
+ break;
+
/* already-planned queries */
case T_PlannedStmt:
{
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3f7027f..7c3121b 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -52,6 +52,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
#include "catalog/pg_shseclabel.h"
+#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
@@ -100,6 +101,7 @@ static const FormData_pg_attribute Desc_pg_authid[Natts_pg_authid] = {Schema_pg_
static const FormData_pg_attribute Desc_pg_auth_members[Natts_pg_auth_members] = {Schema_pg_auth_members};
static const FormData_pg_attribute Desc_pg_index[Natts_pg_index] = {Schema_pg_index};
static const FormData_pg_attribute Desc_pg_shseclabel[Natts_pg_shseclabel] = {Schema_pg_shseclabel};
+static const FormData_pg_attribute Desc_pg_subscription[Natts_pg_subscription] = {Schema_pg_subscription};
/*
* Hash tables that index the relation cache
@@ -3266,8 +3268,10 @@ RelationCacheInitializePhase2(void)
false, Natts_pg_auth_members, Desc_pg_auth_members);
formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
false, Natts_pg_shseclabel, Desc_pg_shseclabel);
+ formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
+ true, Natts_pg_subscription, Desc_pg_subscription);
-#define NUM_CRITICAL_SHARED_RELS 4 /* fix if you change list above */
+#define NUM_CRITICAL_SHARED_RELS 5 /* fix if you change list above */
}
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index d575b51..03c8916 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -59,6 +59,7 @@
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_replication_origin.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
#include "catalog/pg_ts_config.h"
@@ -713,6 +714,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {SubscriptionRelationId, /* SUBSCRIPTIONOID */
+ SubscriptionObjectIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
+ {SubscriptionRelationId, /* SUBSCRIPTIONNAME */
+ SubscriptionNameIndexId,
+ 1,
+ {
+ Anum_pg_subscription_subname,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a379c19..225ea25 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -497,6 +497,9 @@ exec_command(const char *cmd,
case 'p':
success = describePublications(pattern, show_verbose);
break;
+ case 's':
+ success = describeSubscriptions(pattern, show_verbose);
+ break;
default:
status = PSQL_CMD_UNKNOWN;
break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 573d980..823cda4 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4784,6 +4784,72 @@ describePublications(const char *pattern, bool verbose)
return true;
}
+/* \drs
+ * Describes subscriptions.
+ *
+ * Takes an optional regexp to select particular subscriptions
+ */
+bool
+describeSubscriptions(const char *pattern, bool verbose)
+{
+ PQExpBufferData buf;
+ PGresult *res;
+ printQueryOpt myopt = pset.popt;
+ static const bool translate_columns[] = {false, false, true, false, false};
+
+ /* TODO bump */
+ if (pset.sversion < 90600)
+ {
+ psql_error("The server (version %d.%d) does not support subscriptions.\n",
+ pset.sversion / 10000, (pset.sversion / 100) % 100);
+ return true;
+ }
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf,
+ "SELECT subname AS \"%s\",\n"
+ " (SELECT datname FROM pg_catalog.pg_database WHERE oid = dbid) AS \"%s\",\n"
+ " subenabled AS \"%s\",\n"
+ " subpublications AS \"%s\",\n"
+ " subconninfo AS \"%s\"\n",
+ gettext_noop("Name"),
+ gettext_noop("Database"),
+ gettext_noop("Enabled"),
+ gettext_noop("Publication"),
+ gettext_noop("Conninfo"));
+
+ /* TODO Show owner and ACL */
+ if (verbose)
+ {
+ }
+
+ appendPQExpBufferStr(&buf,
+ "\nFROM pg_catalog.pg_subscription\n");
+
+ processSQLNamePattern(pset.db, &buf, pattern, false, false,
+ NULL, "subname", NULL,
+ NULL);
+
+ appendPQExpBufferStr(&buf, "ORDER BY 1;");
+
+ res = PSQLexec(buf.data);
+ termPQExpBuffer(&buf);
+ if (!res)
+ return false;
+
+ myopt.nullPrint = NULL;
+ myopt.title = _("List of subscriptions");
+ myopt.translate_header = true;
+ myopt.translate_columns = translate_columns;
+ myopt.n_translate_columns = lengthof(translate_columns);
+
+ printQuery(res, &myopt, pset.queryFout, false, pset.logfile);
+
+ PQclear(res);
+ return true;
+}
+
/*
* printACLColumn
*
diff --git a/src/bin/psql/describe.h b/src/bin/psql/describe.h
index c4457da..a754fb8 100644
--- a/src/bin/psql/describe.h
+++ b/src/bin/psql/describe.h
@@ -105,4 +105,7 @@ extern bool listEventTriggers(const char *pattern, bool verbose);
/* \drp */
bool describePublications(const char *pattern, bool verbose);
+/* \drs */
+bool describeSubscriptions(const char *pattern, bool verbose);
+
#endif /* DESCRIBE_H */
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 5cc5bce..88c3932 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -242,6 +242,7 @@ slashUsage(unsigned short int pager)
fprintf(output, _(" \\dp [PATTERN] list table, view, and sequence access privileges\n"));
fprintf(output, _(" \\drds [PATRN1 [PATRN2]] list per-database role settings\n"));
fprintf(output, _(" \\drp[+] [PATTERN] list replication publications\n"));
+ fprintf(output, _(" \\drs[+] [PATTERN] list replication subscriptions\n"));
fprintf(output, _(" \\ds[S+] [PATTERN] list sequences\n"));
fprintf(output, _(" \\dt[S+] [PATTERN] list tables\n"));
fprintf(output, _(" \\dT[S+] [PATTERN] list data types\n"));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 01fd16a..789ce00 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -163,6 +163,7 @@ typedef enum ObjectClass
OCLASS_POLICY, /* pg_policy */
OCLASS_PUBLICATION, /* pg_publication */
OCLASS_PUBLICATION_REL, /* pg_publication_rel */
+ OCLASS_SUBSCRIPTION, /* pg_subscription */
OCLASS_TRANSFORM /* pg_transform */
} ObjectClass;
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ae46ed6..86e2939 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -331,6 +331,12 @@ DECLARE_UNIQUE_INDEX(pg_publication_rel_object_index, 6112, on pg_publication_re
DECLARE_UNIQUE_INDEX(pg_publication_rel_map_index, 6113, on pg_publication_rel using btree(relid oid_ops, pubid oid_ops));
#define PublicationRelMapIndexId 6113
+DECLARE_UNIQUE_INDEX(pg_subscription_oid_index, 6114, on pg_subscription using btree(oid oid_ops));
+#define SubscriptionObjectIndexId 6114
+
+DECLARE_UNIQUE_INDEX(pg_subscription_subname_index, 6115, on pg_subscription using btree(subname name_ops));
+#define SubscriptionNameIndexId 6115
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
new file mode 100644
index 0000000..254d509
--- /dev/null
+++ b/src/include/catalog/pg_subscription.h
@@ -0,0 +1,52 @@
+/* -------------------------------------------------------------------------
+ *
+ * pg_subscription.h
+ * Definition of the subscription catalog (pg_subscription).
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef PG_SUBSCRIPTION_H
+#define PG_SUBSCRIPTION_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_subscription definition. cpp turns this into
+ * typedef struct FormData_pg_subscription
+ * ----------------
+ */
+#define SubscriptionRelationId 6100
+#define SubscriptionRelation_Rowtype_Id 6101
+
+CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO
+{
+ Oid dbid; /* Database the subscription is in. */
+ NameData subname; /* Name of the subscription */
+ bool subenabled; /* True if the subscription is enabled (running) */
+
+#ifdef CATALOG_VARLEN /* variable-length fields start here */
+ text subconninfo; /* Connection string to the provider */
+ NameData subslotname; /* Slot name on provider */
+
+ text subpublications[1]; /* List of publications subscribed to */
+#endif
+} FormData_pg_subscription;
+
+typedef FormData_pg_subscription *Form_pg_subscription;
+
+/* ----------------
+ * compiler constants for pg_subscription
+ * ----------------
+ */
+#define Natts_pg_subscription 6
+#define Anum_pg_subscription_dbid 1
+#define Anum_pg_subscription_subname 2
+#define Anum_pg_subscription_subenabled 3
+#define Anum_pg_subscription_subconninfo 4
+#define Anum_pg_subscription_subslotname 5
+#define Anum_pg_subscription_subpublications 6
+
+#endif /* PG_SUBSCRIPTION_H */
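
The Anum_* constants above are 1-based attribute numbers, so catalog code such
as AlterSubscription() indexes its C arrays with Anum_xxx - 1. A standalone
sketch of that convention, using simplified stand-in names rather than the real
catalog API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the values[] / replaces[] convention used by AlterSubscription():
 * attribute numbers are 1-based, so C arrays are indexed with Anum_xxx - 1.
 * The names here are simplified stand-ins, not the real catalog API.
 */
#define Natts_demo 3
#define Anum_demo_enabled 3		/* 1-based, like Anum_pg_subscription_* */

typedef struct
{
	bool		values[Natts_demo];
	bool		replaces[Natts_demo];
} DemoTuple;

/* Mark only the "enabled" attribute for replacement, as the real code does. */
static void
demo_set_enabled(DemoTuple *t, bool enabled)
{
	t->values[Anum_demo_enabled - 1] = enabled;
	t->replaces[Anum_demo_enabled - 1] = true;
}
```

Only attributes whose replaces[] slot is set are overwritten when the new tuple
is formed; all others keep their existing values.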
diff --git a/src/include/commands/replicationcmds.h b/src/include/commands/replicationcmds.h
index 717485f..7c35d72 100644
--- a/src/include/commands/replicationcmds.h
+++ b/src/include/commands/replicationcmds.h
@@ -1,7 +1,7 @@
/*-------------------------------------------------------------------------
*
* replicationcmds.h
- * prototypes for publicationcmds.c.
+ * prototypes for publicationcmds.c and subscriptioncmds.c.
*
*
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
@@ -23,4 +23,8 @@ extern void AlterPublication(AlterPublicationStmt *stmt);
extern void DropPublicationById(Oid pubid);
extern void RemovePublicationRelById(Oid prid);
+extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern void DropSubscriptionById(Oid subid);
+
#endif /* REPLICATIONCMDS_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 3cce3d9..322286b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -407,6 +407,8 @@ typedef enum NodeTag
T_CreateAmStmt,
T_CreatePublicationStmt,
T_AlterPublicationStmt,
+ T_CreateSubscriptionStmt,
+ T_AlterSubscriptionStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index c10e6e7..247234c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1414,6 +1414,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_SUBSCRIPTION,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
@@ -3125,4 +3126,18 @@ typedef struct AlterPublicationStmt
List *tables; /* List of tables to add/drop */
} AlterPublicationStmt;
+typedef struct CreateSubscriptionStmt
+{
+ NodeTag type;
+ char *subname; /* Name of the subscription */
+ List *options; /* List of DefElem nodes */
+} CreateSubscriptionStmt;
+
+typedef struct AlterSubscriptionStmt
+{
+ NodeTag type;
+ char *subname; /* Name of the subscription */
+ List *options; /* List of DefElem nodes */
+} AlterSubscriptionStmt;
+
#endif /* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 9430ff0..c11a3f4 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -370,6 +370,7 @@ PG_KEYWORD("stdout", STDOUT, UNRESERVED_KEYWORD)
PG_KEYWORD("storage", STORAGE, UNRESERVED_KEYWORD)
PG_KEYWORD("strict", STRICT_P, UNRESERVED_KEYWORD)
PG_KEYWORD("strip", STRIP_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("subscription", SUBSCRIPTION, UNRESERVED_KEYWORD)
PG_KEYWORD("substring", SUBSTRING, COL_NAME_KEYWORD)
PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD)
PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
diff --git a/src/include/replication/subscription.h b/src/include/replication/subscription.h
new file mode 100644
index 0000000..a937f4b
--- /dev/null
+++ b/src/include/replication/subscription.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * subscription.h
+ * replication subscription support structures and function definitions
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/replication/subscription.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SUBSCRIPTION_H
+#define SUBSCRIPTION_H
+
+#include "postgres.h"
+
+
+typedef struct Subscription
+{
+ Oid oid; /* Oid of the subscription */
+ Oid dbid; /* Oid of the database the subscription is in */
+ char *name; /* Name of the subscription */
+ bool enabled; /* Indicates if the subscription is enabled */
+ char *conninfo; /* Connection string to the provider */
+ char *slotname; /* Name of the replication slot */
+ List *publications; /* List of publication names to subscribe to */
+} Subscription;
+
+extern Subscription *GetSubscription(Oid subid);
+extern Oid get_subscription_oid(const char *subname, bool missing_ok);
+
+#endif /* SUBSCRIPTION_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index ca2283a..7fcd139 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -540,5 +540,6 @@ typedef struct ViewOptions
extern void RelationIncrementReferenceCount(Relation rel);
extern void RelationDecrementReferenceCount(Relation rel);
extern bool RelationHasUnloggedIndex(Relation rel);
+extern List *RelationGetRepsetList(Relation rel);
#endif /* REL_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 632fcbc..b1d03a5 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -85,6 +85,8 @@ enum SysCacheIdentifier
PUBLICATIONRELMAP,
RULERELNAME,
STATRELATTINH,
+ SUBSCRIPTIONOID,
+ SUBSCRIPTIONNAME,
TABLESPACEOID,
TRFOID,
TRFTYPELANG,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 5ab04ae..ceac2c8 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -131,6 +131,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_subscription|t
pg_tablespace|t
pg_transform|t
pg_trigger|t
--
2.7.4
0003-Define-logical-replication-protocol-and-output-plugi.patchapplication/x-patch; name=0003-Define-logical-replication-protocol-and-output-plugi.patchDownload
From 0aa5eb65e4e8eb6a14654e4f877ad8d1541d6049 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 4 Jun 2016 17:57:09 +0200
Subject: [PATCH 3/6] Define logical replication protocol and output plugin
---
doc/src/sgml/protocol.sgml | 712 +++++++++++++++++++++
src/Makefile | 1 +
src/backend/replication/logical/Makefile | 4 +-
src/backend/replication/logical/proto.c | 641 +++++++++++++++++++
src/backend/replication/pgoutput/Makefile | 33 +
src/backend/replication/pgoutput/pgoutput.c | 445 +++++++++++++
src/backend/replication/pgoutput/pgoutput_config.c | 188 ++++++
src/include/replication/logicalproto.h | 76 +++
src/include/replication/pgoutput.h | 32 +
9 files changed, 2130 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/logical/proto.c
create mode 100644 src/backend/replication/pgoutput/Makefile
create mode 100644 src/backend/replication/pgoutput/pgoutput.c
create mode 100644 src/backend/replication/pgoutput/pgoutput_config.c
create mode 100644 src/include/replication/logicalproto.h
create mode 100644 src/include/replication/pgoutput.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 8e701aa..6dd0e55 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2104,6 +2104,136 @@ The commands accepted in walsender mode are:
</sect1>
+<sect1 id="protocol-logical-replication">
+ <title>Logical Streaming Replication Protocol</title>
+
+ <para>
+ This section describes the logical replication protocol, which is the message
+ flow started by the <literal>START_REPLICATION</literal>
+ <literal>SLOT</literal> <replaceable class="parameter">slot_name</>
+ <literal>LOGICAL</literal> replication command.
+ </para>
+
+ <para>
+ The logical streaming replication protocol builds on the primitives of
+ the physical streaming replication protocol.
+ </para>
+
+ <sect2 id="protocol-logical-replication-params">
+ <title>Logical Streaming Replication Parameters</title>
+
+ <para>
+ The logical replication <literal>START_REPLICATION</literal> command
+ accepts the following parameters:
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ proto_version
+ </term>
+ <listitem>
+ <para>
+ Protocol version. Currently only version <literal>1</literal> is
+ supported.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ encoding
+ </term>
+ <listitem>
+ <para>
+ Name of the encoding. This currently corresponds to the encoding of the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ publication_names
+ </term>
+ <listitem>
+ <para>
+ Comma-separated list of publication names to subscribe to
+ (receive changes). See
+ <xref linkend="logical-replication-publication"> for more info.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ </para>
+ </sect2>
+
+ <sect2 id="protocol-logical-messages">
+ <title>Logical Replication Protocol Messages</title>
+
+ <para>
+ The individual protocol messages are discussed in the following
+ sub-sections. The individual message formats are described in
+ <xref linkend="protocol-logicalrep-message-formats">.
+ </para>
+
+ <para>
+ All top-level protocol messages begin with a message type byte.
+ While represented in code as a character, this is a signed byte with no
+ associated encoding.
+ </para>
+
+ <para>
+   Since the streaming replication protocol supplies a message length, there
+ is no need for top-level protocol messages to embed a length in their
+ header.
+ </para>
+
+ </sect2>
+
+ <sect2 id="protocol-logical-messages-flow">
+ <title>Logical Replication Protocol Message Flow</title>
+
+ <para>
+ With the exception of the <literal>START_REPLICATION</literal> command and
+ the replay progress messages, all information flows only from the backend
+ to the frontend.
+ </para>
+
+ <para>
+ The logical replication protocol sends individual transactions one by one.
+   This means that all messages between a pair of Begin and Commit messages
+   belong to the same transaction.
+ </para>
+
+ <para>
+   Every sent transaction contains zero or more DML messages (Insert,
+   Update, Delete) and, in a cascaded setup, can also contain Origin
+   messages. An Origin message indicates that the transaction originated on
+   a different replication node. Since a replication node in the scope of
+   the logical replication protocol can be pretty much anything, the only
+   identifier is the origin name. It is the downstream's responsibility to
+   handle this as needed (if needed). The Origin message is always sent
+   before any DML messages in the transaction.
+ </para>
+
+ <para>
+   Every DML message contains an arbitrary relation ID, which can be mapped
+   to an ID in the Relation messages. The Relation message describes the
+   schema of the given relation. It is sent for a given relation either
+   because it is the first time a DML message is sent for that relation in
+   the current session, or because the relation definition has changed since
+   the last Relation message was sent for it. The protocol assumes that the
+   client is capable of caching the metadata for as many relations as needed.
+ </para>
+
+
+ </sect2>
+
+</sect1>
+
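For illustration, a subscriber requesting the parameters above might start
streaming with a command along these lines (the slot name sub1 and publication
mypub are hypothetical; the patch builds the option string in
logicalrep_build_options()):

```
START_REPLICATION SLOT "sub1" LOGICAL 0/0
    (proto_version '1', encoding 'UTF8', publication_names '"mypub"')
```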
<sect1 id="protocol-message-types">
<title>Message Data Types</title>
@@ -5112,6 +5242,588 @@ not line breaks.
</sect1>
+<sect1 id="protocol-logicalrep-message-formats">
+<title>Logical Replication Message Formats</title>
+
+<para>
+This section describes the detailed format of each logical replication message.
+These messages are returned either by the replication slot SQL interface or by
+a WalSender. In the latter case they are encapsulated inside the replication
+protocol WAL messages as described in <xref linkend="protocol-replication">
+and generally obey the same message flow as physical replication.
+</para>
+
+<variablelist>
+
+<varlistentry>
+<term>
+Begin
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('B')
+</term>
+<listitem>
+<para>
+ Identifies the message as a begin message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ The final LSN of the transaction.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ Commit timestamp of the transaction.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Xid of the transaction.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Commit
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('C')
+</term>
+<listitem>
+<para>
+ Identifies the message as a commit message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ The LSN of the commit.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ The end LSN of the transaction.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ Commit timestamp of the transaction.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Origin
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('O')
+</term>
+<listitem>
+<para>
+ Identifies the message as an origin message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int64
+</term>
+<listitem>
+<para>
+ The LSN of the commit on the origin server.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ Length of the origin name (including the NULL-termination
+ character).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ String
+</term>
+<listitem>
+<para>
+ Name of the origin.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Relation
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('R')
+</term>
+<listitem>
+<para>
+   Identifies the message as a relation message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Id of the relation.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ Length of the namespace name (including the NULL-termination
+ character).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ String
+</term>
+<listitem>
+<para>
+ Namespace.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ Length of the relation name (including the NULL-termination
+ character).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ String
+</term>
+<listitem>
+<para>
+ Relation name.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+</para>
+
+<para>
+This message is always followed by an Attributes message.
+</para>
+
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Attributes
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('A')
+</term>
+<listitem>
+<para>
+ Identifies the message as an attributes message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int16
+</term>
+<listitem>
+<para>
+ Number of columns.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+ Next, the following submessage appears for each column:
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('C')
+</term>
+<listitem>
+<para>
+ Start of column block.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+   Flags for the column. Currently this can be either 0 for no flags
+   or 1 which marks the column as part of the key.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ Length of column name (including the NULL-termination
+ character).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ String
+</term>
+<listitem>
+<para>
+ Name of the column.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Insert
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('I')
+</term>
+<listitem>
+<para>
+ Identifies the message as an insert message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Id of the relation corresponding to the id in the relation
+ message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Byte1('N')
+</term>
+<listitem>
+<para>
+ Identifies the following TupleData message as a new tuple.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Update
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('U')
+</term>
+<listitem>
+<para>
+ Identifies the message as an update message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Id of the relation corresponding to the id in the relation
+ message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Byte1('O')
+</term>
+<listitem>
+<para>
+   Identifies the following TupleData message as an old tuple.
+   This field is optional and is only present if the table in which
+   the update happened has REPLICA IDENTITY set to FULL or when
+   the REPLICA IDENTITY index values have changed.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Byte1('N')
+</term>
+<listitem>
+<para>
+ Identifies the following TupleData message as a new tuple.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+Delete
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('D')
+</term>
+<listitem>
+<para>
+ Identifies the message as a delete message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Id of the relation corresponding to the id in the relation
+ message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Byte1('O')
+</term>
+<listitem>
+<para>
+ Identifies the following TupleData message as the old tuple
+ (deleted tuple).
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+TupleData
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('T')
+</term>
+<listitem>
+<para>
+   Identifies the message as a tuple data message.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int16
+</term>
+<listitem>
+<para>
+ Number of columns.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+ Next, one of the following submessages appears for each column:
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('n')
+</term>
+<listitem>
+<para>
+   Identifies the data as a NULL value.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+ Or
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('u')
+</term>
+<listitem>
+<para>
+   Identifies an unchanged TOASTed value (the actual value is not
+   sent).
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+ Or
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('t')
+</term>
+<listitem>
+<para>
+   Identifies the data as a text-formatted value.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Length of the column value.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ String
+</term>
+<listitem>
+<para>
+ The text value.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+
+</sect1>
+
<sect1 id="protocol-changes">
<title>Summary of Changes since Protocol 2.0</title>
diff --git a/src/Makefile b/src/Makefile
index b526be7..9723a1e 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -22,6 +22,7 @@ SUBDIRS = \
include \
interfaces \
backend/replication/libpqwalreceiver \
+ backend/replication/pgoutput \
fe_utils \
bin \
pl \
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index e4b093b..438811e 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
-OBJS = decode.o logical.o logicalfuncs.o message.o origin.o publication.o \
- reorderbuffer.o snapbuild.o subscription.o
+OBJS = decode.o logical.o logicalfuncs.o message.o origin.o proto.o \
+ publication.o reorderbuffer.o snapbuild.o subscription.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
new file mode 100644
index 0000000..2b82495
--- /dev/null
+++ b/src/backend/replication/logical/proto.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * proto.c
+ * logical replication protocol functions
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/proto.c
+ *
+ * TODO
+ * unaligned access
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "access/heapam.h"
+
+#include "access/sysattr.h"
+#include "access/tuptoaster.h"
+#include "access/xact.h"
+
+#include "catalog/catversion.h"
+#include "catalog/index.h"
+
+#include "catalog/namespace.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_type.h"
+
+#include "commands/dbcommands.h"
+
+#include "executor/spi.h"
+
+#include "libpq/pqformat.h"
+
+#include "mb/pg_wchar.h"
+
+#include "nodes/makefuncs.h"
+
+#include "replication/logicalproto.h"
+#include "replication/reorderbuffer.h"
+
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/timestamp.h"
+#include "utils/typcache.h"
+
+#define IS_REPLICA_IDENTITY 1
+
+static void logicalrep_write_attrs(StringInfo out, Relation rel);
+static void logicalrep_write_tuple(StringInfo out, Relation rel,
+ HeapTuple tuple);
+
+static void logicalrep_read_attrs(StringInfo in, char ***attrnames,
+ int *nattrnames);
+static void logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple);
+
+
+/*
+ * Given a List of strings, return it as single comma separated
+ * string, quoting identifiers as needed.
+ *
+ * This is essentially the reverse of SplitIdentifierString.
+ *
+ * The caller should free the result.
+ */
+static char *
+stringlist_to_identifierstr(List *strings)
+{
+ ListCell *lc;
+ StringInfoData res;
+ bool first = true;
+
+ initStringInfo(&res);
+
+ foreach (lc, strings)
+ {
+ if (first)
+ first = false;
+ else
+ appendStringInfoChar(&res, ',');
+
+ appendStringInfoString(&res, quote_identifier(strVal(lfirst(lc))));
+ }
+
+ return res.data;
+}
+
+/*
+ * Build string of options for logical replication plugin.
+ */
+char *
+logicalrep_build_options(List *publications)
+{
+ StringInfoData options;
+ char *publicationstr;
+
+ initStringInfo(&options);
+ appendStringInfo(&options, "proto_version '%u'", LOGICALREP_PROTO_VERSION_NUM);
+ appendStringInfo(&options, ", encoding %s",
+ quote_literal_cstr(GetDatabaseEncodingName()));
+ appendStringInfo(&options, ", pg_version '%u'", PG_VERSION_NUM);
+ publicationstr = stringlist_to_identifierstr(publications);
+ appendStringInfo(&options, ", publication_names %s",
+ quote_literal_cstr(publicationstr));
+ pfree(publicationstr);
+
+ return options.data;
+}
+
+/*
+ * Write BEGIN to the output stream.
+ */
+void
+logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
+{
+ pq_sendbyte(out, 'B'); /* BEGIN */
+
+ /* fixed fields */
+ pq_sendint64(out, txn->final_lsn);
+ pq_sendint64(out, txn->commit_time);
+ pq_sendint(out, txn->xid, 4);
+}
+
+/*
+ * Read transaction BEGIN from the stream.
+ */
+void
+logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn,
+ TimestampTz *committime, TransactionId *remote_xid)
+{
+ /* read fields */
+ *remote_lsn = pq_getmsgint64(in);
+ Assert(*remote_lsn != InvalidXLogRecPtr);
+ *committime = pq_getmsgint64(in);
+ *remote_xid = pq_getmsgint(in, 4);
+}
+
+
+/*
+ * Write COMMIT to the output stream.
+ */
+void
+logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ pq_sendbyte(out, 'C'); /* sending COMMIT */
+
+ /* send fixed fields */
+ pq_sendint64(out, commit_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+}
+
+/*
+ * Read transaction COMMIT from the stream.
+ */
+void
+logicalrep_read_commit(StringInfo in, XLogRecPtr *commit_lsn,
+ XLogRecPtr *end_lsn, TimestampTz *committime)
+{
+ /* read fields */
+ *commit_lsn = pq_getmsgint64(in);
+ *end_lsn = pq_getmsgint64(in);
+ *committime = pq_getmsgint64(in);
+}
+
+/*
+ * Write ORIGIN to the output stream.
+ */
+void
+logicalrep_write_origin(StringInfo out, const char *origin,
+ XLogRecPtr origin_lsn)
+{
+ uint8 len;
+
+ Assert(strlen(origin) < 255);
+
+ pq_sendbyte(out, 'O'); /* ORIGIN */
+
+ /* fixed fields */
+ pq_sendint64(out, origin_lsn);
+
+ /* origin */
+ len = strlen(origin) + 1;
+ pq_sendbyte(out, len);
+ pq_sendbytes(out, origin, len);
+}
+
+
+/*
+ * Read ORIGIN from the output stream.
+ */
+char *
+logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn)
+{
+ uint8 len;
+
+ /* fixed fields */
+ *origin_lsn = pq_getmsgint64(in);
+
+ /* origin */
+ len = pq_getmsgbyte(in);
+
+ return pnstrdup(pq_getmsgbytes(in, len), len);
+}
+
+
+/*
+ * Write INSERT to the output stream.
+ */
+void
+logicalrep_write_insert(StringInfo out, Relation rel, HeapTuple newtuple)
+{
+ pq_sendbyte(out, 'I'); /* action INSERT */
+
+ /* use Oid as relation identifier */
+ pq_sendint(out, RelationGetRelid(rel), 4);
+
+ pq_sendbyte(out, 'N'); /* new tuple follows */
+ logicalrep_write_tuple(out, rel, newtuple);
+}
+
+/*
+ * Read INSERT from stream.
+ *
+ * Fills the new tuple.
+ */
+LogicalRepRelId
+logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
+{
+ char action;
+ LogicalRepRelId relid;
+
+ /* read the relation id */
+ relid = pq_getmsgint(in, 4);
+
+ action = pq_getmsgbyte(in);
+ if (action != 'N')
+ elog(ERROR, "expected new tuple but got %d",
+ action);
+
+ logicalrep_read_tuple(in, newtup);
+
+ return relid;
+}
+
+/*
+ * Write UPDATE to the output stream.
+ */
+void
+logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple,
+ HeapTuple newtuple)
+{
+ pq_sendbyte(out, 'U'); /* action UPDATE */
+
+ /* use Oid as relation identifier */
+ pq_sendint(out, RelationGetRelid(rel), 4);
+
+ if (oldtuple != NULL)
+ {
+ pq_sendbyte(out, 'O'); /* old tuple follows */
+ logicalrep_write_tuple(out, rel, oldtuple);
+ }
+
+ pq_sendbyte(out, 'N'); /* new tuple follows */
+ logicalrep_write_tuple(out, rel, newtuple);
+}
+
+/*
+ * Read UPDATE from stream.
+ */
+LogicalRepRelId
+logicalrep_read_update(StringInfo in, bool *hasoldtup,
+ LogicalRepTupleData *oldtup,
+ LogicalRepTupleData *newtup)
+{
+ char action;
+ LogicalRepRelId relid;
+
+ /* read the relation id */
+ relid = pq_getmsgint(in, 4);
+
+ /* read and verify action */
+ action = pq_getmsgbyte(in);
+ if (action != 'O' && action != 'N')
+ elog(ERROR, "expected action 'N' or 'O', got %c",
+ action);
+
+ /* check for old tuple */
+ if (action == 'O')
+ {
+ logicalrep_read_tuple(in, oldtup);
+ *hasoldtup = true;
+ action = pq_getmsgbyte(in);
+ }
+ else
+ *hasoldtup = false;
+
+ /* check for new tuple */
+ if (action != 'N')
+ elog(ERROR, "expected action 'N', got %c",
+ action);
+
+ logicalrep_read_tuple(in, newtup);
+
+ return relid;
+}
+
+/*
+ * Write DELETE to the output stream.
+ */
+void
+logicalrep_write_delete(StringInfo out, Relation rel, HeapTuple oldtuple)
+{
+ pq_sendbyte(out, 'D'); /* action DELETE */
+
+ /* use Oid as relation identifier */
+ pq_sendint(out, RelationGetRelid(rel), 4);
+
+ pq_sendbyte(out, 'O'); /* old tuple follows */
+ logicalrep_write_tuple(out, rel, oldtuple);
+}
+
+/*
+ * Read DELETE from stream.
+ *
+ * Fills the old tuple.
+ */
+LogicalRepRelId
+logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup)
+{
+ char action;
+ LogicalRepRelId relid;
+
+ /* read the relation id */
+ relid = pq_getmsgint(in, 4);
+
+ /* read and verify action */
+ action = pq_getmsgbyte(in);
+ if (action != 'O')
+ elog(ERROR, "expected action 'O', got %c", action);
+
+ logicalrep_read_tuple(in, oldtup);
+
+ return relid;
+}
+
+/*
+ * Write relation description to the output stream.
+ */
+void
+logicalrep_write_rel(StringInfo out, Relation rel)
+{
+ char *nspname;
+ uint8 nspnamelen;
+ const char *relname;
+ uint8 relnamelen;
+
+ pq_sendbyte(out, 'R'); /* sending RELATION */
+
+ /* use Oid as relation identifier */
+ pq_sendint(out, RelationGetRelid(rel), 4);
+
+ nspname = get_namespace_name(RelationGetNamespace(rel));
+ if (nspname == NULL)
+ elog(ERROR, "cache lookup failed for namespace %u",
+ rel->rd_rel->relnamespace);
+ nspnamelen = strlen(nspname) + 1;
+
+ relname = RelationGetRelationName(rel);
+ relnamelen = strlen(relname) + 1;
+
+ pq_sendbyte(out, nspnamelen); /* schema name length */
+ pq_sendbytes(out, nspname, nspnamelen);
+
+ pq_sendbyte(out, relnamelen); /* table name length */
+ pq_sendbytes(out, relname, relnamelen);
+
+ /* send the attribute info */
+ logicalrep_write_attrs(out, rel);
+
+ pfree(nspname);
+}
+
+/*
+ * Read schema.relation from the stream and return it as a LogicalRepRelation.
+ */
+LogicalRepRelation *
+logicalrep_read_rel(StringInfo in)
+{
+ LogicalRepRelation *rel = palloc(sizeof(LogicalRepRelation));
+ int len;
+
+ rel->remoteid = pq_getmsgint(in, 4);
+
+ /* Read relation from stream */
+ len = pq_getmsgbyte(in);
+ rel->nspname = (char *) pq_getmsgbytes(in, len);
+
+ len = pq_getmsgbyte(in);
+ rel->relname = (char *) pq_getmsgbytes(in, len);
+
+ /* Get attribute description */
+ logicalrep_read_attrs(in, &rel->attnames, &rel->natts);
+
+ return rel;
+}
+
+/*
+ * Write a tuple to the outputstream, in the most efficient format possible.
+ */
+static void
+logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
+{
+ TupleDesc desc;
+ Datum values[MaxTupleAttributeNumber];
+ bool isnull[MaxTupleAttributeNumber];
+ int i;
+ uint16 nliveatts = 0;
+
+ desc = RelationGetDescr(rel);
+
+ pq_sendbyte(out, 'T'); /* sending TUPLE */
+
+ for (i = 0; i < desc->natts; i++)
+ {
+ if (desc->attrs[i]->attisdropped)
+ continue;
+ nliveatts++;
+ }
+ pq_sendint(out, nliveatts, 2);
+
+ /* try to allocate enough memory from the get go */
+ enlargeStringInfo(out, tuple->t_len +
+ nliveatts * (1 + 4));
+
+ heap_deform_tuple(tuple, desc, values, isnull);
+
+ /* Write the values */
+ for (i = 0; i < desc->natts; i++)
+ {
+ HeapTuple typtup;
+ Form_pg_type typclass;
+ Form_pg_attribute att = desc->attrs[i];
+ char *outputstr;
+ int len;
+
+ /* skip dropped columns */
+ if (att->attisdropped)
+ continue;
+
+ if (isnull[i])
+ {
+ pq_sendbyte(out, 'n'); /* null column */
+ continue;
+ }
+ else if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
+ {
+ pq_sendbyte(out, 'u'); /* unchanged toast column */
+ continue;
+ }
+
+ typtup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(att->atttypid));
+ if (!HeapTupleIsValid(typtup))
+ elog(ERROR, "cache lookup failed for type %u", att->atttypid);
+ typclass = (Form_pg_type) GETSTRUCT(typtup);
+
+ pq_sendbyte(out, 't'); /* 'text' data follows */
+
+ outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+ len = strlen(outputstr) + 1; /* null terminated */
+ pq_sendint(out, len, 4); /* length */
+ appendBinaryStringInfo(out, outputstr, len); /* data */
+
+ pfree(outputstr);
+
+ ReleaseSysCache(typtup);
+ }
+}
+
+/*
+ * Read tuple in remote format from stream.
+ *
+ * The returned tuple points into the input stringinfo.
+ */
+static void
+logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
+{
+ int i;
+ int natts;
+ char action;
+
+ /* Check that the action is what we expect. */
+ action = pq_getmsgbyte(in);
+ if (action != 'T')
+ elog(ERROR, "expected TUPLE, got %c", action);
+
+	/* Get the number of attributes. */
+ natts = pq_getmsgint(in, 2);
+
+ memset(tuple->changed, 0, sizeof(tuple->changed));
+
+ /* Read the data */
+ for (i = 0; i < natts; i++)
+ {
+ char kind;
+ int len;
+
+ kind = pq_getmsgbyte(in);
+
+ switch (kind)
+ {
+ case 'n': /* null */
+ tuple->values[i] = NULL;
+ tuple->changed[i] = true;
+ break;
+ case 'u': /* unchanged column */
+ tuple->values[i] = (char *) 0xdeadbeef; /* make bad usage more obvious */
+ break;
+ case 't': /* text formatted value */
+ {
+ tuple->changed[i] = true;
+
+ len = pq_getmsgint(in, 4); /* read length */
+
+ /* and data */
+ tuple->values[i] = (char *) pq_getmsgbytes(in, len);
+ }
+ break;
+ default:
+ elog(ERROR, "unknown data representation type '%c'", kind);
+ }
+ }
+}
+
+
+/*
+ * Write relation attributes to the outputstream.
+ */
+static void
+logicalrep_write_attrs(StringInfo out, Relation rel)
+{
+ TupleDesc desc;
+ int i;
+ uint16 nliveatts = 0;
+ Bitmapset *idattrs;
+
+ desc = RelationGetDescr(rel);
+
+ pq_sendbyte(out, 'A'); /* sending ATTRS */
+
+ /* send number of live attributes */
+ for (i = 0; i < desc->natts; i++)
+ {
+ if (desc->attrs[i]->attisdropped)
+ continue;
+ nliveatts++;
+ }
+ pq_sendint(out, nliveatts, 2);
+
+ /* fetch bitmap of REPLICATION IDENTITY attributes */
+ idattrs = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_IDENTITY_KEY);
+
+ /* send the attributes */
+ for (i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = desc->attrs[i];
+ uint8 flags = 0;
+ uint8 len;
+ const char *attname;
+
+ if (att->attisdropped)
+ continue;
+
+ if (bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
+ idattrs))
+ flags |= IS_REPLICA_IDENTITY;
+
+ pq_sendbyte(out, 'C'); /* column definition follows */
+ pq_sendbyte(out, flags);
+
+ attname = NameStr(att->attname);
+ len = strlen(attname) + 1;
+ pq_sendbyte(out, len);
+ pq_sendbytes(out, attname, len); /* data */
+ }
+
+ bms_free(idattrs);
+}
+
+
+/*
+ * Read relation attribute names from the outputstream.
+ */
+static void
+logicalrep_read_attrs(StringInfo in, char ***attrnames, int *nattrnames)
+{
+ int i;
+ uint16 nattrs;
+ char **attrs;
+ char blocktype;
+
+ blocktype = pq_getmsgbyte(in);
+ if (blocktype != 'A')
+ elog(ERROR, "expected ATTRS, got %c", blocktype);
+
+ nattrs = pq_getmsgint(in, 2);
+ attrs = palloc(nattrs * sizeof(char *));
+
+ /* read the attributes */
+ for (i = 0; i < nattrs; i++)
+ {
+ uint8 len;
+
+ blocktype = pq_getmsgbyte(in); /* column definition follows */
+ if (blocktype != 'C')
+ elog(ERROR, "expected COLUMN, got %c", blocktype);
+
+ /* We ignore flags atm. */
+ (void) pq_getmsgbyte(in);
+
+ /* attribute name */
+ len = pq_getmsgbyte(in);
+ /* the string is NULL terminated */
+ attrs[i] = (char *) pq_getmsgbytes(in, len);
+ }
+
+ *attrnames = attrs;
+ *nattrnames = nattrs;
+}
diff --git a/src/backend/replication/pgoutput/Makefile b/src/backend/replication/pgoutput/Makefile
new file mode 100644
index 0000000..f25ed20
--- /dev/null
+++ b/src/backend/replication/pgoutput/Makefile
@@ -0,0 +1,33 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for src/backend/replication/pgoutput
+#
+# IDENTIFICATION
+# src/backend/replication/pgoutput
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+
+OBJS = pgoutput.o pgoutput_config.o $(WIN32RES)
+SHLIB_LINK = $(libpq)
+PGFILEDESC = "pgoutput - standard logical replication output plugin"
+NAME = pgoutput
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean maintainer-clean: clean-lib
+ rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
new file mode 100644
index 0000000..d74c7e9
--- /dev/null
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -0,0 +1,445 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput.c
+ * Logical Replication output plugin
+ *
+ * Copyright (c) 2012-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * pgoutput.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+
+#include "mb/pg_wchar.h"
+
+#include "replication/logical.h"
+#include "replication/logicalproto.h"
+#include "replication/origin.h"
+#include "replication/pgoutput.h"
+#include "replication/publication.h"
+
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);
+
+static void pgoutput_startup(LogicalDecodingContext * ctx,
+ OutputPluginOptions *opt, bool is_init);
+static void pgoutput_shutdown(LogicalDecodingContext * ctx);
+static void pgoutput_begin_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn);
+static void pgoutput_commit_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pgoutput_change(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, Relation rel,
+ ReorderBufferChange *change);
+static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
+ RepOriginId origin_id);
+
+/* Entry in the map used to remember which relation schemas we sent. */
+typedef struct RelSchemaSyncEntry
+{
+ Oid relid; /* relation oid */
+ bool schema_sent; /* did we send the schema? */
+} RelSchemaSyncEntry;
+
+/* Map used to remember which relation schemas we sent. */
+static HTAB *RelSchemaSyncCache = NULL;
+
+static void rel_schema_sync_cache_cb(Datum arg, Oid relid);
+
+static void init_rel_schema_sync_cache(MemoryContext decoding_context);
+static void destroy_rel_schema_sync_cache(void);
+static RelSchemaSyncEntry *get_rel_schema_sync_entry(Oid relid);
+static void rel_schema_sync_cache_cb(Datum arg, Oid relid);
+
+/*
+ * Specify output plugin callbacks
+ */
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+ AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+ cb->startup_cb = pgoutput_startup;
+ cb->begin_cb = pgoutput_begin_txn;
+ cb->change_cb = pgoutput_change;
+ cb->commit_cb = pgoutput_commit_txn;
+ cb->filter_by_origin_cb = pgoutput_origin_filter;
+ cb->shutdown_cb = pgoutput_shutdown;
+}
+
+/*
+ * Initialize this plugin
+ */
+static void
+pgoutput_startup(LogicalDecodingContext * ctx, OutputPluginOptions *opt,
+ bool is_init)
+{
+ PGOutputData *data = palloc0(sizeof(PGOutputData));
+ int client_encoding;
+
+ /* Create our memory context for private allocations. */
+ data->context = AllocSetContextCreate(ctx->context,
+ "logical replication output context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ ctx->output_plugin_private = data;
+
+ /*
+ * This is replication start and not slot initialization.
+ *
+ * Parse and validate options passed by the client.
+ */
+ if (!is_init)
+ {
+ /* We can only do binary */
+ if (opt->output_type != OUTPUT_PLUGIN_BINARY_OUTPUT)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("only binary mode is supported for logical replication protocol")));
+
+ /* Parse the params and ERROR if we see any we don't recognise */
+ pgoutput_process_parameters(ctx->output_plugin_options, data);
+
+		/* Check if we support the requested protocol */
+ if (data->protocol_version > LOGICALREP_PROTO_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("client sent protocol_version=%d but we only support protocol %d or lower",
+ data->protocol_version, LOGICALREP_PROTO_VERSION_NUM)));
+
+ if (data->protocol_version < LOGICALREP_PROTO_MIN_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("client sent protocol_version=%d but we only support protocol %d or higher",
+ data->protocol_version, LOGICALREP_PROTO_MIN_VERSION_NUM)));
+
+ /* Check for encoding match */
+ if (data->client_encoding == NULL ||
+ strlen(data->client_encoding) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("encoding parameter missing")));
+
+ client_encoding = pg_char_to_encoding(data->client_encoding);
+
+ if (client_encoding == -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("unrecognized encoding name %s passed to the encoding parameter",
+ data->client_encoding)));
+
+ if (client_encoding != GetDatabaseEncoding())
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("encoding conversion for logical replication is not supported yet"),
+ errdetail("encoding %s must be unset or match server_encoding %s",
+ data->client_encoding, GetDatabaseEncodingName())));
+
+ if (list_length(data->publication_names) < 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("publication_names parameter missing")));
+
+ /* Initialize relation schema cache. */
+ init_rel_schema_sync_cache(CacheMemoryContext);
+ }
+}
+
+/*
+ * BEGIN callback
+ */
+static void
+pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+ bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+
+ OutputPluginPrepareWrite(ctx, !send_replication_origin);
+ logicalrep_write_begin(ctx->out, txn);
+
+ if (send_replication_origin)
+ {
+ char *origin;
+
+ /* Message boundary */
+ OutputPluginWrite(ctx, false);
+ OutputPluginPrepareWrite(ctx, true);
+
+ /*
+ * XXX: which behaviour do we want here?
+ *
+ * Alternatives:
+ * - don't send origin message if origin name not found
+ * (that's what we do now)
+ * - throw error - that will break replication, not good
+ * - send some special "unknown" origin
+ */
+ if (replorigin_by_oid(txn->origin_id, true, &origin))
+ logicalrep_write_origin(ctx->out, origin, txn->origin_lsn);
+ }
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT callback
+ */
+static void
+pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Convert ReorderBufferChange to PublicationChangeType
+ */
+static PublicationChangeType
+get_publication_change_type(ReorderBufferChange *change)
+{
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ return PublicationChangeInsert;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ return PublicationChangeUpdate;
+ case REORDER_BUFFER_CHANGE_DELETE:
+ return PublicationChangeDelete;
+ default:
+ elog(ERROR, "unexpected action %d", change->action);
+ return PublicationChangeInsert; /* keep compiler quiet */
+ }
+}
+
+/*
+ * Sends the decoded DML over the wire.
+ */
+static void
+pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ MemoryContext old;
+ RelSchemaSyncEntry *relentry = NULL;
+
+ /* First check the table filter */
+ if (!publication_change_is_replicated(relation,
+ get_publication_change_type(change),
+ data->publication_names))
+ return;
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ /*
+ * Write the relation schema if the current schema hasn't been sent yet.
+ */
+ relentry = get_rel_schema_sync_entry(RelationGetRelid(relation));
+ if (!relentry->schema_sent)
+ {
+ OutputPluginPrepareWrite(ctx, false);
+ logicalrep_write_rel(ctx->out, relation);
+ OutputPluginWrite(ctx, false);
+ relentry->schema_sent = true;
+ }
+
+ /* Send the data */
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_insert(ctx->out, relation,
+ &change->data.tp.newtuple->tuple);
+ OutputPluginWrite(ctx, true);
+ break;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ {
+ HeapTuple oldtuple = change->data.tp.oldtuple ?
+ &change->data.tp.oldtuple->tuple : NULL;
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_update(ctx->out, relation, oldtuple,
+ &change->data.tp.newtuple->tuple);
+ OutputPluginWrite(ctx, true);
+ break;
+ }
+ case REORDER_BUFFER_CHANGE_DELETE:
+ if (change->data.tp.oldtuple)
+ {
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_delete(ctx->out, relation,
+ &change->data.tp.oldtuple->tuple);
+ OutputPluginWrite(ctx, true);
+ }
+ else
+ elog(DEBUG1, "didn't send DELETE change because of missing oldtuple");
+ break;
+ default:
+ Assert(false);
+ }
+
+ /* Cleanup */
+ MemoryContextSwitchTo(old);
+ MemoryContextReset(data->context);
+}
+
+/*
+ * Currently we always forward.
+ */
+static bool
+pgoutput_origin_filter(LogicalDecodingContext *ctx,
+ RepOriginId origin_id)
+{
+ return false;
+}
+
+/*
+ * Shutdown the output plugin.
+ *
+ * Note, we don't need to clean data->context as it is a child context
+ * of ctx->context, so it will be cleaned up by the logical decoding machinery.
+ */
+static void
+pgoutput_shutdown(LogicalDecodingContext * ctx)
+{
+ destroy_rel_schema_sync_cache();
+}
+
+
+/*
+ * Initialize the relation schema sync cache for a decoding session.
+ *
+ * The hash table is destroyed at the end of a decoding session. While
+ * the relcache invalidation callback remains registered and will still
+ * be invoked, it will just see the NULL hash table global and take no
+ * action.
+ */
+static void
+init_rel_schema_sync_cache(MemoryContext cachectx)
+{
+ HASHCTL ctl;
+ int hash_flags;
+ MemoryContext old_ctxt;
+
+ if (RelSchemaSyncCache != NULL)
+ return;
+
+ /* Make a new hash table for the cache */
+ hash_flags = HASH_ELEM | HASH_CONTEXT;
+
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(struct RelSchemaSyncEntry);
+ ctl.hcxt = cachectx;
+
+ hash_flags |= HASH_BLOBS;
+
+ old_ctxt = MemoryContextSwitchTo(cachectx);
+ RelSchemaSyncCache = hash_create("logical replication output relation cache",
+ 128, &ctl, hash_flags);
+ (void) MemoryContextSwitchTo(old_ctxt);
+
+ Assert(RelSchemaSyncCache != NULL);
+
+ CacheRegisterRelcacheCallback(rel_schema_sync_cache_cb, (Datum) 0);
+}
+
+/*
+ * Remove all the entries from our relation cache.
+ */
+static void
+destroy_rel_schema_sync_cache(void)
+{
+ HASH_SEQ_STATUS status;
+ RelSchemaSyncEntry *entry;
+
+ if (RelSchemaSyncCache == NULL)
+ return;
+
+ hash_seq_init(&status, RelSchemaSyncCache);
+
+ while ((entry = (RelSchemaSyncEntry *) hash_seq_search(&status)) != NULL)
+ {
+ if (hash_search(RelSchemaSyncCache, (void *) &entry->relid,
+ HASH_REMOVE, NULL) == NULL)
+ elog(ERROR, "hash table corrupted");
+ }
+
+ RelSchemaSyncCache = NULL;
+}
+
+/*
+ * Find or create entry in the relation schema cache.
+ */
+static RelSchemaSyncEntry *
+get_rel_schema_sync_entry(Oid relid)
+{
+ RelSchemaSyncEntry *entry;
+ bool found;
+ MemoryContext oldctx;
+
+ Assert(RelSchemaSyncCache != NULL);
+
+ /* Find the cached relation schema entry, creating it if not found */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ entry = (RelSchemaSyncEntry *) hash_search(RelSchemaSyncCache,
+ (void *) &relid,
+ HASH_ENTER, &found);
+ Assert(entry != NULL);
+ (void) MemoryContextSwitchTo(oldctx);
+
+ /* Not found means schema wasn't sent */
+ if (!found)
+ entry->schema_sent = false;
+
+ return entry;
+}
+
+/*
+ * Relcache invalidation callback
+ */
+static void
+rel_schema_sync_cache_cb(Datum arg, Oid relid)
+{
+ RelSchemaSyncEntry *entry;
+
+ /*
+ * We can get here if the plugin was used via the SQL interface, as the
+ * RelSchemaSyncCache is destroyed when decoding finishes but there is
+ * no way to unregister the relcache invalidation callback.
+ */
+ if (RelSchemaSyncCache == NULL)
+ return;
+
+ /*
+ * Nobody keeps pointers to entries in this hash table around outside
+ * logical decoding callback calls - but invalidation events can come in
+ * *during* a callback if we access the relcache in the callback. Because
+ * of that we must mark the cache entry as invalid but not remove it from
+ * the hash while it could still be referenced, then prune it at a later
+ * safe point.
+ *
+ * Getting invalidations for relations that aren't in the table is
+ * entirely normal, since there's no way to unregister for an
+ * invalidation event. So we don't care if it's found or not.
+ */
+ entry = (RelSchemaSyncEntry *) hash_search(RelSchemaSyncCache, &relid,
+ HASH_FIND, NULL);
+
+ /*
+ * Reset schema sent status as the relation definition may have
+ * changed.
+ */
+ if (entry != NULL)
+ entry->schema_sent = false;
+}
diff --git a/src/backend/replication/pgoutput/pgoutput_config.c b/src/backend/replication/pgoutput/pgoutput_config.c
new file mode 100644
index 0000000..335b971
--- /dev/null
+++ b/src/backend/replication/pgoutput/pgoutput_config.c
@@ -0,0 +1,188 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_config.c
+ * Logical Replication output plugin parameter parsing
+ *
+ * Copyright (c) 2012-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/pgoutput/pgoutput_config.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/catversion.h"
+
+#include "mb/pg_wchar.h"
+
+#include "nodes/makefuncs.h"
+
+#include "replication/pgoutput.h"
+
+#include "utils/builtins.h"
+#include "utils/int8.h"
+
+
+typedef enum OutputParamType
+{
+ OUTPUT_PARAM_TYPE_UNDEFINED,
+ OUTPUT_PARAM_TYPE_UINT32,
+ OUTPUT_PARAM_TYPE_STRING
+} OutputParamType;
+
+/* param parsing */
+static int get_param_key(const char * const param_name);
+static Datum get_param_value(DefElem *elem, OutputParamType type,
+ bool null_ok);
+static uint32 parse_param_uint32(DefElem *elem);
+
+enum {
+ PARAM_UNRECOGNISED,
+ PARAM_PROTOCOL_VERSION,
+ PARAM_ENCODING,
+ PARAM_PG_VERSION,
+ PARAM_PUBLICATION_NAMES,
+} OutputPluginParamKey;
+
+typedef struct {
+ const char * const paramname;
+ int paramkey;
+} OutputPluginParam;
+
+/* Oh, if only C had switch on strings */
+static OutputPluginParam param_lookup[] = {
+ {"proto_version", PARAM_PROTOCOL_VERSION},
+ {"encoding", PARAM_ENCODING},
+ {"pg_version", PARAM_PG_VERSION},
+ {"publication_names", PARAM_PUBLICATION_NAMES},
+ {NULL, PARAM_UNRECOGNISED}
+};
+
+
+/*
+ * Read the parameters sent by the client at startup and store the
+ * recognised ones in PGOutputData.
+ *
+ * The data must have all client-supplied parameter fields zeroed,
+ * such as by memset or palloc0, since values not supplied
+ * by the client are not set.
+ */
+void
+pgoutput_process_parameters(List *options, PGOutputData *data)
+{
+ ListCell *lc;
+
+ /* Examine all the other params in the message. */
+ foreach(lc, options)
+ {
+ DefElem *elem = lfirst(lc);
+ Datum val;
+
+ Assert(elem->arg == NULL || IsA(elem->arg, String));
+
+ /* Check each param, whether or not we recognise it */
+ switch(get_param_key(elem->defname))
+ {
+ case PARAM_PROTOCOL_VERSION:
+ val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false);
+ data->protocol_version = DatumGetUInt32(val);
+ break;
+
+ case PARAM_ENCODING:
+ val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false);
+ data->client_encoding = DatumGetCString(val);
+ break;
+
+ case PARAM_PG_VERSION:
+ val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false);
+ data->client_pg_version = DatumGetUInt32(val);
+ break;
+
+ case PARAM_PUBLICATION_NAMES:
+ val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false);
+ if (!SplitIdentifierString(DatumGetCString(val), ',',
+ &data->publication_names))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("invalid publication name syntax")));
+
+ break;
+
+ default:
+ ereport(ERROR,
+ (errmsg("unrecognized pgoutput parameter \"%s\"",
+ elem->defname)));
+ break;
+ }
+ }
+}
+
+/*
+ * Look up a param name to find the enum value for the
+ * param, or PARAM_UNRECOGNISED if not found.
+ */
+static int
+get_param_key(const char * const param_name)
+{
+ OutputPluginParam *param = ¶m_lookup[0];
+
+ do {
+ if (strcmp(param->paramname, param_name) == 0)
+ return param->paramkey;
+ param++;
+ } while (param->paramname != NULL);
+
+ return PARAM_UNRECOGNISED;
+}
+
+/*
+ * Parse parameter as given type and return the value as Datum.
+ */
+static Datum
+get_param_value(DefElem *elem, OutputParamType type, bool null_ok)
+{
+ /* Check for NULL value */
+ if (elem->arg == NULL || strVal(elem->arg) == NULL)
+ {
+ if (null_ok)
+ return (Datum) 0;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("parameter \"%s\" cannot be NULL", elem->defname)));
+ }
+
+ switch (type)
+ {
+ case OUTPUT_PARAM_TYPE_UINT32:
+ return UInt32GetDatum(parse_param_uint32(elem));
+ case OUTPUT_PARAM_TYPE_STRING:
+ return CStringGetDatum(pstrdup(strVal(elem->arg)));
+ default:
+ elog(ERROR, "unknown parameter type %d", type);
+ return (Datum) 0; /* keep compiler quiet */
+ }
+}
+
+/*
+ * Parse string DefElem as uint32.
+ */
+static uint32
+parse_param_uint32(DefElem *elem)
+{
+ int64 res;
+
+ if (!scanint8(strVal(elem->arg), true, &res))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse integer value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+
+ if (res > PG_UINT32_MAX || res < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value \"%s\" out of range for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+
+ return (uint32) res;
+}
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
new file mode 100644
index 0000000..b69d015
--- /dev/null
+++ b/src/include/replication/logicalproto.h
@@ -0,0 +1,76 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalproto.h
+ * logical replication protocol
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/replication/logicalproto.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICAL_PROTO_H
+#define LOGICAL_PROTO_H
+
+#include "replication/reorderbuffer.h"
+#include "utils/rel.h"
+
+/*
+ * Protocol capabilities
+ *
+ * LOGICALREP_PROTO_VERSION_NUM is our native protocol and the greatest version
+ * we can support. LOGICALREP_PROTO_MIN_VERSION_NUM is the oldest version we
+ * have backwards compatibility for. The client requests the protocol version
+ * at connect time.
+ */
+#define LOGICALREP_PROTO_MIN_VERSION_NUM 1
+#define LOGICALREP_PROTO_VERSION_NUM 1
+
+/* Tuple coming via logical replication. */
+typedef struct LogicalRepTupleData
+{
+ char *values[MaxTupleAttributeNumber]; /* value in output function format, or NULL if the value is NULL */
+ bool changed[MaxTupleAttributeNumber]; /* marker for changed/unchanged values */
+} LogicalRepTupleData;
+
+typedef uint32 LogicalRepRelId;
+
+/* Relation information */
+typedef struct LogicalRepRelation
+{
+ /* Info coming from the remote side. */
+ LogicalRepRelId remoteid; /* unique id of the relation */
+ char *nspname; /* schema name */
+ char *relname; /* relation name */
+ int natts; /* number of columns */
+ char **attnames; /* column names */
+} LogicalRepRelation;
+
+extern char *logicalrep_build_options(List *publications);
+extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
+extern void logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn,
+ TimestampTz *committime, TransactionId *remote_xid);
+extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+extern void logicalrep_read_commit(StringInfo in, XLogRecPtr *commit_lsn,
+ XLogRecPtr *end_lsn, TimestampTz *committime);
+extern void logicalrep_write_origin(StringInfo out, const char *origin,
+ XLogRecPtr origin_lsn);
+extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
+extern void logicalrep_write_insert(StringInfo out, Relation rel,
+ HeapTuple newtuple);
+extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
+extern void logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple,
+ HeapTuple newtuple);
+extern LogicalRepRelId logicalrep_read_update(StringInfo in, bool *hasoldtup,
+ LogicalRepTupleData *oldtup,
+ LogicalRepTupleData *newtup);
+extern void logicalrep_write_delete(StringInfo out, Relation rel,
+ HeapTuple oldtuple);
+extern LogicalRepRelId logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup);
+extern void logicalrep_write_rel(StringInfo out, Relation rel);
+
+extern LogicalRepRelation *logicalrep_read_rel(StringInfo in);
+
+#endif /* LOGICAL_PROTO_H */
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
new file mode 100644
index 0000000..8cd82e7
--- /dev/null
+++ b/src/include/replication/pgoutput.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput.h
+ * Logical Replication output plugin
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/replication/pgoutput.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PGOUTPUT_H
+#define PGOUTPUT_H
+
+
+typedef struct PGOutputData
+{
+ MemoryContext context; /* private memory context for transient
+ * allocations */
+
+ /* client info */
+ uint32 protocol_version;
+ const char *client_encoding;
+ uint32 client_pg_version;
+
+ List *publication_names;
+} PGOutputData;
+
+extern void pgoutput_process_parameters(List *options, PGOutputData *data);
+
+#endif /* PGOUTPUT_H */
--
2.7.4
0004-Make-libpqwalreceiver-reentrant.patch
From a1ff9a936566af8486179d7add4c2be0860cef18 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 6 Jul 2016 13:59:23 +0200
Subject: [PATCH 4/6] Make libpqwalreceiver reentrant
---
.../libpqwalreceiver/libpqwalreceiver.c | 328 ++++++++++++++-------
src/backend/replication/walreceiver.c | 67 +++--
src/include/replication/walreceiver.h | 75 +++--
3 files changed, 306 insertions(+), 164 deletions(-)
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 45dccb3..f28a792 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -25,6 +25,7 @@
#include "miscadmin.h"
#include "replication/walreceiver.h"
#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
#ifdef HAVE_POLL_H
#include <poll.h>
@@ -38,62 +39,83 @@
PG_MODULE_MAGIC;
-void _PG_init(void);
+typedef struct WalReceiverConnHandle {
+ /* Current connection to the primary, if any */
+ PGconn *streamConn;
+ /* Buffer for currently read records */
+ char *recvBuf;
+} WalReceiverConnHandle;
-/* Current connection to the primary, if any */
-static PGconn *streamConn = NULL;
-
-/* Buffer for currently read records */
-static char *recvBuf = NULL;
+PGDLLEXPORT WalReceiverConnHandle *_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi);
/* Prototypes for interface functions */
-static void libpqrcv_connect(char *conninfo);
-static char *libpqrcv_get_conninfo(void);
-static void libpqrcv_identify_system(TimeLineID *primary_tli);
-static void libpqrcv_readtimelinehistoryfile(TimeLineID tli, char **filename, char **content, int *len);
-static bool libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint,
- char *slotname);
-static void libpqrcv_endstreaming(TimeLineID *next_tli);
-static int libpqrcv_receive(char **buffer, pgsocket *wait_fd);
-static void libpqrcv_send(const char *buffer, int nbytes);
-static void libpqrcv_disconnect(void);
+static void libpqrcv_connect(WalReceiverConnHandle *handle, char *conninfo,
+ bool logical, const char *connname);
+static char *libpqrcv_get_conninfo(WalReceiverConnHandle *handle);
+static void libpqrcv_identify_system(WalReceiverConnHandle *handle,
+ TimeLineID *primary_tli);
+static void libpqrcv_readtimelinehistoryfile(WalReceiverConnHandle *handle,
+ TimeLineID tli, char **filename,
+ char **content, int *len);
+static char *libpqrcv_create_slot(WalReceiverConnHandle *handle,
+ char *slotname, bool logical,
+ XLogRecPtr *lsn);
+static bool libpqrcv_startstreaming_physical(WalReceiverConnHandle *handle,
+ TimeLineID tli, XLogRecPtr startpoint,
+ char *slotname);
+static bool libpqrcv_startstreaming_logical(WalReceiverConnHandle *handle,
+ XLogRecPtr startpoint, char *slotname,
+ char *options);
+static void libpqrcv_endstreaming(WalReceiverConnHandle *handle,
+ TimeLineID *next_tli);
+static int libpqrcv_receive(WalReceiverConnHandle *handle, char **buffer,
+ pgsocket *wait_fd);
+static void libpqrcv_send(WalReceiverConnHandle *handle, const char *buffer,
+ int nbytes);
+static void libpqrcv_disconnect(WalReceiverConnHandle *handle);
/* Prototypes for private functions */
-static bool libpq_select(int timeout_ms);
-static PGresult *libpqrcv_PQexec(const char *query);
+static bool libpq_select(WalReceiverConnHandle *handle,
+ int timeout_ms);
+static PGresult *libpqrcv_PQexec(WalReceiverConnHandle *handle,
+ const char *query);
/*
- * Module load callback
+ * Module initialization callback
*/
-void
-_PG_init(void)
+WalReceiverConnHandle *
+_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi)
{
- /* Tell walreceiver how to reach us */
- if (walrcv_connect != NULL || walrcv_identify_system != NULL ||
- walrcv_readtimelinehistoryfile != NULL ||
- walrcv_startstreaming != NULL || walrcv_endstreaming != NULL ||
- walrcv_receive != NULL || walrcv_send != NULL ||
- walrcv_disconnect != NULL)
- elog(ERROR, "libpqwalreceiver already loaded");
- walrcv_connect = libpqrcv_connect;
- walrcv_get_conninfo = libpqrcv_get_conninfo;
- walrcv_identify_system = libpqrcv_identify_system;
- walrcv_readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
- walrcv_startstreaming = libpqrcv_startstreaming;
- walrcv_endstreaming = libpqrcv_endstreaming;
- walrcv_receive = libpqrcv_receive;
- walrcv_send = libpqrcv_send;
- walrcv_disconnect = libpqrcv_disconnect;
+ WalReceiverConnHandle *handle;
+
+ handle = palloc0(sizeof(WalReceiverConnHandle));
+
+ /* Tell caller how to reach us */
+ wrcapi->connect = libpqrcv_connect;
+ wrcapi->get_conninfo = libpqrcv_get_conninfo;
+ wrcapi->identify_system = libpqrcv_identify_system;
+ wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
+ wrcapi->create_slot = libpqrcv_create_slot;
+ wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical;
+ wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical;
+ wrcapi->endstreaming = libpqrcv_endstreaming;
+ wrcapi->receive = libpqrcv_receive;
+ wrcapi->send = libpqrcv_send;
+ wrcapi->disconnect = libpqrcv_disconnect;
+
+ return handle;
}
/*
* Establish the connection to the primary server for XLOG streaming
*/
static void
-libpqrcv_connect(char *conninfo)
+libpqrcv_connect(WalReceiverConnHandle *handle, char *conninfo, bool logical,
+ const char *connname)
{
const char *keys[5];
const char *vals[5];
+ int i = 0;
/*
* We use the expand_dbname parameter to process the connection string (or
@@ -102,22 +124,26 @@ libpqrcv_connect(char *conninfo)
* database name is ignored by the server in replication mode, but specify
* "replication" for .pgpass lookup.
*/
- keys[0] = "dbname";
- vals[0] = conninfo;
- keys[1] = "replication";
- vals[1] = "true";
- keys[2] = "dbname";
- vals[2] = "replication";
- keys[3] = "fallback_application_name";
- vals[3] = "walreceiver";
- keys[4] = NULL;
- vals[4] = NULL;
-
- streamConn = PQconnectdbParams(keys, vals, /* expand_dbname = */ true);
- if (PQstatus(streamConn) != CONNECTION_OK)
+ keys[i] = "dbname";
+ vals[i] = conninfo;
+ keys[++i] = "replication";
+ vals[i] = logical ? "database" : "true";
+ if (!logical)
+ {
+ keys[++i] = "dbname";
+ vals[i] = "replication";
+ }
+ keys[++i] = "fallback_application_name";
+ vals[i] = connname;
+ keys[++i] = NULL;
+ vals[i] = NULL;
+
+ handle->streamConn = PQconnectdbParams(keys, vals,
+ /* expand_dbname = */ true);
+ if (PQstatus(handle->streamConn) != CONNECTION_OK)
ereport(ERROR,
(errmsg("could not connect to the primary server: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
/*
@@ -125,17 +151,17 @@ libpqrcv_connect(char *conninfo)
* are obfuscated.
*/
static char *
-libpqrcv_get_conninfo(void)
+libpqrcv_get_conninfo(WalReceiverConnHandle *handle)
{
PQconninfoOption *conn_opts;
PQconninfoOption *conn_opt;
PQExpBufferData buf;
char *retval;
- Assert(streamConn != NULL);
+ Assert(handle->streamConn != NULL);
initPQExpBuffer(&buf);
- conn_opts = PQconninfo(streamConn);
+ conn_opts = PQconninfo(handle->streamConn);
if (conn_opts == NULL)
ereport(ERROR,
@@ -174,7 +200,8 @@ libpqrcv_get_conninfo(void)
* timeline ID of the primary.
*/
static void
-libpqrcv_identify_system(TimeLineID *primary_tli)
+libpqrcv_identify_system(WalReceiverConnHandle *handle,
+ TimeLineID *primary_tli)
{
PGresult *res;
char *primary_sysid;
@@ -184,14 +211,14 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
* Get the system identifier and timeline ID as a DataRow message from the
* primary server.
*/
- res = libpqrcv_PQexec("IDENTIFY_SYSTEM");
+ res = libpqrcv_PQexec(handle, "IDENTIFY_SYSTEM");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
ereport(ERROR,
(errmsg("could not receive database system identifier and timeline ID from "
"the primary server: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
if (PQnfields(res) < 3 || PQntuples(res) != 1)
{
@@ -225,6 +252,43 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
}
/*
+ * Create new replication slot.
+ */
+static char *
+libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
+ bool logical, XLogRecPtr *lsn)
+{
+ PGresult *res;
+ char cmd[256];
+ char *snapshot;
+
+ if (logical)
+ snprintf(cmd, sizeof(cmd),
+ "CREATE_REPLICATION_SLOT \"%s\" LOGICAL %s",
+ slotname, "pgoutput");
+ else
+ snprintf(cmd, sizeof(cmd),
+ "CREATE_REPLICATION_SLOT \"%s\"", slotname);
+
+ res = libpqrcv_PQexec(handle, cmd);
+
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ elog(FATAL, "could not create replication slot \"%s\": %s",
+ slotname, PQerrorMessage(handle->streamConn));
+ }
+
+ *lsn = DatumGetLSN(DirectFunctionCall1Coll(pg_lsn_in, InvalidOid,
+ CStringGetDatum(PQgetvalue(res, 0, 1))));
+ snapshot = pstrdup(PQgetvalue(res, 0, 2));
+
+ PQclear(res);
+
+ return snapshot;
+}
+
+
+/*
* Start streaming WAL data from given startpoint and timeline.
*
* Returns true if we switched successfully to copy-both mode. False
@@ -235,7 +299,9 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
* throws an ERROR.
*/
static bool
-libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint, char *slotname)
+libpqrcv_startstreaming_physical(WalReceiverConnHandle *handle,
+ TimeLineID tli, XLogRecPtr startpoint,
+ char *slotname)
{
char cmd[256];
PGresult *res;
@@ -249,7 +315,49 @@ libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint, char *slotname)
snprintf(cmd, sizeof(cmd),
"START_REPLICATION %X/%X TIMELINE %u",
(uint32) (startpoint >> 32), (uint32) startpoint, tli);
- res = libpqrcv_PQexec(cmd);
+ res = libpqrcv_PQexec(handle, cmd);
+
+ if (PQresultStatus(res) == PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ return false;
+ }
+ else if (PQresultStatus(res) != PGRES_COPY_BOTH)
+ {
+ PQclear(res);
+ ereport(ERROR,
+ (errmsg("could not start WAL streaming: %s",
+ PQerrorMessage(handle->streamConn))));
+ }
+ PQclear(res);
+ return true;
+}
+
+/*
+ * Same as above, but for a logical stream.
+ *
+ * Here an ERROR typically means the options were not valid for the
+ * given slot.
+ */
+static bool
+libpqrcv_startstreaming_logical(WalReceiverConnHandle *handle,
+ XLogRecPtr startpoint, char *slotname,
+ char *options)
+{
+ StringInfoData cmd;
+ PGresult *res;
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X",
+ slotname,
+ (uint32) (startpoint >> 32),
+ (uint32) startpoint);
+
+ /* Send options */
+ if (options)
+ appendStringInfo(&cmd, " (%s)", options);
+
+ res = libpqrcv_PQexec(handle, cmd.data);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
{
@@ -261,25 +369,28 @@ libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint, char *slotname)
PQclear(res);
ereport(ERROR,
(errmsg("could not start WAL streaming: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
PQclear(res);
+ pfree(cmd.data);
return true;
}
+
/*
* Stop streaming WAL data. Returns the next timeline's ID in *next_tli, as
* reported by the server, or 0 if it did not report it.
*/
static void
-libpqrcv_endstreaming(TimeLineID *next_tli)
+libpqrcv_endstreaming(WalReceiverConnHandle *handle, TimeLineID *next_tli)
{
PGresult *res;
- if (PQputCopyEnd(streamConn, NULL) <= 0 || PQflush(streamConn))
+ if (PQputCopyEnd(handle->streamConn, NULL) <= 0 ||
+ PQflush(handle->streamConn))
ereport(ERROR,
(errmsg("could not send end-of-streaming message to primary: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
/*
* After COPY is finished, we should receive a result set indicating the
@@ -291,7 +402,7 @@ libpqrcv_endstreaming(TimeLineID *next_tli)
* called after receiving CopyDone from the backend - the walreceiver
* never terminates replication on its own initiative.
*/
- res = PQgetResult(streamConn);
+ res = PQgetResult(handle->streamConn);
if (PQresultStatus(res) == PGRES_TUPLES_OK)
{
/*
@@ -305,7 +416,7 @@ libpqrcv_endstreaming(TimeLineID *next_tli)
PQclear(res);
/* the result set should be followed by CommandComplete */
- res = PQgetResult(streamConn);
+ res = PQgetResult(handle->streamConn);
}
else
*next_tli = 0;
@@ -313,23 +424,24 @@ libpqrcv_endstreaming(TimeLineID *next_tli)
if (PQresultStatus(res) != PGRES_COMMAND_OK)
ereport(ERROR,
(errmsg("error reading result of streaming command: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
PQclear(res);
/* Verify that there are no more results */
- res = PQgetResult(streamConn);
+ res = PQgetResult(handle->streamConn);
if (res != NULL)
ereport(ERROR,
(errmsg("unexpected result after CommandComplete: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
/*
* Fetch the timeline history file for 'tli' from primary.
*/
static void
-libpqrcv_readtimelinehistoryfile(TimeLineID tli,
- char **filename, char **content, int *len)
+libpqrcv_readtimelinehistoryfile(WalReceiverConnHandle *handle,
+ TimeLineID tli, char **filename,
+ char **content, int *len)
{
PGresult *res;
char cmd[64];
@@ -338,14 +450,14 @@ libpqrcv_readtimelinehistoryfile(TimeLineID tli,
* Request the primary to send over the history file for given timeline.
*/
snprintf(cmd, sizeof(cmd), "TIMELINE_HISTORY %u", tli);
- res = libpqrcv_PQexec(cmd);
+ res = libpqrcv_PQexec(handle, cmd);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
ereport(ERROR,
(errmsg("could not receive timeline history file from "
"the primary server: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
if (PQnfields(res) != 2 || PQntuples(res) != 1)
{
@@ -375,22 +487,23 @@ libpqrcv_readtimelinehistoryfile(TimeLineID tli,
* This is based on pqSocketCheck.
*/
static bool
-libpq_select(int timeout_ms)
+libpq_select(WalReceiverConnHandle *handle, int timeout_ms)
{
int ret;
- Assert(streamConn != NULL);
- if (PQsocket(streamConn) < 0)
+ Assert(handle->streamConn != NULL);
+ if (PQsocket(handle->streamConn) < 0)
ereport(ERROR,
(errcode_for_socket_access(),
- errmsg("invalid socket: %s", PQerrorMessage(streamConn))));
+ errmsg("invalid socket: %s",
+ PQerrorMessage(handle->streamConn))));
/* We use poll(2) if available, otherwise select(2) */
{
#ifdef HAVE_POLL
struct pollfd input_fd;
- input_fd.fd = PQsocket(streamConn);
+ input_fd.fd = PQsocket(handle->streamConn);
input_fd.events = POLLIN | POLLERR;
input_fd.revents = 0;
@@ -402,7 +515,7 @@ libpq_select(int timeout_ms)
struct timeval *ptr_timeout;
FD_ZERO(&input_mask);
- FD_SET(PQsocket(streamConn), &input_mask);
+ FD_SET(PQsocket(handle->streamConn), &input_mask);
if (timeout_ms < 0)
ptr_timeout = NULL;
@@ -413,7 +526,7 @@ libpq_select(int timeout_ms)
ptr_timeout = &timeout;
}
- ret = select(PQsocket(streamConn) + 1, &input_mask,
+ ret = select(PQsocket(handle->streamConn) + 1, &input_mask,
NULL, NULL, ptr_timeout);
#endif /* HAVE_POLL */
}
@@ -444,7 +557,7 @@ libpq_select(int timeout_ms)
* Queries are always executed on the connection in streamConn.
*/
static PGresult *
-libpqrcv_PQexec(const char *query)
+libpqrcv_PQexec(WalReceiverConnHandle *handle, const char *query)
{
PGresult *result = NULL;
PGresult *lastResult = NULL;
@@ -459,7 +572,7 @@ libpqrcv_PQexec(const char *query)
* Submit a query. Since we don't use non-blocking mode, this also can
* block. But its risk is relatively small, so we ignore that for now.
*/
- if (!PQsendQuery(streamConn, query))
+ if (!PQsendQuery(handle->streamConn, query))
return NULL;
for (;;)
@@ -468,7 +581,7 @@ libpqrcv_PQexec(const char *query)
* Receive data until PQgetResult is ready to get the result without
* blocking.
*/
- while (PQisBusy(streamConn))
+ while (PQisBusy(handle->streamConn))
{
/*
* We don't need to break down the sleep into smaller increments,
@@ -476,9 +589,9 @@ libpqrcv_PQexec(const char *query)
* elog(FATAL) within SIGTERM signal handler if the signal arrives
* in the middle of establishment of replication connection.
*/
- if (!libpq_select(-1))
+ if (!libpq_select(handle, -1))
continue; /* interrupted */
- if (PQconsumeInput(streamConn) == 0)
+ if (PQconsumeInput(handle->streamConn) == 0)
return NULL; /* trouble */
}
@@ -487,7 +600,7 @@ libpqrcv_PQexec(const char *query)
* there are many. Since walsender will never generate multiple
* results, we skip the concatenation of error messages.
*/
- result = PQgetResult(streamConn);
+ result = PQgetResult(handle->streamConn);
if (result == NULL)
break; /* query is complete */
@@ -497,7 +610,7 @@ libpqrcv_PQexec(const char *query)
if (PQresultStatus(lastResult) == PGRES_COPY_IN ||
PQresultStatus(lastResult) == PGRES_COPY_OUT ||
PQresultStatus(lastResult) == PGRES_COPY_BOTH ||
- PQstatus(streamConn) == CONNECTION_BAD)
+ PQstatus(handle->streamConn) == CONNECTION_BAD)
break;
}
@@ -508,10 +621,10 @@ libpqrcv_PQexec(const char *query)
* Disconnect connection to primary, if any.
*/
static void
-libpqrcv_disconnect(void)
+libpqrcv_disconnect(WalReceiverConnHandle *handle)
{
- PQfinish(streamConn);
- streamConn = NULL;
+ PQfinish(handle->streamConn);
+ handle->streamConn = NULL;
}
/*
@@ -531,30 +644,31 @@ libpqrcv_disconnect(void)
* ereports on error.
*/
static int
-libpqrcv_receive(char **buffer, pgsocket *wait_fd)
+libpqrcv_receive(WalReceiverConnHandle *handle, char **buffer,
+ pgsocket *wait_fd)
{
int rawlen;
- if (recvBuf != NULL)
- PQfreemem(recvBuf);
- recvBuf = NULL;
+ if (handle->recvBuf != NULL)
+ PQfreemem(handle->recvBuf);
+ handle->recvBuf = NULL;
/* Try to receive a CopyData message */
- rawlen = PQgetCopyData(streamConn, &recvBuf, 1);
+ rawlen = PQgetCopyData(handle->streamConn, &handle->recvBuf, 1);
if (rawlen == 0)
{
/* Try consuming some data. */
- if (PQconsumeInput(streamConn) == 0)
+ if (PQconsumeInput(handle->streamConn) == 0)
ereport(ERROR,
(errmsg("could not receive data from WAL stream: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
/* Now that we've consumed some input, try again */
- rawlen = PQgetCopyData(streamConn, &recvBuf, 1);
+ rawlen = PQgetCopyData(handle->streamConn, &handle->recvBuf, 1);
if (rawlen == 0)
{
/* Tell caller to try again when our socket is ready. */
- *wait_fd = PQsocket(streamConn);
+ *wait_fd = PQsocket(handle->streamConn);
return 0;
}
}
@@ -562,7 +676,7 @@ libpqrcv_receive(char **buffer, pgsocket *wait_fd)
{
PGresult *res;
- res = PQgetResult(streamConn);
+ res = PQgetResult(handle->streamConn);
if (PQresultStatus(res) == PGRES_COMMAND_OK ||
PQresultStatus(res) == PGRES_COPY_IN)
{
@@ -574,16 +688,16 @@ libpqrcv_receive(char **buffer, pgsocket *wait_fd)
PQclear(res);
ereport(ERROR,
(errmsg("could not receive data from WAL stream: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
}
if (rawlen < -1)
ereport(ERROR,
(errmsg("could not receive data from WAL stream: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
/* Return received messages to caller */
- *buffer = recvBuf;
+ *buffer = handle->recvBuf;
return rawlen;
}
@@ -593,11 +707,11 @@ libpqrcv_receive(char **buffer, pgsocket *wait_fd)
* ereports on error.
*/
static void
-libpqrcv_send(const char *buffer, int nbytes)
+libpqrcv_send(WalReceiverConnHandle *handle, const char *buffer, int nbytes)
{
- if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
- PQflush(streamConn))
+ if (PQputCopyData(handle->streamConn, buffer, nbytes) <= 0 ||
+ PQflush(handle->streamConn))
ereport(ERROR,
(errmsg("could not send data to WAL stream: %s",
- PQerrorMessage(streamConn))));
+ PQerrorMessage(handle->streamConn))));
}
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 413ee3a..68e3df5 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -51,6 +51,7 @@
#include "access/transam.h"
#include "access/xlog_internal.h"
#include "catalog/pg_type.h"
+#include "fmgr.h"
#include "funcapi.h"
#include "libpq/pqformat.h"
#include "libpq/pqsignal.h"
@@ -73,16 +74,9 @@ int wal_receiver_status_interval;
int wal_receiver_timeout;
bool hot_standby_feedback;
-/* libpqreceiver hooks to these when loaded */
-walrcv_connect_type walrcv_connect = NULL;
-walrcv_get_conninfo_type walrcv_get_conninfo = NULL;
-walrcv_identify_system_type walrcv_identify_system = NULL;
-walrcv_startstreaming_type walrcv_startstreaming = NULL;
-walrcv_endstreaming_type walrcv_endstreaming = NULL;
-walrcv_readtimelinehistoryfile_type walrcv_readtimelinehistoryfile = NULL;
-walrcv_receive_type walrcv_receive = NULL;
-walrcv_send_type walrcv_send = NULL;
-walrcv_disconnect_type walrcv_disconnect = NULL;
+/* filled by libpqreceiver when loaded */
+static WalReceiverConnAPI *wrcapi = NULL;
+static WalReceiverConnHandle *wrchandle = NULL;
#define NAPTIME_PER_CYCLE 100 /* max sleep time between cycles (100ms) */
@@ -202,6 +196,7 @@ WalReceiverMain(void)
WalRcvData *walrcv = WalRcv;
TimestampTz last_recv_timestamp;
bool ping_sent;
+ walrcvconn_init_fn walrcvconn_init;
/*
* WalRcv should be set up already (if we are a backend, we inherit this
@@ -284,15 +279,24 @@ WalReceiverMain(void)
sigdelset(&BlockSig, SIGQUIT);
/* Load the libpq-specific functions */
- load_file("libpqwalreceiver", false);
- if (walrcv_connect == NULL ||
- walrcv_get_conninfo == NULL ||
- walrcv_startstreaming == NULL ||
- walrcv_endstreaming == NULL ||
- walrcv_identify_system == NULL ||
- walrcv_readtimelinehistoryfile == NULL ||
- walrcv_receive == NULL || walrcv_send == NULL ||
- walrcv_disconnect == NULL)
+ wrcapi = palloc0(sizeof(WalReceiverConnAPI));
+
+ walrcvconn_init = (walrcvconn_init_fn)
+ load_external_function("libpqwalreceiver",
+ "_PG_walreceiver_conn_init", false, NULL);
+
+ if (walrcvconn_init == NULL)
+ elog(ERROR, "libpqwalreceiver does not declare _PG_walreceiver_conn_init symbol");
+
+ wrchandle = walrcvconn_init(wrcapi);
+ if (wrcapi->connect == NULL ||
+ wrcapi->get_conninfo == NULL ||
+ wrcapi->startstreaming_physical == NULL ||
+ wrcapi->endstreaming == NULL ||
+ wrcapi->identify_system == NULL ||
+ wrcapi->readtimelinehistoryfile == NULL ||
+ wrcapi->receive == NULL || wrcapi->send == NULL ||
+ wrcapi->disconnect == NULL)
elog(ERROR, "libpqwalreceiver didn't initialize correctly");
/*
@@ -306,14 +310,14 @@ WalReceiverMain(void)
/* Establish the connection to the primary for XLOG streaming */
EnableWalRcvImmediateExit();
- walrcv_connect(conninfo);
+ wrcapi->connect(wrchandle, conninfo, false, "walreceiver");
DisableWalRcvImmediateExit();
/*
* Save user-visible connection string. This clobbers the original
* conninfo, for security.
*/
- tmp_conninfo = walrcv_get_conninfo();
+ tmp_conninfo = wrcapi->get_conninfo(wrchandle);
SpinLockAcquire(&walrcv->mutex);
memset(walrcv->conninfo, 0, MAXCONNINFO);
if (tmp_conninfo)
@@ -332,7 +336,7 @@ WalReceiverMain(void)
* IDENTIFY_SYSTEM replication command,
*/
EnableWalRcvImmediateExit();
- walrcv_identify_system(&primaryTLI);
+ wrcapi->identify_system(wrchandle, &primaryTLI);
DisableWalRcvImmediateExit();
/*
@@ -369,7 +373,8 @@ WalReceiverMain(void)
* on the new timeline.
*/
ThisTimeLineID = startpointTLI;
- if (walrcv_startstreaming(startpointTLI, startpoint,
+ if (wrcapi->startstreaming_physical(wrchandle, startpointTLI,
+ startpoint,
slotname[0] != '\0' ? slotname : NULL))
{
if (first_stream)
@@ -421,7 +426,7 @@ WalReceiverMain(void)
}
/* See if we can read data immediately */
- len = walrcv_receive(&buf, &wait_fd);
+ len = wrcapi->receive(wrchandle, &buf, &wait_fd);
if (len != 0)
{
/*
@@ -452,7 +457,7 @@ WalReceiverMain(void)
endofwal = true;
break;
}
- len = walrcv_receive(&buf, &wait_fd);
+ len = wrcapi->receive(wrchandle, &buf, &wait_fd);
}
/* Let the master know that we received some data. */
@@ -568,7 +573,7 @@ WalReceiverMain(void)
* our side, too.
*/
EnableWalRcvImmediateExit();
- walrcv_endstreaming(&primaryTLI);
+ wrcapi->endstreaming(wrchandle, &primaryTLI);
DisableWalRcvImmediateExit();
/*
@@ -723,7 +728,7 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
tli)));
EnableWalRcvImmediateExit();
- walrcv_readtimelinehistoryfile(tli, &fname, &content, &len);
+ wrcapi->readtimelinehistoryfile(wrchandle, tli, &fname, &content, &len);
DisableWalRcvImmediateExit();
/*
@@ -775,8 +780,8 @@ WalRcvDie(int code, Datum arg)
SpinLockRelease(&walrcv->mutex);
/* Terminate the connection gracefully. */
- if (walrcv_disconnect != NULL)
- walrcv_disconnect();
+ if (wrcapi->disconnect != NULL)
+ wrcapi->disconnect(wrchandle);
/* Wake up the startup process to notice promptly that we're gone */
WakeupRecovery();
@@ -1146,7 +1151,7 @@ XLogWalRcvSendReply(bool force, bool requestReply)
(uint32) (applyPtr >> 32), (uint32) applyPtr,
requestReply ? " (reply requested)" : "");
- walrcv_send(reply_message.data, reply_message.len);
+ wrcapi->send(wrchandle, reply_message.data, reply_message.len);
}
/*
@@ -1224,7 +1229,7 @@ XLogWalRcvSendHSFeedback(bool immed)
pq_sendint64(&reply_message, GetCurrentIntegerTimestamp());
pq_sendint(&reply_message, xmin, 4);
pq_sendint(&reply_message, nextEpoch, 4);
- walrcv_send(reply_message.data, reply_message.len);
+ wrcapi->send(wrchandle, reply_message.data, reply_message.len);
if (TransactionIdIsValid(xmin))
master_has_standby_xmin = true;
else
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index cd787c9..0f42008 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -133,33 +133,56 @@ typedef struct
extern WalRcvData *WalRcv;
-/* libpqwalreceiver hooks */
-typedef void (*walrcv_connect_type) (char *conninfo);
-extern PGDLLIMPORT walrcv_connect_type walrcv_connect;
-
-typedef char *(*walrcv_get_conninfo_type) (void);
-extern PGDLLIMPORT walrcv_get_conninfo_type walrcv_get_conninfo;
-
-typedef void (*walrcv_identify_system_type) (TimeLineID *primary_tli);
-extern PGDLLIMPORT walrcv_identify_system_type walrcv_identify_system;
-
-typedef void (*walrcv_readtimelinehistoryfile_type) (TimeLineID tli, char **filename, char **content, int *size);
-extern PGDLLIMPORT walrcv_readtimelinehistoryfile_type walrcv_readtimelinehistoryfile;
-
-typedef bool (*walrcv_startstreaming_type) (TimeLineID tli, XLogRecPtr startpoint, char *slotname);
-extern PGDLLIMPORT walrcv_startstreaming_type walrcv_startstreaming;
+struct WalReceiverConnHandle;
+typedef struct WalReceiverConnHandle WalReceiverConnHandle;
-typedef void (*walrcv_endstreaming_type) (TimeLineID *next_tli);
-extern PGDLLIMPORT walrcv_endstreaming_type walrcv_endstreaming;
-
-typedef int (*walrcv_receive_type) (char **buffer, pgsocket *wait_fd);
-extern PGDLLIMPORT walrcv_receive_type walrcv_receive;
-
-typedef void (*walrcv_send_type) (const char *buffer, int nbytes);
-extern PGDLLIMPORT walrcv_send_type walrcv_send;
-
-typedef void (*walrcv_disconnect_type) (void);
-extern PGDLLIMPORT walrcv_disconnect_type walrcv_disconnect;
+/* libpqwalreceiver hooks */
+typedef void (*walrcvconn_connect_fn) (
+ WalReceiverConnHandle *handle,
+ char *conninfo, bool logical,
+ const char *connname);
+typedef char *(*walrcvconn_get_conninfo_fn) (WalReceiverConnHandle *handle);
+typedef void (*walrcvconn_identify_system_fn) (WalReceiverConnHandle *handle,
+ TimeLineID *primary_tli);
+typedef void (*walrcvconn_readtimelinehistoryfile_fn) (
+ WalReceiverConnHandle *handle,
+ TimeLineID tli, char **filename,
+ char **content, int *size);
+typedef char *(*walrcvconn_create_slot_fn) (
+ WalReceiverConnHandle *handle,
+ char *slotname, bool logical,
+ XLogRecPtr *lsn);
+typedef bool (*walrcvconn_startstreaming_physical_fn) (
+ WalReceiverConnHandle *handle,
+ TimeLineID tli, XLogRecPtr startpoint,
+ char *slotname);
+typedef bool (*walrcvconn_startstreaming_logical_fn) (
+ WalReceiverConnHandle *handle,
+ XLogRecPtr startpoint, char *slotname,
+ char *options);
+typedef void (*walrcvconn_endstreaming_fn) (WalReceiverConnHandle *handle,
+ TimeLineID *next_tli);
+typedef int (*walrcvconn_receive_fn) (WalReceiverConnHandle *handle,
+ char **buffer, pgsocket *wait_fd);
+typedef void (*walrcvconn_send_fn) (WalReceiverConnHandle *handle,
+ const char *buffer, int nbytes);
+typedef void (*walrcvconn_disconnect_fn) (WalReceiverConnHandle *handle);
+
+typedef struct WalReceiverConnAPI {
+ walrcvconn_connect_fn connect;
+ walrcvconn_get_conninfo_fn get_conninfo;
+ walrcvconn_identify_system_fn identify_system;
+ walrcvconn_readtimelinehistoryfile_fn readtimelinehistoryfile;
+ walrcvconn_create_slot_fn create_slot;
+ walrcvconn_startstreaming_physical_fn startstreaming_physical;
+ walrcvconn_startstreaming_logical_fn startstreaming_logical;
+ walrcvconn_endstreaming_fn endstreaming;
+ walrcvconn_receive_fn receive;
+ walrcvconn_send_fn send;
+ walrcvconn_disconnect_fn disconnect;
+} WalReceiverConnAPI;
+
+typedef WalReceiverConnHandle *(*walrcvconn_init_fn)(WalReceiverConnAPI *wrconn);
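A sketch of how a loadable receiver module would satisfy this interface: the core looks up the module's init symbol with `load_external_function()`, and the module fills the function-pointer struct and returns a fresh handle. The names below (`MiniConnAPI`, `mini_conn_init`, etc.) are simplified, hypothetical mirrors of the header's types; the real libpqwalreceiver also has to worry about memory contexts:

```c
#include <stdlib.h>

/* Simplified mirrors of the header's types: an opaque handle plus a
 * struct of function pointers that the loadable module populates. */
typedef struct MiniHandle
{
	int			connected;
} MiniHandle;

typedef void (*mini_connect_fn) (MiniHandle *handle, const char *conninfo);
typedef void (*mini_disconnect_fn) (MiniHandle *handle);

typedef struct MiniConnAPI
{
	mini_connect_fn connect;
	mini_disconnect_fn disconnect;
} MiniConnAPI;

static void
mini_connect(MiniHandle *handle, const char *conninfo)
{
	(void) conninfo;			/* a real module would dial out here */
	handle->connected = 1;
}

static void
mini_disconnect(MiniHandle *handle)
{
	handle->connected = 0;
}

/* Counterpart of the conn-init symbol walreceiver.c resolves at load
 * time: fill in the API struct, hand back a zeroed handle. */
MiniHandle *
mini_conn_init(MiniConnAPI *api)
{
	api->connect = mini_connect;
	api->disconnect = mini_disconnect;
	return calloc(1, sizeof(MiniHandle));
}
```

The design choice here is that the struct-of-pointers replaces the nine separate `PGDLLIMPORT` hook variables, so the core can sanity-check the whole API in one place and multiple callers (walreceiver, apply workers) can each own a handle.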
/* prototypes for functions in walreceiver.c */
extern void WalReceiverMain(void) pg_attribute_noreturn();
--
2.7.4
0005-Add-logical-replication-workers.patch
From f3a729fad9e2a54e0efd83c57562791ab49b4317 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 13 Jul 2016 20:00:06 +0200
Subject: [PATCH 5/6] Add logical replication workers
---
doc/src/sgml/catalogs.sgml | 8 +-
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 361 +++++
doc/src/sgml/postgres.sgml | 1 +
doc/src/sgml/reference.sgml | 6 +
src/backend/commands/subscriptioncmds.c | 175 ++-
src/backend/executor/nodeModifyTable.c | 6 +-
src/backend/postmaster/bgworker.c | 6 +-
src/backend/postmaster/postmaster.c | 41 +
.../libpqwalreceiver/libpqwalreceiver.c | 28 +-
src/backend/replication/logical/Makefile | 5 +-
src/backend/replication/logical/apply.c | 1435 ++++++++++++++++++++
src/backend/replication/logical/launcher.c | 542 ++++++++
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/lmgr/lwlocknames.txt | 1 +
src/backend/utils/misc/guc.c | 22 +
src/include/executor/nodeModifyTable.h | 20 +
src/include/postmaster/bgworker_internals.h | 2 +
src/include/replication/logicalworker.h | 41 +
src/include/replication/walreceiver.h | 4 +
src/test/perl/PostgresNode.pm | 10 +-
src/test/subscription/.gitignore | 2 +
src/test/subscription/Makefile | 20 +
src/test/subscription/README | 16 +
src/test/subscription/t/001_rep_changes.pl | 89 ++
src/test/subscription/t/002_types.pl | 509 +++++++
26 files changed, 3338 insertions(+), 16 deletions(-)
create mode 100644 doc/src/sgml/logical-replication.sgml
create mode 100644 src/backend/replication/logical/apply.c
create mode 100644 src/backend/replication/logical/launcher.c
create mode 100644 src/include/replication/logicalworker.h
create mode 100644 src/test/subscription/.gitignore
create mode 100644 src/test/subscription/Makefile
create mode 100644 src/test/subscription/README
create mode 100644 src/test/subscription/t/001_rep_changes.pl
create mode 100644 src/test/subscription/t/002_types.pl
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 84211c1..5951716 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5134,7 +5134,8 @@
<para>
The <structname>pg_publication</structname> catalog contains
- all publications created in the database.
+ all publications created in the database. For more on publications
+ see <xref linkend="logical-replication-publication">.
</para>
<table>
@@ -6051,7 +6052,8 @@
<para>
The <structname>pg_subscription</structname> catalog contains
- all existing logical replication subscriptions.
+ all existing logical replication subscriptions. For more information
+ about logical replication see <xref linkend="logical-replication">.
</para>
<para>
@@ -6120,7 +6122,7 @@
<entry><type>text[]</type></entry>
<entry></entry>
<entry>Array of subscribed publication names. For more on publications
- see <xref linkend="publications">.
+ see <xref linkend="logical-replication-publication">.
</entry>
</row>
</tbody>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 4383711..7067a21 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
<!ENTITY config SYSTEM "config.sgml">
<!ENTITY user-manag SYSTEM "user-manag.sgml">
<!ENTITY wal SYSTEM "wal.sgml">
+<!ENTITY logical-replication SYSTEM "logical-replication.sgml">
<!-- programmer's guide -->
<!ENTITY bgworker SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
new file mode 100644
index 0000000..3179add
--- /dev/null
+++ b/doc/src/sgml/logical-replication.sgml
@@ -0,0 +1,361 @@
+<!-- doc/src/sgml/logical-replication.sgml -->
+
+<chapter id="logical-replication">
+
+ <title>Logical Replication</title>
+ <para>
+ Logical Replication is a method of replicating data objects and their
+ changes, based upon their Primary Keys (or Replication Identity). We
+ use the term Logical in contrast to Physical replication, which
+ uses exact block addresses and byte-by-byte replication.
+ PostgreSQL supports both mechanisms concurrently, see
+ <xref linkend="high-availability">. Logical Replication allows
+ fine-grained control over both data replication and security.
+ </para>
+ <para>
+ Logical Replication uses a Publish and Subscribe model with one or
+ more Subscribers subscribing to one or more Publications on a
+ Provider node. Subscribers pull data from the Publications they
+ subscribe to and may subsequently re-publish data to allow
+ cascading replication or more complex configurations.
+ </para>
+ <para>
+ Logical replication typically starts with a snapshot of the data on
+ the Provider database. Once that is done, changes on the Provider
+ are sent to the Subscriber as they occur in real time. The Subscriber
+ applies the data in the same order as the Provider, so that
+ transactional consistency is guaranteed for the Publications within a
+ single Subscription. This method of data replication is sometimes
+ referred to as transactional replication.
+ </para>
+ <para>
+ The typical use-cases for logical replication are:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Sending incremental changes in a single database or a subset of
+ a database to Subscribers as they occur.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Firing triggers for individual changes as they arrive on the
+ Subscriber.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Consolidating multiple databases into a single one (for example
+ for analytical purposes).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Replicating between different major versions of PostgreSQL.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Giving access to the replicated data to different groups of
+ users.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Sharing a subset of the database between multiple databases.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ The Subscriber database behaves in the same way as any other
+ PostgreSQL instance and can be used as a Provider for other
+ databases by defining its own Publications. When the Subscriber is
+ treated as read-only by the application, a single Subscription will
+ not produce conflicts. On the other hand, if there are other writes
+ to the same set of tables, either by the application or by other
+ Subscribers, conflicts can arise.
+ </para>
+
+<sect1 id="logical-replication-publication">
+ <title>Publication</title>
+ <para>
+ A Publication object can be defined on any master node, owned by one
+ user. A Publication is a set of changes generated from a group of
+ tables, and might also be described as a Change Set or Replication Set.
+ Each Publication exists in only one database.
+ </para>
+ <para>
+ Publications are different from table schema and do not affect
+ how the table is accessed. Each table can be added to multiple
+ Publications if needed. Publications may include both tables
+ and materialized views. Objects must be added explicitly, except
+ when a Publication is created for "ALL TABLES". There is no
+ default name for a Publication which specifies all tables.
+ </para>
+ <para>
+ All tables in the database can be added to a single Publication.
+ Since logical replication requires a REPLICA IDENTITY index to be
+ present on a table for replication of UPDATEs and DELETEs, a table
+ without such an index cannot be added to a Publication which
+ replicates UPDATEs and DELETEs.
+ </para>
+ <para>
+ Publications can choose to limit the changes they produce to any
+ combination of INSERT, UPDATE, DELETE and TRUNCATE, similar to the
+ way triggers are fired by particular event types.
+ </para>
+ <para>
+ All tables added to the Publication must be accessible via SELECT
+ privilege for the user owning the Publication. Usage on the
+ Publication can be GRANTed to other users.
+ </para>
+ <para>
+ The definition of a Publication object will be included in the
+ output of pg_dump.
+ </para>
+ <para>
+ Every Publication can have multiple Subscribers.
+ </para>
+ <para>
+ Publication is created using the <xref linkend="sql-createpublication">
+ command and may be later altered or dropped using corresponding commands.
+ </para>
+ <para>
+ The individual tables can be added and removed dynamically using
+ <xref linkend="sql-alterpublication">. Both the ADD TABLE and DROP
+ TABLE operations are transactional so the table will start or stop
+ replicating at the correct snapshot once the transaction has committed.
+ </para>
+</sect1>
+<sect1 id="logical-replication-subscription">
+ <title>Subscription</title>
+ <para>
+ A Subscription is the downstream side of Logical Replication. The
+ node where a Subscription is defined is referred to as a Subscriber.
+ A Subscription defines the connection to another database and the set
+ of Publications (one or more) to which it wants to be subscribed.
+ </para>
+ <para>
+ The Subscriber database behaves in the same way as any other
+ PostgreSQL instance and can be used as a Provider for other
+ databases by defining its own Publications.
+ </para>
+ <para>
+ A Subscriber may have multiple Subscriptions if desired. It is
+ possible to define multiple Subscriptions between a single
+ Provider-Subscriber pair; note, however, that each Publication can
+ only be subscribed to from one Subscriber.
+ </para>
+ <para>
+ Each Subscription will receive changes via one replication slot (see
+ <xref linkend="streaming-replication-slots">). Additional temporary
+ replication slots may be required for the initial data synchronizations
+ of pre-existing table data.
+ </para>
+ <para>
+ Subscriptions are not dumped by pg_dump by default, but can be
+ requested using the --subscriptions parameter.
+ </para>
+ <para>
+ The Subscription is added using <xref linkend="sql-createsubscription">
+ and can be stopped/resumed at any time using
+ <xref linkend="sql-altersubscription"> command or removed using
+ <xref linkend="sql-dropsubscription">.
+ </para>
+ <para>
+ When a subscription is dropped and recreated, the synchronization
+ information is lost. This means that the data has to be
+ resynchronized afterwards.
+ </para>
+</sect1>
+<sect1 id="logical-replication-conflicts">
+ <title>Conflicts</title>
+ <para>
+ Conflicts happen when applying the replicated changes would break
+ any specified constraints (with the exception of foreign keys, which
+ are not checked). Currently conflicts are not resolved automatically;
+ they cause replication to be stopped with an error until the conflict
+ is manually resolved.
+ </para>
+</sect1>
+<sect1 id="logical-replication-architecture">
+ <title>Architecture</title>
+ <para>
+ Logical replication starts by copying a snapshot of the data on
+ the Provider database. Once that is done, the changes on Provider
+ are sent to Subscriber as they occur in real-time. The Subscriber
+ applies the data in the order in which commits were made on the
+ Provider so that transactional consistency is guaranteed for the
+ Publications within any single Subscription.
+ </para>
+ <para>
+ Logical Replication is built on an architecture similar to that of
+ physical streaming replication
+ (see <xref linkend="streaming-replication">). It is implemented by
+ WalSender and the Apply processes. The WalSender starts the logical
+ decoding (described in <xref linkend="logicaldecoding">) of the WAL and
+ loads the standard logical decoding plugin (pgoutput). The plugin
+ transforms the changes read from WAL to the logical replication protocol
+ (see <xref linkend="protocol-logical-replication">) and filters the data
+ according to Publication specifications. The data are then continuously
+ transferred using the streaming replication protocol to the Apply worker
+ which maps them to the local tables and applies the individual changes as
+ they are received in exact transactional order.
+ </para>
+ <para>
+ The Apply process on the Subscriber database always runs with
+ session_replication_role set to replica, which produces the normal effects
+ on triggers and constraints.
+ </para>
+ <sect2 id="logical-replication-snapshot">
+ <title>Initial snapshot</title>
+ <para>
+ The initial snapshot is taken when the replication slot for the
+ Subscription is created. The existing data as of that snapshot are
+ then sent over using the streaming replication protocol between the
+ WalSender and Apply processes, in a similar way to how changes are
+ sent. Once the initial data are copied, the Apply process enters a
+ catchup phase where it replays the changes which happened on the
+ Provider while the initial snapshot was being copied. Once the
+ replication catches up, the Apply process switches to normal
+ streaming mode and replicates transactions as they happen.
+ </para>
+ </sect2>
+ <sect2 id="logical-replication-table-resync">
+ <title>Individual table resynchronization</title>
+ <para>
+ A table can be resynchronized at any point during normal
+ replication operation. When a table resynchronization is
+ requested, a parallel instance of a special kind of Apply process
+ is started, which registers its own temporary replication slot and
+ takes a new snapshot. It then works in the same way as the initial
+ snapshot (<xref linkend="logical-replication-snapshot">), with the
+ exceptions that it only copies the data of a single table and that,
+ once the catchup phase is finished, control of the replication of
+ the table is given back to the main Apply process.
+ </para>
+ </sect2>
+</sect1>
+<sect1 id="logical-replication-monitoring">
+ <title>Monitoring</title>
+ <para>
+ pg_stat_replication
+ </para>
+ <para>
+ pg_stat_subscription
+ </para>
+</sect1>
+<sect1 id="logical-replication-security">
+ <title>Security</title>
+ <para>
+ Replication connections are made in the same way as for physical
+ streaming replication. Access has to be specifically granted in
+ pg_hba.conf. The role used for the replication must have the
+ <literal>REPLICATION</literal> privilege granted. This gives a role
+ access to both logical and physical replication.
+ </para>
+ <para>
+ In addition, logical replication can be accessed with the
+ <literal>SUBSCRIPTION</literal> privilege. This allows you to create
+ roles which can pull data from Publications yet cannot request
+ physical replication.
+ </para>
+ <para>
+ To create or subscribe to a Publication the user must have the
+ REPLICATION role, the SUBSCRIPTION role or be a superuser.
+ </para>
+ <para>
+ The <literal>SELECT</literal> privilege on a table is required when
+ a user adds it to a Publication.
+ To subscribe to a Publication, a user must be its owner or have the
+ USAGE privilege granted on the Publication.
+ </para>
+ <para>
+ To create a Subscription the user must have the
+ REPLICATION role, the SUBSCRIPTION role or be a superuser.
+ The Subscription's Apply process will run in the local database
+ with the privileges of the owner of the Subscription. In practice this
+ means that the owner of the Subscription must have <literal>INSERT</>,
+ <literal>UPDATE</>, <literal>DELETE</> and <literal>TRUNCATE</>
+ privileges on the replicated tables on the Subscriber, or be a
+ superuser, though the latter is not recommended.
+ </para>
+ <para>
+ In particular, note that privileges are not re-checked as each change
+ record is read from the Provider, nor are they re-checked for each change
+ when applied. Security is checked once at startup. Concurrent REVOKEs
+ of privilege will interrupt logical replication if they have a material
+ effect on the security of the change stream.
+ </para>
+</sect1>
+<sect1 id="logical-replication-gucs">
+ <title>Logical replication related configuration parameters</title>
+ <para>
+ The Logical Replication requires several configuration options to be
+ set.
+ </para>
+ <para>
+ On the Provider side, <varname>wal_level</> must be set to
+ <literal>logical</> and <varname>max_replication_slots</> has to be set
+ to at least the number of Subscriptions expected to connect, with some
+ reserve for table synchronization. <varname>max_wal_senders</>
+ should be set to at least <varname>max_replication_slots</> plus
+ the number of physical replicas that are connected at the same time.
+ </para>
+ <para>
+ The Subscriber also requires <varname>max_replication_slots</> to
+ be set. In this case it should be set to at least the number of
+ Subscriptions that will be added to the Subscriber.
+ <varname>max_logical_replication_workers</> has to be set to at least
+ the number of Subscriptions, again with some reserve for table
+ synchronization. Additionally, <varname>max_worker_processes</> may
+ need to be adjusted to accommodate the replication workers, to at least
+ (<varname>max_logical_replication_workers</> + <literal>1</>). Note
+ that some extensions and parallel queries also take worker slots
+ from <varname>max_worker_processes</>.
+ </para>
+</sect1>
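Under the sizing rules above, the minimums reduce to simple arithmetic. A hypothetical helper (all names illustrative; `sync_reserve` stands for the extra slots/workers reserved for table synchronization, which the text only describes informally):

```c
/* Rough sizing helper for the configuration advice above.  The
 * document only prescribes the inequalities; this just computes the
 * minimal values that satisfy them. */
typedef struct ReplicationSizing
{
	int			max_replication_slots;
	int			max_wal_senders;
	int			max_logical_replication_workers;
	int			max_worker_processes;
} ReplicationSizing;

ReplicationSizing
size_for(int n_subscriptions, int sync_reserve, int n_physical_standbys)
{
	ReplicationSizing s;

	/* Provider: one slot per Subscription plus sync reserve; wal senders
	 * must also cover concurrently connected physical standbys. */
	s.max_replication_slots = n_subscriptions + sync_reserve;
	s.max_wal_senders = s.max_replication_slots + n_physical_standbys;

	/* Subscriber: one apply worker per Subscription plus sync reserve,
	 * and at least one extra worker process beyond that. */
	s.max_logical_replication_workers = n_subscriptions + sync_reserve;
	s.max_worker_processes = s.max_logical_replication_workers + 1;
	return s;
}
```

For example, 4 Subscriptions with 2 sync slots in reserve and 1 physical standby would want at least 6 replication slots and 7 wal senders on the Provider, and 6 logical replication workers out of 7 worker processes on the Subscriber (more if extensions or parallel query also consume workers).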
+<sect1 id="logical-replication-quick-setup">
+ <title>Quick setup</title>
+ <para>
+ First set the configuration options in the postgresql.conf:
+<programlisting>
+wal_level = logical
+max_worker_processes = 10 # one per subscription + one per instance needed on subscriber
+max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber
+max_replication_slots = 10 # one per subscription needed both provider and subscriber
+max_wal_senders = 10 # one per subscription needed on provider
+</programlisting>
+ </para>
+ <para>
+ The pg_hba.conf needs to be adjusted to allow replication (the
+ values here depend on your actual network configuration and the user you
+ want to use for connecting):
+<programlisting>
+host replication repuser 0.0.0.0/0 md5
+</programlisting>
+ </para>
+ <para>
+ Then on Provider database:
+<programlisting>
+CREATE PUBLICATION mypub;
+ALTER PUBLICATION mypub ADD TABLE users, departments;
+</programlisting>
+ </para>
+ <para>
+ And on Subscriber database:
+<programlisting>
+CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
+</programlisting>
+ </para>
+ <para>
+ The above will start the replication process, which synchronizes the
+ initial contents of the <literal>users</literal> and
+ <literal>departments</literal> tables and then starts replicating
+ incremental changes to those tables.
+ </para>
+</sect1>
+</chapter>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0346d36..1c94015 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -159,6 +159,7 @@
&monitoring;
&diskusage;
&wal;
+ &logical-replication;
®ress;
</part>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 8acdff1..34007d3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -54,11 +54,13 @@
&alterOperatorClass;
&alterOperatorFamily;
&alterPolicy;
+ &alterPublication;
&alterRole;
&alterRule;
&alterSchema;
&alterSequence;
&alterServer;
+ &alterSubscription;
&alterSystem;
&alterTable;
&alterTableSpace;
@@ -100,11 +102,13 @@
&createOperatorClass;
&createOperatorFamily;
&createPolicy;
+ &createPublication;
&createRole;
&createRule;
&createSchema;
&createSequence;
&createServer;
+ &createSubscription;
&createTable;
&createTableAs;
&createTableSpace;
@@ -144,11 +148,13 @@
&dropOperatorFamily;
&dropOwned;
&dropPolicy;
+ &dropPublication;
&dropRole;
&dropRule;
&dropSchema;
&dropSequence;
&dropServer;
+ &dropSubscription;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 54d66d5..43e2853 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -29,13 +29,19 @@
#include "catalog/pg_subscription.h"
#include "commands/defrem.h"
+#include "commands/replicationcmds.h"
#include "executor/spi.h"
#include "nodes/makefuncs.h"
+#include "replication/logical.h"
+#include "replication/logicalproto.h"
+#include "replication/logicalworker.h"
+#include "replication/origin.h"
#include "replication/reorderbuffer.h"
-#include "commands/replicationcmds.h"
+#include "replication/logicalworker.h"
+#include "replication/walreceiver.h"
#include "utils/array.h"
#include "utils/builtins.h"
@@ -164,7 +170,12 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
bool enabled_given;
bool enabled;
char *conninfo;
+ char *slotname;
List *publications;
+ WalReceiverConnHandle *wrchandle = NULL;
+ WalReceiverConnAPI *wrcapi = NULL;
+ walrcvconn_init_fn walrcvconn_init;
+ XLogRecPtr lsn;
check_replication_permissions();
@@ -184,6 +195,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
/* Parse and check options. */
parse_subscription_options(stmt->options, &enabled_given, &enabled,
&conninfo, &publications);
+ slotname = stmt->subname;
/* TODO: improve error messages here. */
if (conninfo == NULL)
@@ -202,7 +214,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
values[Anum_pg_subscription_dbid - 1] = ObjectIdGetDatum(MyDatabaseId);
values[Anum_pg_subscription_subname - 1] =
- DirectFunctionCall1(namein, CStringGetDatum(stmt->subname));
+ DirectFunctionCall1(namein, CStringGetDatum(slotname));
values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(enabled);
values[Anum_pg_subscription_subconninfo - 1] =
CStringGetTextDatum(conninfo);
@@ -218,13 +230,47 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
CatalogUpdateIndexes(rel, tup);
heap_freetuple(tup);
- ObjectAddressSet(myself, SubscriptionRelationId, suboid);
+ /*
+ * Now that the catalog update is done, try to reserve the slot at the
+ * provider node using a replication connection.
+ */
+ wrcapi = palloc0(sizeof(WalReceiverConnAPI));
+
+ walrcvconn_init = (walrcvconn_init_fn)
+ load_external_function("libpqwalreceiver",
+ "_PG_walreceirver_conn_init", false, NULL);
+
+ if (walrcvconn_init == NULL)
+ elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol");
+
+ wrchandle = walrcvconn_init(wrcapi);
+ if (wrcapi->connect == NULL ||
+ wrcapi->create_slot == NULL)
+ elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+ wrcapi->connect(wrchandle, conninfo, true, stmt->subname);
+ wrcapi->create_slot(wrchandle, slotname, true, &lsn);
+ ereport(NOTICE,
+ (errmsg("created replication slot \"%s\" on provider",
+ slotname)));
+ /*
+ * Setup replication origin tracking.
+ * TODO: do this only when it does not already exist?
+ */
+ replorigin_create(slotname);
+
+ /* And we are done with the remote side. */
+ wrcapi->disconnect(wrchandle);
heap_close(rel, RowExclusiveLock);
/* Make the changes visible. */
CommandCounterIncrement();
+ ApplyLauncherWakeupOnCommit();
+
+ ObjectAddressSet(myself, SubscriptionRelationId, subid);
+
return myself;
}
@@ -302,6 +348,11 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
heap_freetuple(tup);
heap_close(rel, RowExclusiveLock);
+ /* Make the changes visible. */
+ CommandCounterIncrement();
+
+ ApplyLauncherWakeupOnCommit();
+
return myself;
}
@@ -313,19 +364,135 @@ DropSubscriptionById(Oid subid)
{
Relation rel;
HeapTuple tup;
+ Datum datum;
+ bool isnull;
+ char *subname;
+ char *conninfo;
+ char *slotname;
+ TupleDesc tupdesc;
+ RepOriginId originid;
+ MemoryContext tmpctx,
+ oldctx;
+ WalReceiverConnHandle *wrchandle = NULL;
+ WalReceiverConnAPI *wrcapi = NULL;
+ walrcvconn_init_fn walrcvconn_init;
check_replication_permissions();
rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
+ if (GetTopTransactionIdIfAny() != InvalidTransactionId)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("DROP SUBSCRIPTION must be first action in transaction")));
+
tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
if (!HeapTupleIsValid(tup))
elog(ERROR, "cache lookup failed for subscription %u", subid);
+ tupdesc = RelationGetDescr(rel);
+
+ /*
+ * Create a temporary memory context to keep a copy of the subscription
+ * info needed later in the execution.
+ */
+ tmpctx = AllocSetContextCreate(TopMemoryContext,
+ "DropSubscription Ctx",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(tmpctx);
+
+ /* Get subname */
+ datum = heap_getattr(tup, Anum_pg_subscription_subname, tupdesc,
+ &isnull);
+ Assert(!isnull);
+ subname = pstrdup(NameStr(*DatumGetName(datum)));
+
+ /* Get conninfo */
+ datum = heap_getattr(tup, Anum_pg_subscription_subconninfo, tupdesc,
+ &isnull);
+ Assert(!isnull);
+ conninfo = pstrdup(TextDatumGetCString(datum));
+
+ /* Get slotname */
+ datum = heap_getattr(tup, Anum_pg_subscription_subslotname, tupdesc,
+ &isnull);
+ Assert(!isnull);
+ slotname = pstrdup(NameStr(*DatumGetName(datum)));
+
+ MemoryContextSwitchTo(oldctx);
+
+ /* Remove the tuple from catalog. */
simple_heap_delete(rel, &tup->t_self);
ReleaseSysCache(tup);
- heap_close(rel, RowExclusiveLock);
+ heap_close(rel, NoLock);
+
+ originid = replorigin_by_name(slotname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid);
+
+ /* Commit the transaction to make the change visible to the launcher. */
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /* Signal the launcher so that it kills the apply process. */
+ ApplyLauncherWakeup();
+
+ StartTransactionCommand();
+
+ /*
+ * Now that the catalog update is done, try to drop the slot at the
+ * provider node using a replication connection.
+ */
+ wrcapi = palloc0(sizeof(WalReceiverConnAPI));
+
+ walrcvconn_init = (walrcvconn_init_fn)
+ load_external_function("libpqwalreceiver",
+ "_PG_walreceirver_conn_init", false, NULL);
+
+ if (walrcvconn_init == NULL)
+ elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol");
+
+ wrchandle = walrcvconn_init(wrcapi);
+ if (wrcapi->connect == NULL ||
+ wrcapi->drop_slot == NULL)
+ elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+ /*
+ * We must ignore errors here as otherwise it would be impossible to
+ * drop a subscription when the provider is down.
+ */
+ oldctx = CurrentMemoryContext;
+ PG_TRY();
+ {
+ wrcapi->connect(wrchandle, conninfo, true, subname);
+ wrcapi->drop_slot(wrchandle, slotname);
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on provider",
+ slotname)));
+ wrcapi->disconnect(wrchandle);
+ }
+ PG_CATCH();
+ {
+ MemoryContext ectx;
+ ErrorData *edata;
+
+ ectx = MemoryContextSwitchTo(oldctx);
+ /* Save error info */
+ edata = CopyErrorData();
+ MemoryContextSwitchTo(ectx);
+ FlushErrorState();
+
+ ereport(WARNING,
+ (errmsg("there was a problem dropping the replication slot "
+ "\"%s\" on provider", slotname),
+ errdetail("The error was: %s", edata->message),
+ errhint("You may have to drop it manually.")));
+ FreeErrorData(edata);
+ }
+ PG_END_TRY();
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index af7b26c..907623b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -232,7 +232,7 @@ ExecCheckTIDVisible(EState *estate,
* Returns RETURNING result if any, otherwise NULL.
* ----------------------------------------------------------------
*/
-static TupleTableSlot *
+TupleTableSlot *
ExecInsert(ModifyTableState *mtstate,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
@@ -536,7 +536,7 @@ ExecInsert(ModifyTableState *mtstate,
* Returns RETURNING result if any, otherwise NULL.
* ----------------------------------------------------------------
*/
-static TupleTableSlot *
+TupleTableSlot *
ExecDelete(ItemPointer tupleid,
HeapTuple oldtuple,
TupleTableSlot *planSlot,
@@ -794,7 +794,7 @@ ldelete:;
* Returns RETURNING result if any, otherwise NULL.
* ----------------------------------------------------------------
*/
-static TupleTableSlot *
+TupleTableSlot *
ExecUpdate(ItemPointer tupleid,
HeapTuple oldtuple,
TupleTableSlot *slot,
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 699c934..fc998cd 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -93,6 +93,9 @@ struct BackgroundWorkerHandle
static BackgroundWorkerArray *BackgroundWorkerData;
+/* Enables registration of internal background workers. */
+bool internal_bgworker_registration_in_progress = false;
+
/*
* Calculate shared memory needed.
*/
@@ -745,7 +748,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
ereport(DEBUG1,
(errmsg("registering background worker \"%s\"", worker->bgw_name)));
- if (!process_shared_preload_libraries_in_progress)
+ if (!process_shared_preload_libraries_in_progress &&
+ !internal_bgworker_registration_in_progress)
{
if (!IsUnderPostmaster)
ereport(LOG,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f5c8e9d..2211532 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -113,6 +113,7 @@
#include "postmaster/pgarch.h"
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
+#include "replication/logicalworker.h"
#include "replication/walsender.h"
#include "storage/fd.h"
#include "storage/ipc.h"
@@ -416,6 +417,7 @@ static void maybe_start_bgworker(void);
static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
static pid_t StartChildProcess(AuxProcType type);
static void StartAutovacuumWorker(void);
+static void register_internal_bgworkers(void);
static void InitPostmasterDeathWatchHandle(void);
/*
@@ -925,6 +927,12 @@ PostmasterMain(int argc, char *argv[])
#endif
/*
+ * Register internal bgworkers before we give external modules a chance
+ * to do the same.
+ */
+ register_internal_bgworkers();
+
+ /*
* process any libraries that should be preloaded at postmaster start
*/
process_shared_preload_libraries();
@@ -5641,6 +5649,39 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
}
/*
+ * Register internal background workers.
+ *
+ * This is here mainly because permanent bgworkers are normally allowed
+ * to be registered only while shared preload libraries are being loaded,
+ * which does not work for the internal ones.
+ */
+static void
+register_internal_bgworkers(void)
+{
+ internal_bgworker_registration_in_progress = true;
+
+ /* Register the logical replication worker launcher if appropriate. */
+ if (!IsBinaryUpgrade && max_logical_replication_workers > 0)
+ {
+ BackgroundWorker bgw;
+
+ bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+ BGWORKER_BACKEND_DATABASE_CONNECTION;
+ bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+ bgw.bgw_main = ApplyLauncherMain;
+ snprintf(bgw.bgw_name, BGW_MAXLEN,
+ "logical replication launcher");
+ bgw.bgw_restart_time = 5;
+ bgw.bgw_notify_pid = 0;
+ bgw.bgw_main_arg = (Datum) 0;
+
+ RegisterBackgroundWorker(&bgw);
+ }
+
+ internal_bgworker_registration_in_progress = false;
+}
+
+/*
* If the time is right, start one background worker.
*
* As a side effect, the bgworker control variables are set or reset whenever
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index f28a792..4c4d441 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -60,6 +60,7 @@ static void libpqrcv_readtimelinehistoryfile(WalReceiverConnHandle *handle,
static char *libpqrcv_create_slot(WalReceiverConnHandle *handle,
char *slotname, bool logical,
XLogRecPtr *lsn);
+static void libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname);
static bool libpqrcv_startstreaming_physical(WalReceiverConnHandle *handle,
TimeLineID tli, XLogRecPtr startpoint,
char *slotname);
@@ -96,6 +97,7 @@ _PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi)
wrcapi->identify_system = libpqrcv_identify_system;
wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
wrcapi->create_slot = libpqrcv_create_slot;
+ wrcapi->drop_slot = libpqrcv_drop_slot;
wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical;
wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical;
wrcapi->endstreaming = libpqrcv_endstreaming;
@@ -274,7 +276,7 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- elog(FATAL, "could not crate replication slot \"%s\": %s\n",
+ elog(ERROR, "could not create replication slot \"%s\": %s\n",
slotname, PQerrorMessage(handle->streamConn));
}
@@ -287,6 +289,28 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
return snapshot;
}
+/*
+ * Drop replication slot.
+ */
+static void
+libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname)
+{
+ PGresult *res;
+ char cmd[256];
+
+ snprintf(cmd, sizeof(cmd),
+ "DROP_REPLICATION_SLOT \"%s\"", slotname);
+
+ res = libpqrcv_PQexec(handle, cmd);
+
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ elog(ERROR, "could not drop replication slot \"%s\": %s\n",
+ slotname, PQerrorMessage(handle->streamConn));
+ }
+
+ PQclear(res);
+}
/*
* Start streaming WAL data from given startpoint and timeline.
@@ -353,7 +377,7 @@ libpqrcv_startstreaming_logical(WalReceiverConnHandle *handle,
(uint32) (startpoint >> 32),
(uint32) startpoint);
- /* Send options */
+ /* Add options */
if (options)
appendStringInfo(&cmd, "( %s )", options);
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 438811e..ab6e11e 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,7 +14,8 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
-OBJS = decode.o logical.o logicalfuncs.o message.o origin.o proto.o \
- publication.o reorderbuffer.o snapbuild.o subscription.o
+OBJS = apply.o decode.o launcher.o logical.o logicalfuncs.o message.o \
+ origin.o proto.o publication.o reorderbuffer.o snapbuild.o \
+ subscription.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c
new file mode 100644
index 0000000..eb7af19
--- /dev/null
+++ b/src/backend/replication/logical/apply.c
@@ -0,0 +1,1435 @@
+/*-------------------------------------------------------------------------
+ * apply.c
+ * PostgreSQL logical replication
+ *
+ * Copyright (c) 2012-2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/apply.c
+ *
+ * NOTES
+ * This file contains the worker which applies logical changes as they come
+ * from remote logical replication stream.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+
+#include "catalog/namespace.h"
+
+#include "commands/trigger.h"
+
+#include "executor/executor.h"
+#include "executor/nodeModifyTable.h"
+
+#include "libpq/pqformat.h"
+#include "libpq/pqsignal.h"
+
+#include "mb/pg_wchar.h"
+
+#include "optimizer/planner.h"
+
+#include "parser/parse_relation.h"
+
+#include "postmaster/bgworker.h"
+#include "postmaster/postmaster.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalproto.h"
+#include "replication/logicalworker.h"
+#include "replication/reorderbuffer.h"
+#include "replication/origin.h"
+#include "replication/snapbuild.h"
+#include "replication/subscription.h"
+#include "replication/walreceiver.h"
+
+#include "rewrite/rewriteHandler.h"
+
+#include "storage/bufmgr.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/guc.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/timeout.h"
+#include "utils/tqual.h"
+#include "utils/syscache.h"
+
+typedef struct FlushPosition
+{
+ dlist_node node;
+ XLogRecPtr local_end;
+ XLogRecPtr remote_end;
+} FlushPosition;
+
+static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping);
+
+static MemoryContext ApplyContext;
+static bool in_remote_transaction = false;
+
+static Subscription *MySubscription = NULL;
+static bool got_SIGTERM = false;
+
+typedef struct LogicalRepRelMapEntry {
+ LogicalRepRelation remoterel; /* key is remoterel.remoteid */
+
+ /* Mapping to local relation, filled as needed. */
+ Oid reloid; /* local relation id */
+ Relation rel; /* relcache entry */
+ int *attmap; /* map of remote attributes to
+ * local ones */
+ AttInMetadata *attin; /* cached info used in type
+ * conversion */
+} LogicalRepRelMapEntry;
+
+static HTAB *LogicalRepRelMap = NULL;
+
+/* filled by libpqreceiver when loaded */
+static WalReceiverConnAPI *wrcapi = NULL;
+static WalReceiverConnHandle *wrchandle = NULL;
+
+static void send_feedback(XLogRecPtr recvpos, int64 now, bool force);
+void pglogical_apply_main(Datum main_arg);
+
+static bool tuple_find_by_replidx(Relation rel, LockTupleMode lockmode,
+ TupleTableSlot *searchslot, TupleTableSlot *slot);
+
+
+
+/*
+ * Relcache invalidation callback for our relation map cache.
+ */
+static void
+logicalreprelmap_invalidate_cb(Datum arg, Oid reloid)
+{
+ LogicalRepRelMapEntry *entry;
+
+ /* Just to be sure. */
+ if (LogicalRepRelMap == NULL)
+ return;
+
+ if (reloid != InvalidOid)
+ {
+ HASH_SEQ_STATUS status;
+
+ hash_seq_init(&status, LogicalRepRelMap);
+
+ /* TODO, use inverse lookup hashtable? */
+ while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+ {
+ if (entry->reloid == reloid)
+ entry->reloid = InvalidOid;
+ }
+ }
+ else
+ {
+ /* invalidate all cache entries */
+ HASH_SEQ_STATUS status;
+
+ hash_seq_init(&status, LogicalRepRelMap);
+
+ while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+ entry->reloid = InvalidOid;
+ }
+}
+
+/*
+ * Initialize the relation map cache.
+ */
+static void
+remoterelmap_init(void)
+{
+ HASHCTL ctl;
+
+ /* Make sure we've initialized CacheMemoryContext. */
+ if (CacheMemoryContext == NULL)
+ CreateCacheMemoryContext();
+
+ /* Initialize the hash table. */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(uint32);
+ ctl.entrysize = sizeof(LogicalRepRelMapEntry);
+ ctl.hcxt = CacheMemoryContext;
+
+ LogicalRepRelMap = hash_create("logicalrep relation map cache", 128, &ctl,
+ HASH_ELEM | HASH_CONTEXT);
+
+ /* Watch for invalidation events. */
+ CacheRegisterRelcacheCallback(logicalreprelmap_invalidate_cb,
+ (Datum) 0);
+}
+
+/*
+ * Free the entry of a relation map cache.
+ */
+static void
+remoterelmap_free_entry(LogicalRepRelMapEntry *entry)
+{
+ LogicalRepRelation *remoterel;
+
+ remoterel = &entry->remoterel;
+
+ pfree(remoterel->nspname);
+ pfree(remoterel->relname);
+
+ if (remoterel->natts > 0)
+ {
+ int i;
+
+ for (i = 0; i < remoterel->natts; i++)
+ pfree(remoterel->attnames[i]);
+
+ pfree(remoterel->attnames);
+ }
+
+ if (entry->attmap)
+ pfree(entry->attmap);
+
+ remoterel->natts = 0;
+ entry->reloid = InvalidOid;
+ entry->rel = NULL;
+}
+
+/*
+ * Add new entry or update existing entry in the relation map cache.
+ *
+ * Called when a new relation mapping is sent by the provider to update
+ * our expected view of incoming data from said provider.
+ */
+static void
+remoterelmap_update(LogicalRepRelation *remoterel)
+{
+ MemoryContext oldctx;
+ LogicalRepRelMapEntry *entry;
+ bool found;
+ int i;
+
+ if (LogicalRepRelMap == NULL)
+ remoterelmap_init();
+
+ /*
+ * HASH_ENTER returns the existing entry if present or creates a new one.
+ */
+ entry = hash_search(LogicalRepRelMap, (void *) &remoterel->remoteid,
+ HASH_ENTER, &found);
+
+ if (found)
+ remoterelmap_free_entry(entry);
+
+ /* Make cached copy of the data */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ entry->remoterel.remoteid = remoterel->remoteid;
+ entry->remoterel.nspname = pstrdup(remoterel->nspname);
+ entry->remoterel.relname = pstrdup(remoterel->relname);
+ entry->remoterel.natts = remoterel->natts;
+ entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char*));
+ for (i = 0; i < remoterel->natts; i++)
+ entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+ entry->attmap = palloc(remoterel->natts * sizeof(int));
+ entry->reloid = InvalidOid;
+ MemoryContextSwitchTo(oldctx);
+}
+
+/*
+ * Find attribute index in TupleDesc struct by attribute name.
+ */
+static int
+tupdesc_get_att_by_name(TupleDesc desc, const char *attname)
+{
+ int i;
+
+ for (i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = desc->attrs[i];
+
+ if (strcmp(NameStr(att->attname), attname) == 0)
+ return i;
+ }
+
+ elog(ERROR, "unknown column name %s", attname);
+}
+
+/*
+ * Open the local relation associated with the remote one.
+ */
+static LogicalRepRelMapEntry *
+logicalreprel_open(uint32 remoteid, LOCKMODE lockmode)
+{
+ LogicalRepRelMapEntry *entry;
+ bool found;
+
+ if (LogicalRepRelMap == NULL)
+ remoterelmap_init();
+
+ /* Search for existing entry. */
+ entry = hash_search(LogicalRepRelMap, (void *) &remoteid,
+ HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "cache lookup failed for remote relation %u",
+ remoteid);
+
+ /* Need to update the local cache? */
+ if (!OidIsValid(entry->reloid))
+ {
+ Oid nspid;
+ Oid relid;
+ int i;
+ TupleDesc desc;
+ LogicalRepRelation *remoterel;
+
+ remoterel = &entry->remoterel;
+
+ nspid = LookupExplicitNamespace(remoterel->nspname, false);
+ relid = get_relname_relid(remoterel->relname, nspid);
+ entry->rel = heap_open(relid, lockmode);
+
+ desc = RelationGetDescr(entry->rel);
+ for (i = 0; i < remoterel->natts; i++)
+ entry->attmap[i] = tupdesc_get_att_by_name(desc,
+ remoterel->attnames[i]);
+
+ entry->reloid = RelationGetRelid(entry->rel);
+ }
+ else
+ entry->rel = heap_open(entry->reloid, lockmode);
+
+ return entry;
+}
+
+/*
+ * Close the previously opened logical relation.
+ */
+static void
+logicalreprel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
+{
+ heap_close(rel->rel, lockmode);
+ rel->rel = NULL;
+}
+
+
+/*
+ * Make sure that we have started a local transaction.
+ *
+ * Also switches to ApplyContext as necessary.
+ */
+static bool
+ensure_transaction(void)
+{
+ if (IsTransactionState())
+ {
+ if (CurrentMemoryContext != ApplyContext)
+ MemoryContextSwitchTo(ApplyContext);
+ return false;
+ }
+
+ StartTransactionCommand();
+ MemoryContextSwitchTo(ApplyContext);
+ return true;
+}
+
+
+/*
+ * Executor state preparation for evaluation of constraint expressions,
+ * indexes and triggers.
+ *
+ * This is based on similar code in copy.c
+ */
+static EState *
+create_estate_for_relation(LogicalRepRelMapEntry *rel)
+{
+ EState *estate;
+ ResultRelInfo *resultRelInfo;
+ RangeTblEntry *rte;
+
+ estate = CreateExecutorState();
+
+ rte = makeNode(RangeTblEntry);
+ rte->rtekind = RTE_RELATION;
+ rte->relid = RelationGetRelid(rel->rel);
+ rte->relkind = rel->rel->rd_rel->relkind;
+ estate->es_range_table = list_make1(rte);
+
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo, rel->rel, 1, 0);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ /* Triggers might need a slot */
+ if (resultRelInfo->ri_TrigDesc)
+ estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
+
+ return estate;
+}
+
+/*
+ * Check if the local attribute is present in the relation definition used
+ * by the upstream and hence is updated by replication.
+ */
+static bool
+physatt_in_attmap(LogicalRepRelMapEntry *rel, int attid)
+{
+ AttrNumber i;
+
+ /* Fast path for tables that are same on upstream and downstream. */
+ if (attid < rel->remoterel.natts && rel->attmap[attid] == attid)
+ return true;
+
+ /* Try to find the attribute in the map. */
+ for (i = 0; i < rel->remoterel.natts; i++)
+ if (rel->attmap[i] == attid)
+ return true;
+
+ return false;
+}
+
+/*
+ * Evaluates default values for columns for which we can't map to remote
+ * relation columns.
+ *
+ * This allows us to support tables which have more columns on the downstream
+ * than on the upstream.
+ */
+static void
+FillSlotDefaults(LogicalRepRelMapEntry *rel, EState *estate,
+ TupleTableSlot *slot)
+{
+ TupleDesc desc = RelationGetDescr(rel->rel);
+ AttrNumber num_phys_attrs = desc->natts;
+ int i;
+ AttrNumber attnum,
+ num_defaults = 0;
+ int *defmap;
+ ExprState **defexprs;
+ ExprContext *econtext;
+
+ econtext = GetPerTupleExprContext(estate);
+
+ /* We got all the data via replication, no need to evaluate anything. */
+ if (num_phys_attrs == rel->remoterel.natts)
+ return;
+
+ defmap = (int *) palloc(num_phys_attrs * sizeof(int));
+ defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
+
+ for (attnum = 0; attnum < num_phys_attrs; attnum++)
+ {
+ Expr *defexpr;
+
+ if (desc->attrs[attnum]->attisdropped)
+ continue;
+
+ if (physatt_in_attmap(rel, attnum))
+ continue;
+
+ defexpr = (Expr *) build_column_default(rel->rel, attnum + 1);
+
+ if (defexpr != NULL)
+ {
+ /* Run the expression through planner */
+ defexpr = expression_planner(defexpr);
+
+ /* Initialize executable expression */
+ defexprs[num_defaults] = ExecInitExpr(defexpr, NULL);
+ defmap[num_defaults] = attnum;
+ num_defaults++;
+ }
+
+ }
+
+ for (i = 0; i < num_defaults; i++)
+ slot->tts_values[defmap[i]] =
+ ExecEvalExpr(defexprs[i], econtext, &slot->tts_isnull[defmap[i]],
+ NULL);
+}
+
+/*
+ * Store tuple data given as C strings into the slot.
+ * This is similar to BuildTupleFromCStrings but a TupleTableSlot fits our
+ * use better.
+ */
+static void
+SlotStoreCStrings(TupleTableSlot *slot, char **values)
+{
+ int natts = slot->tts_tupleDescriptor->natts;
+ int i;
+
+ ExecClearTuple(slot);
+
+ /* Call the "in" function for each non-dropped attribute */
+ for (i = 0; i < natts; i++)
+ {
+ Form_pg_attribute att = slot->tts_tupleDescriptor->attrs[i];
+
+ if (!att->attisdropped && values[i] != NULL)
+ {
+ Oid typinput;
+ Oid typioparam;
+
+ getTypeInputInfo(att->atttypid, &typinput, &typioparam);
+ slot->tts_values[i] = OidInputFunctionCall(typinput, values[i],
+ typioparam,
+ att->atttypmod);
+ slot->tts_isnull[i] = false;
+ }
+ else
+ {
+ /* We assign NULL for both NULL values and dropped attributes. */
+ slot->tts_values[i] = (Datum) 0;
+ slot->tts_isnull[i] = true;
+ }
+ }
+
+ ExecStoreVirtualTuple(slot);
+}
+
+/*
+ * Modify the slot with user data provided as C strings.
+ * This is somewhat similar to heap_modify_tuple but also calls the type
+ * input function on the user data, as the input is the text representation
+ * of the types.
+ */
+static void
+SlotModifyCStrings(TupleTableSlot *slot, char **values, bool *replaces)
+{
+ int natts = slot->tts_tupleDescriptor->natts;
+ int i;
+
+ slot_getallattrs(slot);
+ ExecClearTuple(slot);
+
+ /* Call the "in" function for each replaced attribute */
+ for (i = 0; i < natts; i++)
+ {
+ Form_pg_attribute att = slot->tts_tupleDescriptor->attrs[i];
+
+ if (!replaces[i])
+ continue;
+
+ if (values[i] != NULL)
+ {
+ Oid typinput;
+ Oid typioparam;
+
+ getTypeInputInfo(att->atttypid, &typinput, &typioparam);
+ slot->tts_values[i] = OidInputFunctionCall(typinput, values[i],
+ typioparam,
+ att->atttypmod);
+ slot->tts_isnull[i] = false;
+ }
+ else
+ {
+ slot->tts_values[i] = (Datum) 0;
+ slot->tts_isnull[i] = true;
+ }
+ }
+
+ ExecStoreVirtualTuple(slot);
+}
+
+/*
+ * Handle BEGIN message.
+ */
+static void
+handle_begin(StringInfo s)
+{
+ XLogRecPtr commit_lsn;
+ TimestampTz commit_time;
+ TransactionId remote_xid;
+
+ logicalrep_read_begin(s, &commit_lsn, &commit_time, &remote_xid);
+
+ replorigin_session_origin_timestamp = commit_time;
+ replorigin_session_origin_lsn = commit_lsn;
+
+ in_remote_transaction = true;
+
+ pgstat_report_activity(STATE_RUNNING, NULL);
+}
+
+/*
+ * Handle COMMIT message.
+ *
+ * TODO, support tracking of multiple origins
+ */
+static void
+handle_commit(StringInfo s)
+{
+ XLogRecPtr commit_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz commit_time;
+
+ logicalrep_read_commit(s, &commit_lsn, &end_lsn, &commit_time);
+
+ Assert(commit_lsn == replorigin_session_origin_lsn);
+ Assert(commit_time == replorigin_session_origin_timestamp);
+
+ if (IsTransactionState())
+ {
+ FlushPosition *flushpos;
+
+ CommitTransactionCommand();
+ MemoryContextSwitchTo(CacheMemoryContext);
+
+ /* Track commit lsn */
+ flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));
+ flushpos->local_end = XactLastCommitEnd;
+ flushpos->remote_end = end_lsn;
+
+ dlist_push_tail(&lsn_mapping, &flushpos->node);
+ MemoryContextSwitchTo(ApplyContext);
+ }
+
+ in_remote_transaction = false;
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle ORIGIN message.
+ *
+ * TODO, support tracking of multiple origins
+ */
+static void
+handle_origin(StringInfo s)
+{
+ /*
+ * An ORIGIN message can only come inside a remote transaction and before
+ * any actual writes.
+ */
+ if (!in_remote_transaction || IsTransactionState())
+ elog(ERROR, "ORIGIN message sent out of order");
+}
+
+/*
+ * Handle RELATION message.
+ *
+ * Note we don't do validation against the local schema here. Validation is
+ * postponed until the first change for the given relation arrives.
+ */
+static void
+handle_relation(StringInfo s)
+{
+ LogicalRepRelation *rel;
+
+ rel = logicalrep_read_rel(s);
+ remoterelmap_update(rel);
+}
+
+
+/*
+ * Handle INSERT message.
+ */
+static void
+handle_insert(StringInfo s)
+{
+ LogicalRepRelMapEntry *rel;
+ LogicalRepTupleData newtup;
+ LogicalRepRelId relid;
+ EState *estate;
+ TupleTableSlot *remoteslot;
+ MemoryContext oldctx;
+
+ ensure_transaction();
+
+ relid = logicalrep_read_insert(s, &newtup);
+ rel = logicalreprel_open(relid, RowExclusiveLock);
+
+ /* Initialize the executor state. */
+ estate = create_estate_for_relation(rel);
+ remoteslot = ExecInitExtraTupleSlot(estate);
+ ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel));
+
+ /* Process and store remote tuple in the slot */
+ oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+ SlotStoreCStrings(remoteslot, newtup.values);
+ FillSlotDefaults(rel, estate, remoteslot);
+ MemoryContextSwitchTo(oldctx);
+
+ PushActiveSnapshot(GetTransactionSnapshot());
+ ExecOpenIndices(estate->es_result_relation_info, false);
+
+ ExecInsert(NULL, /* mtstate is only used for ON CONFLICT handling, which we don't support yet */
+ remoteslot,
+ remoteslot,
+ NIL,
+ ONCONFLICT_NONE,
+ estate,
+ false);
+
+ /* Cleanup. */
+ ExecCloseIndices(estate->es_result_relation_info);
+ PopActiveSnapshot();
+ ExecResetTupleTable(estate->es_tupleTable, false);
+ FreeExecutorState(estate);
+
+ logicalreprel_close(rel, NoLock);
+
+ CommandCounterIncrement();
+}
+
+/*
+ * Handle UPDATE message.
+ *
+ * TODO: FDW support
+ */
+static void
+handle_update(StringInfo s)
+{
+ LogicalRepRelMapEntry *rel;
+ LogicalRepRelId relid;
+ EState *estate;
+ EPQState epqstate;
+ LogicalRepTupleData oldtup;
+ LogicalRepTupleData newtup;
+ bool hasoldtup;
+ TupleTableSlot *localslot;
+ TupleTableSlot *remoteslot;
+ bool found;
+ MemoryContext oldctx;
+
+ ensure_transaction();
+
+ relid = logicalrep_read_update(s, &hasoldtup, &oldtup,
+ &newtup);
+ rel = logicalreprel_open(relid, RowExclusiveLock);
+
+ /* Initialize the executor state. */
+ estate = create_estate_for_relation(rel);
+ remoteslot = ExecInitExtraTupleSlot(estate);
+ ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel));
+ localslot = ExecInitExtraTupleSlot(estate);
+ ExecSetSlotDescriptor(localslot, RelationGetDescr(rel->rel));
+ EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+
+ PushActiveSnapshot(GetTransactionSnapshot());
+ ExecOpenIndices(estate->es_result_relation_info, false);
+
+ /* Find the tuple using the replica identity index. */
+ oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+ SlotStoreCStrings(remoteslot, hasoldtup ? oldtup.values : newtup.values);
+ MemoryContextSwitchTo(oldctx);
+ found = tuple_find_by_replidx(rel->rel, LockTupleExclusive,
+ remoteslot, localslot);
+ ExecClearTuple(remoteslot);
+
+ /*
+ * Tuple found.
+ *
+ * Note this will fail if there are other conflicting unique indexes.
+ */
+ if (found)
+ {
+ /* Process and store remote tuple in the slot */
+ oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+ ExecStoreTuple(localslot->tts_tuple, remoteslot, InvalidBuffer, false);
+ SlotModifyCStrings(remoteslot, newtup.values, newtup.changed);
+ MemoryContextSwitchTo(oldctx);
+
+ EvalPlanQualSetSlot(&epqstate, remoteslot);
+
+ ExecUpdate(&localslot->tts_tuple->t_self,
+ localslot->tts_tuple,
+ remoteslot,
+ localslot,
+ &epqstate,
+ estate,
+ false);
+ }
+ else
+ {
+ /*
+ * The tuple to be updated could not be found.
+ *
+ * TODO what to do here?
+ */
+ }
+
+ /* Cleanup. */
+ ExecCloseIndices(estate->es_result_relation_info);
+ PopActiveSnapshot();
+ EvalPlanQualEnd(&epqstate);
+ ExecResetTupleTable(estate->es_tupleTable, false);
+ FreeExecutorState(estate);
+
+ logicalreprel_close(rel, NoLock);
+
+ CommandCounterIncrement();
+}
+
+/*
+ * Handle DELETE message.
+ *
+ * TODO: FDW support
+ */
+static void
+handle_delete(StringInfo s)
+{
+ LogicalRepRelMapEntry *rel;
+ LogicalRepTupleData oldtup;
+ LogicalRepRelId relid;
+ EState *estate;
+ EPQState epqstate;
+ TupleTableSlot *remoteslot;
+ TupleTableSlot *localslot;
+ bool found;
+ MemoryContext oldctx;
+
+ ensure_transaction();
+
+ relid = logicalrep_read_delete(s, &oldtup);
+ rel = logicalreprel_open(relid, RowExclusiveLock);
+
+ /* Initialize the executor state. */
+ estate = create_estate_for_relation(rel);
+ remoteslot = ExecInitExtraTupleSlot(estate);
+ ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel));
+ localslot = ExecInitExtraTupleSlot(estate);
+ ExecSetSlotDescriptor(localslot, RelationGetDescr(rel->rel));
+ EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+
+ PushActiveSnapshot(GetTransactionSnapshot());
+ ExecOpenIndices(estate->es_result_relation_info, false);
+
+ /* Find the tuple using the replica identity index. */
+ oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+ SlotStoreCStrings(remoteslot, oldtup.values);
+ MemoryContextSwitchTo(oldctx);
+ found = tuple_find_by_replidx(rel->rel, LockTupleExclusive,
+ remoteslot, localslot);
+ /* If found delete it. */
+ if (found)
+ {
+ EvalPlanQualSetSlot(&epqstate, localslot);
+ ExecDelete(&localslot->tts_tuple->t_self,
+ localslot->tts_tuple,
+ localslot,
+ &epqstate,
+ estate,
+ false);
+ }
+ else
+ {
+		/* The tuple to be deleted could not be found. */
+ }
+
+ /* Cleanup. */
+ ExecCloseIndices(estate->es_result_relation_info);
+ PopActiveSnapshot();
+ EvalPlanQualEnd(&epqstate);
+ ExecResetTupleTable(estate->es_tupleTable, false);
+ FreeExecutorState(estate);
+
+ logicalreprel_close(rel, NoLock);
+
+ CommandCounterIncrement();
+}
+
+
+/*
+ * Logical replication protocol message dispatcher.
+ */
+static void
+handle_message(StringInfo s)
+{
+ char action = pq_getmsgbyte(s);
+
+ switch (action)
+ {
+ /* BEGIN */
+ case 'B':
+ handle_begin(s);
+ break;
+ /* COMMIT */
+ case 'C':
+ handle_commit(s);
+ break;
+ /* INSERT */
+ case 'I':
+ handle_insert(s);
+ break;
+ /* UPDATE */
+ case 'U':
+ handle_update(s);
+ break;
+ /* DELETE */
+ case 'D':
+ handle_delete(s);
+ break;
+ /* RELATION */
+ case 'R':
+ handle_relation(s);
+ break;
+ /* ORIGIN */
+ case 'O':
+ handle_origin(s);
+ break;
+ default:
+ elog(ERROR, "unknown action of type %c", action);
+ }
+}
+
+/*
+ * Figure out which write/flush positions to report to the walsender process.
+ *
+ * We can't simply report back the last LSN the walsender sent us because the
+ * local transaction might not yet be flushed to disk locally. Instead we
+ * build a list that associates local with remote LSNs for every commit. When
+ * reporting back the flush position to the sender we iterate that list and
+ * check which entries on it are already locally flushed. Those we can report
+ * as having been flushed.
+ *
+ * Returns true if there are no outstanding transactions that need to be
+ * flushed.
+ */
+static bool
+get_flush_position(XLogRecPtr *write, XLogRecPtr *flush)
+{
+ dlist_mutable_iter iter;
+ XLogRecPtr local_flush = GetFlushRecPtr();
+
+ *write = InvalidXLogRecPtr;
+ *flush = InvalidXLogRecPtr;
+
+ dlist_foreach_modify(iter, &lsn_mapping)
+ {
+ FlushPosition *pos =
+ dlist_container(FlushPosition, node, iter.cur);
+
+ *write = pos->remote_end;
+
+ if (pos->local_end <= local_flush)
+ {
+ *flush = pos->remote_end;
+ dlist_delete(iter.cur);
+ pfree(pos);
+ }
+ else
+ {
+ /*
+ * Don't want to uselessly iterate over the rest of the list which
+ * could potentially be long. Instead get the last element and
+ * grab the write position from there.
+ */
+ pos = dlist_tail_element(FlushPosition, node,
+ &lsn_mapping);
+ *write = pos->remote_end;
+ return false;
+ }
+ }
+
+ return dlist_is_empty(&lsn_mapping);
+}
+
+
+/*
+ * Apply main loop.
+ */
+static void
+ApplyLoop(void)
+{
+ XLogRecPtr last_received = InvalidXLogRecPtr;
+
+ /* Init the ApplyContext which we use for easier cleanup. */
+ ApplyContext = AllocSetContextCreate(TopMemoryContext,
+ "ApplyContext",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /* mark as idle, before starting to loop */
+ pgstat_report_activity(STATE_IDLE, NULL);
+
+ while (!got_SIGTERM)
+ {
+ pgsocket fd = PGINVALID_SOCKET;
+ int rc;
+ int len;
+ char *buf = NULL;
+ bool endofstream = false;
+
+ CHECK_FOR_INTERRUPTS();
+
+ MemoryContextSwitchTo(ApplyContext);
+
+ len = wrcapi->receive(wrchandle, &buf, &fd);
+
+ if (len != 0)
+ {
+ /* Process the data */
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ if (len == 0)
+ {
+ break;
+ }
+ else if (len < 0)
+ {
+ elog(NOTICE, "data stream from provider has ended");
+ endofstream = true;
+ break;
+ }
+ else
+ {
+ int c;
+ StringInfoData s;
+
+ /* Ensure we are reading the data into our memory context. */
+ MemoryContextSwitchTo(ApplyContext);
+
+ initStringInfo(&s);
+ s.data = buf;
+ s.len = len;
+ s.maxlen = -1;
+
+ c = pq_getmsgbyte(&s);
+
+ if (c == 'w')
+ {
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+
+ start_lsn = pq_getmsgint64(&s);
+ end_lsn = pq_getmsgint64(&s);
+ pq_getmsgint64(&s); /* sendTime */
+
+ if (last_received < start_lsn)
+ last_received = start_lsn;
+
+ if (last_received < end_lsn)
+ last_received = end_lsn;
+
+ handle_message(&s);
+ }
+ else if (c == 'k')
+ {
+ XLogRecPtr endpos;
+ bool reply_requested;
+
+ endpos = pq_getmsgint64(&s);
+ /* timestamp = */ pq_getmsgint64(&s);
+ reply_requested = pq_getmsgbyte(&s);
+
+ send_feedback(endpos,
+ GetCurrentTimestamp(),
+ reply_requested);
+ }
+ /* other message types are purposefully ignored */
+ }
+
+ len = wrcapi->receive(wrchandle, &buf, &fd);
+ }
+ }
+
+ /* confirm all writes at once */
+ send_feedback(last_received, GetCurrentTimestamp(), false);
+
+ /* Cleanup the memory. */
+ MemoryContextResetAndDeleteChildren(ApplyContext);
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ /* Check if we need to exit the streaming loop. */
+ if (endofstream)
+ break;
+
+ /*
+ * Wait for more data or latch.
+ */
+ rc = WaitLatchOrSocket(&MyProc->procLatch,
+ WL_SOCKET_READABLE | WL_LATCH_SET |
+ WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ fd, 1000L);
+
+ /* Emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ ResetLatch(&MyProc->procLatch);
+ }
+}
+
+/*
+ * Send a Standby Status Update message to server.
+ *
+ * 'recvpos' is the latest LSN we've received data up to; 'force' is set
+ * if we need to send a response to avoid timeouts.
+ */
+static void
+send_feedback(XLogRecPtr recvpos, int64 now, bool force)
+{
+ static StringInfo reply_message = NULL;
+
+ static XLogRecPtr last_recvpos = InvalidXLogRecPtr;
+ static XLogRecPtr last_writepos = InvalidXLogRecPtr;
+ static XLogRecPtr last_flushpos = InvalidXLogRecPtr;
+
+ XLogRecPtr writepos;
+ XLogRecPtr flushpos;
+
+ /* It's legal to not pass a recvpos */
+ if (recvpos < last_recvpos)
+ recvpos = last_recvpos;
+
+ if (get_flush_position(&writepos, &flushpos))
+ {
+ /*
+ * No outstanding transactions to flush, we can report the latest
+ * received position. This is important for synchronous replication.
+ */
+ flushpos = writepos = recvpos;
+ }
+
+ if (writepos < last_writepos)
+ writepos = last_writepos;
+
+ if (flushpos < last_flushpos)
+ flushpos = last_flushpos;
+
+	/* If we've already reported everything, we're good. */
+ if (!force &&
+ writepos == last_writepos &&
+ flushpos == last_flushpos)
+ return;
+
+ if (!reply_message)
+ {
+ MemoryContext oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ reply_message = makeStringInfo();
+ MemoryContextSwitchTo(oldctx);
+ }
+ else
+ resetStringInfo(reply_message);
+
+ pq_sendbyte(reply_message, 'r');
+ pq_sendint64(reply_message, recvpos); /* write */
+ pq_sendint64(reply_message, flushpos); /* flush */
+ pq_sendint64(reply_message, writepos); /* apply */
+ pq_sendint64(reply_message, now); /* sendTime */
+ pq_sendbyte(reply_message, false); /* replyRequested */
+
+ elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X",
+ force,
+ (uint32) (recvpos >> 32), (uint32) recvpos,
+ (uint32) (writepos >> 32), (uint32) writepos,
+ (uint32) (flushpos >> 32), (uint32) flushpos
+ );
+
+ wrcapi->send(wrchandle, reply_message->data, reply_message->len);
+
+ if (recvpos > last_recvpos)
+ last_recvpos = recvpos;
+ if (writepos > last_writepos)
+ last_writepos = writepos;
+ if (flushpos > last_flushpos)
+ last_flushpos = flushpos;
+}
+
+/* SIGTERM: set flag to exit at next convenient time */
+static void
+LogicalWorkerSigTermHandler(SIGNAL_ARGS)
+{
+ got_SIGTERM = true;
+}
+
+/* Logical Replication Apply worker entry point */
+void
+ApplyWorkerMain(Datum main_arg)
+{
+	int			worker_slot = DatumGetInt32(main_arg);
+ MemoryContext oldctx;
+ RepOriginId originid;
+ XLogRecPtr origin_startpos;
+ char *options;
+ walrcvconn_init_fn walrcvconn_init;
+
+ /* Attach to slot */
+ logicalrep_worker_attach(worker_slot);
+
+ /* Setup signal handling */
+ pqsignal(SIGTERM, LogicalWorkerSigTermHandler);
+ BackgroundWorkerUnblockSignals();
+
+ /* Make it easy to identify our processes. */
+ SetConfigOption("application_name", MyBgworkerEntry->bgw_name,
+ PGC_USERSET, PGC_S_SESSION);
+
+ /* Load the libpq-specific functions */
+ wrcapi = palloc0(sizeof(WalReceiverConnAPI));
+
+ walrcvconn_init = (walrcvconn_init_fn)
+ load_external_function("libpqwalreceiver",
+ "_PG_walreceirver_conn_init", false, NULL);
+
+ if (walrcvconn_init == NULL)
+ elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol");
+
+ wrchandle = walrcvconn_init(wrcapi);
+ if (wrcapi->connect == NULL ||
+ wrcapi->startstreaming_logical == NULL ||
+ wrcapi->identify_system == NULL ||
+ wrcapi->receive == NULL || wrcapi->send == NULL ||
+ wrcapi->disconnect == NULL)
+ elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+ Assert(CurrentResourceOwner == NULL);
+ CurrentResourceOwner = ResourceOwnerCreate(NULL,
+ "logical replication apply");
+
+ /* Setup synchronous commit according to the user's wishes */
+/* SetConfigOption("synchronous_commit",
+ logical_apply_synchronous_commit,
+ PGC_BACKEND, PGC_S_OVERRIDE);
+*/
+ /* Run as replica session replication role. */
+ SetConfigOption("session_replication_role", "replica",
+ PGC_SUSET, PGC_S_OVERRIDE);
+
+ /* Connect to our database. */
+ BackgroundWorkerInitializeConnectionByOid(MyLogicalRepWorker->dbid,
+ InvalidOid);
+
+ StartTransactionCommand();
+
+ /* Load the subscription into persistent memory context. */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ MySubscription = GetSubscription(MyLogicalRepWorker->subid);
+ MemoryContextSwitchTo(oldctx);
+
+ elog(LOG, "logical replication apply for subscription %s started",
+ MySubscription->name);
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(MySubscription->slotname, true);
+ if (!OidIsValid(originid))
+ originid = replorigin_create(MySubscription->slotname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ /* Connect to the origin and start the replication. */
+ elog(DEBUG1, "connecting to provider using connection string %s",
+ MySubscription->conninfo);
+ wrcapi->connect(wrchandle, MySubscription->conninfo, true,
+ MySubscription->name);
+
+ /* Build option string for the plugin. */
+ options = logicalrep_build_options(MySubscription->publications);
+
+ /* Start streaming from the slot. */
+ wrcapi->startstreaming_logical(wrchandle, origin_startpos,
+ MySubscription->slotname, options);
+
+ /* Run the main loop. */
+ ApplyLoop();
+
+ wrcapi->disconnect(wrchandle);
+
+	/* We should only get here if we received SIGTERM. */
+ proc_exit(0);
+}
+
+/*
+ * Set up a ScanKey for a search in the relation 'rel' for the tuple whose
+ * contents are in 'searchslot', which is set up to match 'rel' (*NOT*
+ * idxrel!).
+ *
+ * Returns whether any column contains NULLs.
+ *
+ * This is not a generic routine; it expects idxrel to be the replica
+ * identity index of rel and to meet all limitations associated with that.
+ */
+static bool
+build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
+ TupleTableSlot *searchslot)
+{
+ int attoff;
+ bool isnull;
+ Datum indclassDatum;
+ oidvector *opclass;
+ int2vector *indkey = &idxrel->rd_index->indkey;
+ bool hasnulls = false;
+
+ Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel));
+
+ indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ opclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ /* Build scankey for every attribute in the index. */
+ for (attoff = 0; attoff < RelationGetNumberOfAttributes(idxrel); attoff++)
+ {
+ Oid operator;
+ Oid opfamily;
+ RegProcedure regop;
+ int pkattno = attoff + 1;
+ int mainattno = indkey->values[attoff];
+ Oid atttype = attnumTypeId(rel, mainattno);
+ Oid optype = get_opclass_input_type(opclass->values[attoff]);
+
+ /*
+ * Load the operator info, we need this to get the equality operator
+ * function for the scankey.
+ */
+ opfamily = get_opclass_family(opclass->values[attoff]);
+
+ operator = get_opfamily_member(opfamily, optype,
+ optype,
+ BTEqualStrategyNumber);
+
+ if (!OidIsValid(operator))
+ elog(ERROR,
+			 "could not look up equality operator for type %u, optype %u in opfamily %u",
+ atttype, optype, opfamily);
+
+ regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[attoff],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ searchslot->tts_values[mainattno - 1]);
+
+ /* Check for null value. */
+ if (searchslot->tts_isnull[mainattno - 1])
+ {
+ hasnulls = true;
+ skey[attoff].sk_flags |= SK_ISNULL;
+ }
+ }
+
+ return hasnulls;
+}
+
+/*
+ * Search the relation 'rel' for a tuple using the replica identity index.
+ *
+ * If a matching tuple is found, lock it with lockmode, fill 'slot' with its
+ * contents, and return true; otherwise return false.
+ */
+static bool
+tuple_find_by_replidx(Relation rel, LockTupleMode lockmode,
+ TupleTableSlot *searchslot, TupleTableSlot *slot)
+{
+ HeapTuple scantuple;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ SnapshotData snap;
+ TransactionId xwait;
+ Oid idxoid;
+ Relation idxrel;
+ bool found;
+
+	/* Open the REPLICA IDENTITY index. */
+ idxoid = RelationGetReplicaIndex(rel);
+ if (!OidIsValid(idxoid))
+ {
+ elog(ERROR, "could not find configured replica identity for table \"%s\"",
+ RelationGetRelationName(rel));
+ return false;
+ }
+ idxrel = index_open(idxoid, RowExclusiveLock);
+
+ /* Start an index scan. */
+ InitDirtySnapshot(snap);
+ scan = index_beginscan(rel, idxrel, &snap,
+ RelationGetNumberOfAttributes(idxrel),
+ 0);
+
+ /* Build scan key. */
+ build_replindex_scan_key(skey, rel, idxrel, searchslot);
+
+retry:
+ found = false;
+
+ index_rescan(scan, skey, RelationGetNumberOfAttributes(idxrel), NULL, 0);
+
+ /* Try to find the tuple */
+ if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
+ {
+ found = true;
+ ExecStoreTuple(scantuple, slot, InvalidBuffer, false);
+ ExecMaterializeSlot(slot);
+
+ xwait = TransactionIdIsValid(snap.xmin) ?
+ snap.xmin : snap.xmax;
+
+ /*
+ * If the tuple is locked, wait for locking transaction to finish
+ * and retry.
+ */
+ if (TransactionIdIsValid(xwait))
+ {
+ XactLockTableWait(xwait, NULL, NULL, XLTW_None);
+ goto retry;
+ }
+ }
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ Buffer buf;
+ HeapUpdateFailureData hufd;
+ HTSU_Result res;
+ HeapTupleData locktup;
+
+ ItemPointerCopy(&slot->tts_tuple->t_self, &locktup.t_self);
+
+ PushActiveSnapshot(GetLatestSnapshot());
+
+ res = heap_lock_tuple(rel, &locktup, GetCurrentCommandId(false),
+ lockmode,
+ false /* wait */,
+ false /* don't follow updates */,
+ &buf, &hufd);
+ /* the tuple slot already has the buffer pinned */
+ ReleaseBuffer(buf);
+
+ PopActiveSnapshot();
+
+ switch (res)
+ {
+ case HeapTupleMayBeUpdated:
+ break;
+ case HeapTupleUpdated:
+ /* XXX: Improve handling here */
+ ereport(LOG,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("concurrent update, retrying")));
+ goto retry;
+ case HeapTupleInvisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ default:
+ elog(ERROR, "unexpected heap_lock_tuple status: %u", res);
+ break;
+ }
+ }
+
+ index_endscan(scan);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ return found;
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
new file mode 100644
index 0000000..385260e
--- /dev/null
+++ b/src/backend/replication/logical/launcher.c
@@ -0,0 +1,542 @@
+/*-------------------------------------------------------------------------
+ * launcher.c
+ * PostgreSQL logical replication apply launcher process
+ *
+ * Copyright (c) 2012-2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/heapam.h"
+#include "access/htup.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+
+#include "catalog/pg_subscription.h"
+
+#include "libpq/pqsignal.h"
+
+#include "postmaster/bgworker.h"
+#include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
+#include "replication/logicalworker.h"
+#include "replication/subscription.h"
+
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/procsignal.h"
+
+#include "tcop/tcopprot.h"
+
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/timeout.h"
+#include "utils/snapmgr.h"
+
+int max_logical_replication_workers = 4;
+LogicalRepWorker *MyLogicalRepWorker = NULL;
+
+typedef struct LogicalRepCtxStruct
+{
+ /* Supervisor process. */
+ pid_t launcher_pid;
+
+ /* Background workers. */
+ LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
+} LogicalRepCtxStruct;
+
+LogicalRepCtxStruct *LogicalRepCtx;
+
+static LogicalRepWorker *logicalrep_worker_find(Oid subid);
+static void logicalrep_worker_launch(Oid dbid, Oid subid);
+static void logicalrep_worker_stop(LogicalRepWorker *worker);
+static void logicalrep_worker_onexit(int code, Datum arg);
+static void logicalrep_worker_detach(void);
+
+static bool xacthook_do_signal_launcher = false;
+
+/*
+ * Load the list of subscriptions.
+ *
+ * Only the fields interesting for worker start/stop functions are filled for
+ * each subscription.
+ */
+static List *
+get_subscription_list(void)
+{
+ List *res = NIL;
+ Relation rel;
+ HeapScanDesc scan;
+ HeapTuple tup;
+ MemoryContext resultcxt;
+
+ /* This is the context that we will allocate our output data in */
+ resultcxt = CurrentMemoryContext;
+
+ /*
+	 * Start a transaction so we can access pg_subscription, and get a snapshot.
+ * We don't have a use for the snapshot itself, but we're interested in
+ * the secondary effect that it sets RecentGlobalXmin. (This is critical
+ * for anything that reads heap pages, because HOT may decide to prune
+ * them even if the process doesn't attempt to modify any tuples.)
+ */
+ StartTransactionCommand();
+ (void) GetTransactionSnapshot();
+
+ rel = heap_open(SubscriptionRelationId, AccessShareLock);
+ scan = heap_beginscan_catalog(rel, 0, NULL);
+
+ while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+ {
+ Form_pg_subscription subform = (Form_pg_subscription) GETSTRUCT(tup);
+ Subscription *sub;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate our results in the caller's context, not the
+ * transaction's. We do this inside the loop, and restore the original
+ * context at the end, so that leaky things like heap_getnext() are
+ * not called in a potentially long-lived context.
+ */
+ oldcxt = MemoryContextSwitchTo(resultcxt);
+
+ sub = (Subscription *) palloc(sizeof(Subscription));
+ sub->oid = HeapTupleGetOid(tup);
+ sub->dbid = subform->dbid;
+ sub->enabled = subform->subenabled;
+
+		/* We don't fill the fields we are not interested in. */
+ sub->name = NULL;
+ sub->conninfo = NULL;
+ sub->slotname = NULL;
+ sub->publications = NIL;
+
+ res = lappend(res, sub);
+ MemoryContextSwitchTo(oldcxt);
+ }
+
+ heap_endscan(scan);
+ heap_close(rel, AccessShareLock);
+
+ CommitTransactionCommand();
+
+ return res;
+}
+
+/*
+ * Wait for a background worker to start up and attach to the shmem context.
+ *
+ * This is like WaitForBackgroundWorkerStartup(), except that we wait for
+ * the worker to attach to shared memory, not just to start, and we also
+ * simply exit if the postmaster dies.
+ */
+static bool
+WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
+ BackgroundWorkerHandle *handle)
+{
+ BgwHandleStatus status;
+ int rc;
+
+ for (;;)
+ {
+ pid_t pid;
+
+ CHECK_FOR_INTERRUPTS();
+
+ status = GetBackgroundWorkerPid(handle, &pid);
+
+ /*
+		 * Worker started and attached to our shmem. This check is safe
+		 * because only the launcher ever starts the workers, so nobody can
+		 * steal the worker slot.
+ */
+ if (status == BGWH_STARTED && worker->proc)
+ return true;
+ /* Worker didn't start or died before attaching to our shmem. */
+ if (status == BGWH_STOPPED)
+ return false;
+
+ /*
+		 * We need a timeout because we generally don't get notified via the
+		 * latch about the worker attaching.
+ */
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, 1000L);
+
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ ResetLatch(MyLatch);
+ }
+
+	/* not reached */
+	return false;
+}
+
+/*
+ * Walk the workers array and search for one that matches the given
+ * subscription id.
+ */
+static LogicalRepWorker *
+logicalrep_worker_find(Oid subid)
+{
+ int i;
+ LogicalRepWorker *res = NULL;
+
+ /* Block concurrent modification. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+
+ /* Search for attached worker for a given subscription id. */
+ for (i = 0; i < max_logical_replication_workers; i++)
+ {
+ LogicalRepWorker *w = &LogicalRepCtx->workers[i];
+ if (w->subid == subid && w->proc && IsBackendPid(w->proc->pid))
+ {
+ res = w;
+ break;
+ }
+ }
+
+ LWLockRelease(LogicalRepLauncherLock);
+
+ return res;
+}
+
+/*
+ * Start new apply background worker.
+ */
+static void
+logicalrep_worker_launch(Oid dbid, Oid subid)
+{
+ BackgroundWorker bgw;
+ BackgroundWorkerHandle *bgw_handle;
+ int slot;
+ LogicalRepWorker *worker = NULL;
+
+ ereport(LOG,
+ (errmsg("starting logical replication worker for subscription %u",
+ subid)));
+
+ /*
+	 * The modification of shared memory needs to be done under the lock so
+	 * that we have a consistent view.
+ */
+ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+
+ /* Find unused worker slot. */
+ for (slot = 0; slot < max_logical_replication_workers; slot++)
+ {
+ if (!LogicalRepCtx->workers[slot].proc)
+ {
+ worker = &LogicalRepCtx->workers[slot];
+ break;
+ }
+ }
+
+ /* Bail if not found */
+ if (worker == NULL)
+ {
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of logical replication worker slots"),
+				 errhint("You might need to increase max_logical_replication_workers.")));
+ return;
+ }
+
+ /* Prepare the worker info. */
+ memset(worker, 0, sizeof(LogicalRepWorker));
+ worker->dbid = dbid;
+ worker->subid = subid;
+
+ LWLockRelease(LogicalRepLauncherLock);
+
+	/* Register the new dynamic worker. */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	bgw.bgw_main = ApplyWorkerMain;
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "logical replication worker for subscription %u", subid);
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(slot);
+
+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ {
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("logical replication worker registration failed"),
+				 errhint("You might need to increase max_worker_processes.")));
+ return;
+ }
+
+ /* Now wait until it attaches. */
+ if (!WaitForReplicationWorkerAttach(worker, bgw_handle))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ errmsg("could not launch logical replication worker")));
+ return;
+ }
+}
+
+/*
+ * Stop the logical replication worker and wait until it detaches from the
+ * slot.
+ */
+static void
+logicalrep_worker_stop(LogicalRepWorker *worker)
+{
+ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+
+	/* Check that the worker is up and is what we expect. */
+	if (!worker->proc || !IsBackendPid(worker->proc->pid))
+	{
+		LWLockRelease(LogicalRepLauncherLock);
+		return;
+	}
+
+ /* Terminate the worker. */
+ kill(worker->proc->pid, SIGTERM);
+
+ LWLockRelease(LogicalRepLauncherLock);
+
+ /* Wait for it to detach. */
+ for (;;)
+ {
+ int rc = WaitLatch(&MyProc->procLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ 1000L);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ ResetLatch(&MyProc->procLatch);
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (!worker->proc)
+ return;
+ }
+}
+
+/*
+ * Attach to a slot.
+ */
+void
+logicalrep_worker_attach(int slot)
+{
+ /* Block concurrent access. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+
+ Assert(slot >= 0 && slot < max_logical_replication_workers);
+ MyLogicalRepWorker = &LogicalRepCtx->workers[slot];
+
+ if (MyLogicalRepWorker->proc)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication worker slot %d already used by "
+ "another worker", slot)));
+
+ MyLogicalRepWorker->proc = MyProc;
+ before_shmem_exit(logicalrep_worker_onexit, (Datum) 0);
+
+ LWLockRelease(LogicalRepLauncherLock);
+}
+
+/*
+ * Detach the worker (cleans up the worker info).
+ */
+static void
+logicalrep_worker_detach(void)
+{
+ /* Block concurrent access. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+
+ MyLogicalRepWorker->dbid = InvalidOid;
+ MyLogicalRepWorker->subid = InvalidOid;
+ MyLogicalRepWorker->proc = NULL;
+
+ LWLockRelease(LogicalRepLauncherLock);
+}
+
+/*
+ * Cleanup function.
+ *
+ * Called on logical replication worker exit.
+ */
+static void
+logicalrep_worker_onexit(int code, Datum arg)
+{
+ logicalrep_worker_detach();
+}
+
+/*
+ * ApplyLauncherShmemSize
+ * Compute space needed for replication launcher shared memory
+ */
+Size
+ApplyLauncherShmemSize(void)
+{
+ Size size;
+
+ /*
+ * Need the fixed struct and the array of LogicalRepWorker.
+ */
+ size = sizeof(LogicalRepCtxStruct);
+ size = MAXALIGN(size);
+ size = add_size(size, mul_size(max_logical_replication_workers,
+ sizeof(LogicalRepWorker)));
+ return size;
+}
+
+/*
+ * ApplyLauncherShmemInit
+ * Allocate and initialize replication launcher shared memory
+ */
+void
+ApplyLauncherShmemInit(void)
+{
+ bool found;
+
+ LogicalRepCtx = (LogicalRepCtxStruct *)
+ ShmemInitStruct("Logical Replication Launcher Data",
+ ApplyLauncherShmemSize(),
+ &found);
+
+ if (IsUnderPostmaster)
+ {
+ Assert(found);
+ return;
+ }
+
+ memset(LogicalRepCtx, 0, ApplyLauncherShmemSize());
+}
+
+static void
+xacthook_signal_launcher(XactEvent event, void *arg)
+{
+ switch (event)
+ {
+ case XACT_EVENT_COMMIT:
+ if (xacthook_do_signal_launcher)
+ ApplyLauncherWakeup();
+ break;
+ default:
+ /* We're not interested in other tx events */
+ break;
+ }
+}
+
+void
+ApplyLauncherWakeupOnCommit(void)
+{
+ if (!xacthook_do_signal_launcher)
+ {
+ RegisterXactCallback(xacthook_signal_launcher, NULL);
+ xacthook_do_signal_launcher = true;
+ }
+}
+
+void
+ApplyLauncherWakeup(void)
+{
+ if (IsBackendPid(LogicalRepCtx->launcher_pid))
+ kill(LogicalRepCtx->launcher_pid, SIGUSR1);
+}
+
+/*
+ * Main loop for the apply launcher process.
+ */
+void
+ApplyLauncherMain(Datum main_arg)
+{
+ ereport(LOG,
+ (errmsg("logical replication launcher started")));
+
+ /* Establish signal handlers. */
+ pqsignal(SIGTERM, die);
+ BackgroundWorkerUnblockSignals();
+
+ /* Make it easy to identify our processes. */
+ SetConfigOption("application_name", MyBgworkerEntry->bgw_name,
+ PGC_USERSET, PGC_S_SESSION);
+
+ LogicalRepCtx->launcher_pid = MyProcPid;
+
+ /*
+ * Establish connection to nailed catalogs (we only ever access
+ * pg_subscription).
+ */
+ BackgroundWorkerInitializeConnection(NULL, NULL);
+
+ /* Enter main loop */
+ for (;;)
+ {
+ int rc;
+ List *sublist;
+ ListCell *lc;
+ Subscription *startsub = NULL;
+ MemoryContext subctx;
+ MemoryContext oldctx;
+
+ CHECK_FOR_INTERRUPTS();
+
+		/* Use a temporary context for the subscription list and worker info. */
+ subctx = AllocSetContextCreate(TopMemoryContext,
+ "Logical Replication Launcher sublist",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(subctx);
+
+ /* Search for subscriptions to start or stop. */
+ sublist = get_subscription_list();
+ foreach(lc, sublist)
+ {
+ Subscription *sub = (Subscription *) lfirst(lc);
+ LogicalRepWorker *w = logicalrep_worker_find(sub->oid);
+
+ if (sub->enabled && w == NULL && startsub == NULL)
+ startsub = sub;
+ else if (!sub->enabled && w != NULL)
+ logicalrep_worker_stop(w);
+ }
+
+ if (startsub)
+ logicalrep_worker_launch(startsub->dbid, startsub->oid);
+
+ /* Switch back to original memory context. */
+ MemoryContextSwitchTo(oldctx);
+ /* Clean the temporary memory. */
+ MemoryContextDelete(subctx);
+
+ /* Wait for more work. */
+ rc = WaitLatch(&MyProc->procLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ startsub ? 5000L : 180000L);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ ResetLatch(&MyProc->procLatch);
+ }
+
+ LogicalRepCtx->launcher_pid = 0;
+
+ /* ... and if it returns, we're done */
+ ereport(LOG,
+ (errmsg("logical replication launcher shutting down")));
+
+ proc_exit(0);
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index c04b17f..423cb0f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
#include "postmaster/bgworker_internals.h"
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
+#include "replication/logicalworker.h"
#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
@@ -137,6 +138,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, ReplicationOriginShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
size = add_size(size, SnapMgrShmemSize());
size = add_size(size, BTreeShmemSize());
size = add_size(size, SyncScanShmemSize());
@@ -245,6 +247,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
ReplicationOriginShmemInit();
WalSndShmemInit();
WalRcvShmemInit();
+ ApplyLauncherShmemInit();
/*
* Set up other modules that need some shared memory space
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f8996cd..4488ff7 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -47,3 +47,4 @@ CommitTsLock 39
ReplicationOriginLock 40
MultiXactTruncationLock 41
OldSnapshotTimeMapLock 42
+LogicalRepLauncherLock 43
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9c93df0..32856db 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -58,6 +58,7 @@
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
#include "postmaster/walwriter.h"
+#include "replication/logicalworker.h"
#include "replication/slot.h"
#include "replication/syncrep.h"
#include "replication/walreceiver.h"
@@ -171,6 +172,7 @@ static const char *show_tcp_keepalives_interval(void);
static const char *show_tcp_keepalives_count(void);
static bool check_maxconnections(int *newval, void **extra, GucSource source);
static bool check_max_worker_processes(int *newval, void **extra, GucSource source);
+static bool check_max_logical_replication_workers(int *newval, void **extra, GucSource source);
static bool check_autovacuum_max_workers(int *newval, void **extra, GucSource source);
static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
@@ -2474,6 +2476,18 @@ static struct config_int ConfigureNamesInt[] =
},
{
+ {"max_logical_replication_workers",
+ PGC_POSTMASTER,
+ RESOURCES_ASYNCHRONOUS,
+ gettext_noop("Maximum number of logical replication worker processes."),
+ NULL,
+ },
+ &max_logical_replication_workers,
+ 4, 1, MAX_BACKENDS,
+ check_max_logical_replication_workers, NULL, NULL
+ },
+
+ {
{"log_rotation_age", PGC_SIGHUP, LOGGING_WHERE,
gettext_noop("Automatic log file rotation will occur after N minutes."),
NULL,
@@ -10184,6 +10198,14 @@ check_max_worker_processes(int *newval, void **extra, GucSource source)
}
static bool
+check_max_logical_replication_workers(int *newval, void **extra, GucSource source)
+{
+ if (*newval > max_worker_processes)
+ return false;
+ return true;
+}
+
+static bool
check_effective_io_concurrency(int *newval, void **extra, GucSource source)
{
#ifdef USE_PREFETCH
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 6b66353..dfb7e7c 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,5 +19,25 @@ extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate,
extern TupleTableSlot *ExecModifyTable(ModifyTableState *node);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
+extern TupleTableSlot *ExecInsert(ModifyTableState *mtstate,
+ TupleTableSlot *slot,
+ TupleTableSlot *planSlot,
+ List *arbiterIndexes,
+ OnConflictAction onconflict,
+ EState *estate,
+ bool canSetTag);
+extern TupleTableSlot *ExecDelete(ItemPointer tupleid,
+ HeapTuple oldtuple,
+ TupleTableSlot *planSlot,
+ EPQState *epqstate,
+ EState *estate,
+ bool canSetTag);
+extern TupleTableSlot *ExecUpdate(ItemPointer tupleid,
+ HeapTuple oldtuple,
+ TupleTableSlot *slot,
+ TupleTableSlot *planSlot,
+ EPQState *epqstate,
+ EState *estate,
+ bool canSetTag);
#endif /* NODEMODIFYTABLE_H */
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index cd6cd44..3f8e764 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -52,4 +52,6 @@ extern void StartBackgroundWorker(void) pg_attribute_noreturn();
extern BackgroundWorker *BackgroundWorkerEntry(int slotno);
#endif
+extern bool internal_bgworker_registration_in_progress;
+
#endif /* BGWORKER_INTERNALS_H */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
new file mode 100644
index 0000000..64f36d3
--- /dev/null
+++ b/src/include/replication/logicalworker.h
@@ -0,0 +1,41 @@
+/*-------------------------------------------------------------------------
+ *
+ * logicalworker.h
+ * Exports for logical replication workers.
+ *
+ * Portions Copyright (c) 2010-2016, PostgreSQL Global Development Group
+ *
+ * src/include/replication/logicalworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef LOGICALWORKER_H
+#define LOGICALWORKER_H
+
+typedef struct LogicalRepWorker
+{
+ /* Pointer to proc array. NULL if not running. */
+ PGPROC *proc;
+
+ /* Database id to connect to. */
+ Oid dbid;
+
+ /* Subscription id for the worker. */
+ Oid subid;
+} LogicalRepWorker;
+
+extern int max_logical_replication_workers;
+extern LogicalRepWorker *MyLogicalRepWorker;
+
+extern void ApplyLauncherMain(Datum main_arg);
+extern void ApplyWorkerMain(Datum main_arg);
+
+extern Size ApplyLauncherShmemSize(void);
+extern void ApplyLauncherShmemInit(void);
+
+extern void ApplyLauncherWakeupOnCommit(void);
+extern void ApplyLauncherWakeup(void);
+
+extern void logicalrep_worker_attach(int slot);
+
+#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 0f42008..3801949 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -152,6 +152,9 @@ typedef char *(*walrcvconn_create_slot_fn) (
WalReceiverConnHandle *handle,
char *slotname, bool logical,
XLogRecPtr *lsn);
+typedef void (*walrcvconn_drop_slot_fn) (
+ WalReceiverConnHandle *handle,
+ char *slotname);
typedef bool (*walrcvconn_startstreaming_physical_fn) (
WalReceiverConnHandle *handle,
TimeLineID tli, XLogRecPtr startpoint,
@@ -174,6 +177,7 @@ typedef struct WalReceiverConnAPI {
walrcvconn_identify_system_fn identify_system;
walrcvconn_readtimelinehistoryfile_fn readtimelinehistoryfile;
walrcvconn_create_slot_fn create_slot;
+ walrcvconn_drop_slot_fn drop_slot;
walrcvconn_startstreaming_physical_fn startstreaming_physical;
walrcvconn_startstreaming_logical_fn startstreaming_logical;
walrcvconn_endstreaming_fn endstreaming;
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e629373..408a2c1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -407,8 +407,16 @@ sub init
if ($params{allows_streaming})
{
- print $conf "wal_level = replica\n";
+ if ($params{allows_streaming} eq "logical")
+ {
+ print $conf "wal_level = logical\n";
+ }
+ else
+ {
+ print $conf "wal_level = replica\n";
+ }
print $conf "max_wal_senders = 5\n";
+ print $conf "max_replication_slots = 5\n";
print $conf "wal_keep_segments = 20\n";
print $conf "max_wal_size = 128MB\n";
print $conf "shared_buffers = 1MB\n";
diff --git a/src/test/subscription/.gitignore b/src/test/subscription/.gitignore
new file mode 100644
index 0000000..871e943
--- /dev/null
+++ b/src/test/subscription/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
new file mode 100644
index 0000000..54c4d19
--- /dev/null
+++ b/src/test/subscription/Makefile
@@ -0,0 +1,20 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/subscription
+#
+# Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/subscription/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/subscription
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+ $(prove_check)
+
+clean distclean maintainer-clean:
+ rm -rf tmp_check
diff --git a/src/test/subscription/README b/src/test/subscription/README
new file mode 100644
index 0000000..e9e9375
--- /dev/null
+++ b/src/test/subscription/README
@@ -0,0 +1,16 @@
+src/test/subscription/README
+
+Regression tests for subscription/logical replication
+=====================================================
+
+This directory contains a test suite for subscription/logical replication.
+
+Running the tests
+=================
+
+ make check
+
+NOTE: This creates a temporary installation, and some tests may
+create one or more nodes for the purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
new file mode 100644
index 0000000..dca19c4
--- /dev/null
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -0,0 +1,89 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize provider node
+my $node_provider = get_new_node('provider');
+$node_provider->init(allows_streaming => 'logical');
+$node_provider->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create some preexisting content on provider
+$node_provider->safe_psql('postgres',
+ "CREATE TABLE tab_notrep AS SELECT generate_series(1,10) AS a");
+$node_provider->safe_psql('postgres',
+ "CREATE TABLE tab_ins (a int)");
+$node_provider->safe_psql('postgres',
+ "CREATE TABLE tab_rep (a int primary key)");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_notrep (a int)");
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_ins (a int)");
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_rep (a int primary key)");
+
+# Setup logical replication
+my $provider_connstr = $node_provider->connstr . ' dbname=postgres';
+$node_provider->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub");
+$node_provider->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub_ins_only WITH noreplicate_delete noreplicate_update");
+$node_provider->safe_psql('postgres',
+ "ALTER PUBLICATION tap_pub ADD TABLE tab_rep");
+$node_provider->safe_psql('postgres',
+ "ALTER PUBLICATION tap_pub_ins_only ADD TABLE tab_ins");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub WITH CONNECTION '$provider_connstr' PUBLICATION tap_pub, tap_pub_ins_only");
+
+# Wait for subscriber to finish table sync
+my $appname = 'tap_sub';
+my $caughtup_query =
+"SELECT pg_current_xlog_location() <= write_location FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_notrep");
+print "node_subscriber: $result\n";
+is($result, qq(0), 'check non-replicated table is empty on subscriber');
+
+
+$node_provider->safe_psql('postgres',
+ "INSERT INTO tab_ins SELECT generate_series(1,50)");
+$node_provider->safe_psql('postgres',
+ "DELETE FROM tab_ins WHERE a > 20");
+$node_provider->safe_psql('postgres',
+ "UPDATE tab_ins SET a = -a");
+
+$node_provider->safe_psql('postgres',
+ "INSERT INTO tab_rep SELECT generate_series(1,50)");
+$node_provider->safe_psql('postgres',
+ "DELETE FROM tab_rep WHERE a > 20");
+$node_provider->safe_psql('postgres',
+ "UPDATE tab_rep SET a = -a");
+
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*), min(a), max(a) FROM tab_ins");
+print "node_subscriber: $result\n";
+is($result, qq(50|1|50), 'check replicated inserts on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*), min(a), max(a) FROM tab_rep");
+print "node_subscriber: $result\n";
+is($result, qq(20|-20|-1), 'check replicated changes on subscriber');
+
+$node_subscriber->stop('fast');
+$node_provider->stop('fast');
diff --git a/src/test/subscription/t/002_types.pl b/src/test/subscription/t/002_types.pl
new file mode 100644
index 0000000..a126201
--- /dev/null
+++ b/src/test/subscription/t/002_types.pl
@@ -0,0 +1,509 @@
+# This tests that more complex datatypes are replicated correctly
+# by logical replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize provider node
+my $node_provider = get_new_node('provider');
+$node_provider->init(allows_streaming => 'logical');
+$node_provider->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create some preexisting content on provider
+my $ddl = qq(
+ CREATE TABLE public.tst_one_array (
+ a INTEGER PRIMARY KEY,
+ b INTEGER[]
+ );
+ CREATE TABLE public.tst_arrays (
+ a INTEGER[] PRIMARY KEY,
+ b TEXT[],
+ c FLOAT[],
+ d INTERVAL[]
+ );
+
+ CREATE TYPE public.tst_enum_t AS ENUM ('a', 'b', 'c', 'd', 'e');
+ CREATE TABLE public.tst_one_enum (
+ a INTEGER PRIMARY KEY,
+ b public.tst_enum_t
+ );
+ CREATE TABLE public.tst_enums (
+ a public.tst_enum_t PRIMARY KEY,
+ b public.tst_enum_t[]
+ );
+
+ CREATE TYPE public.tst_comp_basic_t AS (a FLOAT, b TEXT, c INTEGER);
+ CREATE TYPE public.tst_comp_enum_t AS (a FLOAT, b public.tst_enum_t, c INTEGER);
+ CREATE TYPE public.tst_comp_enum_array_t AS (a FLOAT, b public.tst_enum_t[], c INTEGER);
+ CREATE TABLE public.tst_one_comp (
+ a INTEGER PRIMARY KEY,
+ b public.tst_comp_basic_t
+ );
+ CREATE TABLE public.tst_comps (
+ a public.tst_comp_basic_t PRIMARY KEY,
+ b public.tst_comp_basic_t[]
+ );
+ CREATE TABLE public.tst_comp_enum (
+ a INTEGER PRIMARY KEY,
+ b public.tst_comp_enum_t
+ );
+ CREATE TABLE public.tst_comp_enum_array (
+ a public.tst_comp_enum_t PRIMARY KEY,
+ b public.tst_comp_enum_t[]
+ );
+ CREATE TABLE public.tst_comp_one_enum_array (
+ a INTEGER PRIMARY KEY,
+ b public.tst_comp_enum_array_t
+ );
+ CREATE TABLE public.tst_comp_enum_what (
+ a public.tst_comp_enum_array_t PRIMARY KEY,
+ b public.tst_comp_enum_array_t[]
+ );
+
+ CREATE TYPE public.tst_comp_mix_t AS (
+ a public.tst_comp_basic_t,
+ b public.tst_comp_basic_t[],
+ c public.tst_enum_t,
+ d public.tst_enum_t[]
+ );
+ CREATE TABLE public.tst_comp_mix_array (
+ a public.tst_comp_mix_t PRIMARY KEY,
+ b public.tst_comp_mix_t[]
+ );
+ CREATE TABLE public.tst_range (
+ a INTEGER PRIMARY KEY,
+ b int4range
+ );
+ CREATE TABLE public.tst_range_array (
+ a INTEGER PRIMARY KEY,
+ b TSTZRANGE,
+ c int8range[]
+ ););
+
+# Setup structure on both nodes
+$node_provider->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+# Setup logical replication
+my $provider_connstr = $node_provider->connstr . ' dbname=postgres';
+$node_provider->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub");
+$node_provider->safe_psql('postgres',
+ "ALTER PUBLICATION tap_pub ADD TABLE ALL IN SCHEMA public");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub WITH CONNECTION '$provider_connstr' PUBLICATION tap_pub");
+
+# Wait for subscriber to finish table sync
+my $appname = 'tap_sub';
+my $caughtup_query =
+"SELECT pg_current_xlog_location() <= write_location FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Insert initial test data
+$node_provider->safe_psql('postgres', qq(
+ -- test_tbl_one_array_col
+ INSERT INTO tst_one_array (a, b) VALUES
+ (1, '{1, 2, 3}'),
+ (2, '{2, 3, 1}'),
+ (3, '{3, 2, 1}'),
+ (4, '{4, 3, 2}'),
+ (5, '{5, NULL, 3}');
+
+ -- test_tbl_arrays
+ INSERT INTO tst_arrays (a, b, c, d) VALUES
+ ('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 day", "2 days", "3 days"}'),
+ ('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 minutes", "3 minutes", "1 minute"}'),
+ ('{3, 1, 2}', '{"c", "a", "b"}', '{3.3, 1.1, 2.2}', '{"3 years", "1 year", "2 years"}'),
+ ('{4, 1, 2}', '{"d", "a", "b"}', '{4.4, 1.1, 2.2}', '{"4 years", "1 year", "2 years"}'),
+ ('{5, NULL, NULL}', '{"e", NULL, "b"}', '{5.5, 1.1, NULL}', '{"5 years", NULL, NULL}');
+
+ -- test_tbl_single_enum
+ INSERT INTO tst_one_enum (a, b) VALUES
+ (1, 'a'),
+ (2, 'b'),
+ (3, 'c'),
+ (4, 'd'),
+ (5, NULL);
+
+ -- test_tbl_enums
+ INSERT INTO tst_enums (a, b) VALUES
+ ('a', '{b, c}'),
+ ('b', '{c, a}'),
+ ('c', '{b, a}'),
+ ('d', '{c, b}'),
+ ('e', '{d, NULL}');
+
+ -- test_tbl_single_composites
+ INSERT INTO tst_one_comp (a, b) VALUES
+ (1, ROW(1.0, 'a', 1)),
+ (2, ROW(2.0, 'b', 2)),
+ (3, ROW(3.0, 'c', 3)),
+ (4, ROW(4.0, 'd', 4)),
+ (5, ROW(NULL, NULL, 5));
+
+ -- test_tbl_composites
+ INSERT INTO tst_comps (a, b) VALUES
+ (ROW(1.0, 'a', 1), ARRAY[ROW(1, 'a', 1)::tst_comp_basic_t]),
+ (ROW(2.0, 'b', 2), ARRAY[ROW(2, 'b', 2)::tst_comp_basic_t]),
+ (ROW(3.0, 'c', 3), ARRAY[ROW(3, 'c', 3)::tst_comp_basic_t]),
+ (ROW(4.0, 'd', 4), ARRAY[ROW(4, 'd', 3)::tst_comp_basic_t]),
+ (ROW(5.0, 'e', NULL), ARRAY[NULL, ROW(5, NULL, 5)::tst_comp_basic_t]);
+
+ -- test_tbl_composite_with_enums
+ INSERT INTO tst_comp_enum (a, b) VALUES
+ (1, ROW(1.0, 'a', 1)),
+ (2, ROW(2.0, 'b', 2)),
+ (3, ROW(3.0, 'c', 3)),
+ (4, ROW(4.0, 'd', 4)),
+ (5, ROW(NULL, 'e', NULL));
+
+ -- test_tbl_composite_with_enums_array
+ INSERT INTO tst_comp_enum_array (a, b) VALUES
+ (ROW(1.0, 'a', 1), ARRAY[ROW(1, 'a', 1)::tst_comp_enum_t]),
+ (ROW(2.0, 'b', 2), ARRAY[ROW(2, 'b', 2)::tst_comp_enum_t]),
+ (ROW(3.0, 'c', 3), ARRAY[ROW(3, 'c', 3)::tst_comp_enum_t]),
+ (ROW(4.0, 'd', 3), ARRAY[ROW(3, 'd', 3)::tst_comp_enum_t]),
+ (ROW(5.0, 'e', 3), ARRAY[ROW(3, 'e', 3)::tst_comp_enum_t, NULL]);
+
+ -- test_tbl_composite_with_single_enums_array_in_composite
+ INSERT INTO tst_comp_one_enum_array (a, b) VALUES
+ (1, ROW(1.0, '{a, b, c}', 1)),
+ (2, ROW(2.0, '{a, b, c}', 2)),
+ (3, ROW(3.0, '{a, b, c}', 3)),
+ (4, ROW(4.0, '{c, b, d}', 4)),
+ (5, ROW(5.0, '{NULL, e, NULL}', 5));
+
+ -- test_tbl_composite_with_enums_array_in_composite
+ INSERT INTO tst_comp_enum_what (a, b) VALUES
+ (ROW(1.0, '{a, b, c}', 1), ARRAY[ROW(1, '{a, b, c}', 1)::tst_comp_enum_array_t]),
+ (ROW(2.0, '{b, c, a}', 2), ARRAY[ROW(2, '{b, c, a}', 1)::tst_comp_enum_array_t]),
+ (ROW(3.0, '{c, a, b}', 1), ARRAY[ROW(3, '{c, a, b}', 1)::tst_comp_enum_array_t]),
+ (ROW(4.0, '{c, b, d}', 4), ARRAY[ROW(4, '{c, b, d}', 4)::tst_comp_enum_array_t]),
+ (ROW(5.0, '{c, NULL, b}', NULL), ARRAY[ROW(5, '{c, e, b}', 1)::tst_comp_enum_array_t]);
+
+ -- test_tbl_mixed_composites
+ INSERT INTO tst_comp_mix_array (a, b) VALUES
+ (ROW(
+ ROW(1,'a',1),
+ ARRAY[ROW(1,'a',1)::tst_comp_basic_t, ROW(2,'b',2)::tst_comp_basic_t],
+ 'a',
+ '{a,b,NULL,c}'),
+ ARRAY[
+ ROW(
+ ROW(1,'a',1),
+ ARRAY[
+ ROW(1,'a',1)::tst_comp_basic_t,
+ ROW(2,'b',2)::tst_comp_basic_t,
+ NULL
+ ],
+ 'a',
+ '{a,b,c}'
+ )::tst_comp_mix_t
+ ]
+ );
+
+ -- test_tbl_range
+ INSERT INTO tst_range (a, b) VALUES
+ (1, '[1, 10]'),
+ (2, '[2, 20]'),
+ (3, '[3, 30]'),
+ (4, '[4, 40]'),
+ (5, '[5, 50]');
+
+ -- test_tbl_range_array
+ INSERT INTO tst_range_array (a, b, c) VALUES
+ (1, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz, 'infinity'), '{"[1,2]", "[10,20]"}'),
+ (2, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - interval '2 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[2,3]", "[20,30]"}'),
+ (3, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - interval '3 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[3,4]"}'),
+ (4, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - interval '4 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[4,5]", NULL, "[40,50]"}'),
+ (5, NULL, NULL);
+));
+
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Check the data on subscriber
+my $result = $node_subscriber->safe_psql('postgres', qq(
+ SELECT a, b FROM tst_one_array ORDER BY a;
+ SELECT a, b, c, d FROM tst_arrays ORDER BY a;
+ SELECT a, b FROM tst_one_enum ORDER BY a;
+ SELECT a, b FROM tst_enums ORDER BY a;
+ SELECT a, b FROM tst_one_comp ORDER BY a;
+ SELECT a, b FROM tst_comps ORDER BY a;
+ SELECT a, b FROM tst_comp_enum ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_one_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_what ORDER BY a;
+ SELECT a, b FROM tst_comp_mix_array ORDER BY a;
+ SELECT a, b FROM tst_range ORDER BY a;
+ SELECT a, b, c FROM tst_range_array ORDER BY a;
+));
+
+is($result, '1|{1,2,3}
+2|{2,3,1}
+3|{3,2,1}
+4|{4,3,2}
+5|{5,NULL,3}
+{1,2,3}|{a,b,c}|{1.1,2.2,3.3}|{"1 day","2 days","3 days"}
+{2,3,1}|{b,c,a}|{2.2,3.3,1.1}|{00:02:00,00:03:00,00:01:00}
+{3,1,2}|{c,a,b}|{3.3,1.1,2.2}|{"3 years","1 year","2 years"}
+{4,1,2}|{d,a,b}|{4.4,1.1,2.2}|{"4 years","1 year","2 years"}
+{5,NULL,NULL}|{e,NULL,b}|{5.5,1.1,NULL}|{"5 years",NULL,NULL}
+1|a
+2|b
+3|c
+4|d
+5|
+a|{b,c}
+b|{c,a}
+c|{b,a}
+d|{c,b}
+e|{d,NULL}
+1|(1,a,1)
+2|(2,b,2)
+3|(3,c,3)
+4|(4,d,4)
+5|(,,5)
+(1,a,1)|{"(1,a,1)"}
+(2,b,2)|{"(2,b,2)"}
+(3,c,3)|{"(3,c,3)"}
+(4,d,4)|{"(4,d,3)"}
+(5,e,)|{NULL,"(5,,5)"}
+1|(1,a,1)
+2|(2,b,2)
+3|(3,c,3)
+4|(4,d,4)
+5|(,e,)
+(1,a,1)|{"(1,a,1)"}
+(2,b,2)|{"(2,b,2)"}
+(3,c,3)|{"(3,c,3)"}
+(4,d,3)|{"(3,d,3)"}
+(5,e,3)|{"(3,e,3)",NULL}
+1|(1,"{a,b,c}",1)
+2|(2,"{a,b,c}",2)
+3|(3,"{a,b,c}",3)
+4|(4,"{c,b,d}",4)
+5|(5,"{NULL,e,NULL}",5)
+(1,"{a,b,c}",1)|{"(1,\"{a,b,c}\",1)"}
+(2,"{b,c,a}",2)|{"(2,\"{b,c,a}\",1)"}
+(3,"{c,a,b}",1)|{"(3,\"{c,a,b}\",1)"}
+(4,"{c,b,d}",4)|{"(4,\"{c,b,d}\",4)"}
+(5,"{c,NULL,b}",)|{"(5,\"{c,e,b}\",1)"}
+("(1,a,1)","{""(1,a,1)"",""(2,b,2)""}",a,"{a,b,NULL,c}")|{"(\"(1,a,1)\",\"{\"\"(1,a,1)\"\",\"\"(2,b,2)\"\",NULL}\",a,\"{a,b,c}\")"}
+1|[1,11)
+2|[2,21)
+3|[3,31)
+4|[4,41)
+5|[5,51)
+1|["2014-08-04 00:00:00+02",infinity)|{"[1,3)","[10,21)"}
+2|["2014-08-02 00:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
+3|["2014-08-01 00:00:00+02","2014-08-04 00:00:00+02")|{"[3,5)"}
+4|["2014-07-31 00:00:00+02","2014-08-04 00:00:00+02")|{"[4,6)",NULL,"[40,51)"}
+5||',
+'check replicated inserts on subscriber');
+
+# Run batch of updates
+$node_provider->safe_psql('postgres', qq(
+ UPDATE tst_one_array SET b = '{4, 5, 6}' WHERE a = 1;
+ UPDATE tst_one_array SET b = '{4, 5, 6, 1}' WHERE a > 3;
+ UPDATE tst_arrays SET b = '{"1a", "2b", "3c"}', c = '{1.0, 2.0, 3.0}', d = '{"1 day 1 second", "2 days 2 seconds", "3 days 3 second"}' WHERE a = '{1, 2, 3}';
+ UPDATE tst_arrays SET b = '{"c", "d", "e"}', c = '{3.0, 4.0, 5.0}', d = '{"3 day 1 second", "4 days 2 seconds", "5 days 3 second"}' WHERE a[1] > 3;
+ UPDATE tst_one_enum SET b = 'c' WHERE a = 1;
+ UPDATE tst_one_enum SET b = NULL WHERE a > 3;
+ UPDATE tst_enums SET b = '{e, NULL}' WHERE a = 'a';
+ UPDATE tst_enums SET b = '{e, d}' WHERE a > 'c';
+ UPDATE tst_one_comp SET b = ROW(1.0, 'A', 1) WHERE a = 1;
+ UPDATE tst_one_comp SET b = ROW(NULL, 'x', -1) WHERE a > 3;
+ UPDATE tst_comps SET b = ARRAY[ROW(9, 'x', -1)::tst_comp_basic_t] WHERE (a).a = 1.0;
+ UPDATE tst_comps SET b = ARRAY[NULL, ROW(9, 'x', NULL)::tst_comp_basic_t] WHERE (a).a > 3.9;
+ UPDATE tst_comp_enum SET b = ROW(1.0, NULL, NULL) WHERE a = 1;
+ UPDATE tst_comp_enum SET b = ROW(4.0, 'd', 44) WHERE a > 3;
+ UPDATE tst_comp_enum_array SET b = ARRAY[NULL, ROW(3, 'd', 3)::tst_comp_enum_t] WHERE a = ROW(1.0, 'a', 1)::tst_comp_enum_t;
+ UPDATE tst_comp_enum_array SET b = ARRAY[ROW(1, 'a', 1)::tst_comp_enum_t, ROW(2, 'b', 2)::tst_comp_enum_t] WHERE (a).a > 3;
+ UPDATE tst_comp_one_enum_array SET b = ROW(1.0, '{a, e, c}', NULL) WHERE a = 1;
+ UPDATE tst_comp_one_enum_array SET b = ROW(4.0, '{c, b, d}', 4) WHERE a > 3;
+ UPDATE tst_comp_enum_what SET b = ARRAY[NULL, ROW(1, '{a, b, c}', 1)::tst_comp_enum_array_t, ROW(NULL, '{a, e, c}', 2)::tst_comp_enum_array_t] WHERE (a).a = 1;
+ UPDATE tst_comp_enum_what SET b = ARRAY[ROW(5, '{a, b, c}', 5)::tst_comp_enum_array_t] WHERE (a).a > 3;
+ UPDATE tst_comp_mix_array SET b[2] = NULL WHERE ((a).a).a = 1;
+ UPDATE tst_range SET b = '[100, 1000]' WHERE a = 1;
+ UPDATE tst_range SET b = '(1, 90)' WHERE a > 3;
+ UPDATE tst_range_array SET c = '{"[100, 1000]"}' WHERE a = 1;
+ UPDATE tst_range_array SET b = tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz, 'infinity'), c = '{NULL, "[11,9999999]"}' WHERE a > 3;
+));
+
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Check the data on subscriber
+$result = $node_subscriber->safe_psql('postgres', qq(
+ SELECT a, b FROM tst_one_array ORDER BY a;
+ SELECT a, b, c, d FROM tst_arrays ORDER BY a;
+ SELECT a, b FROM tst_one_enum ORDER BY a;
+ SELECT a, b FROM tst_enums ORDER BY a;
+ SELECT a, b FROM tst_one_comp ORDER BY a;
+ SELECT a, b FROM tst_comps ORDER BY a;
+ SELECT a, b FROM tst_comp_enum ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_one_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_what ORDER BY a;
+ SELECT a, b FROM tst_comp_mix_array ORDER BY a;
+ SELECT a, b FROM tst_range ORDER BY a;
+ SELECT a, b, c FROM tst_range_array ORDER BY a;
+));
+
+is($result, '1|{4,5,6}
+2|{2,3,1}
+3|{3,2,1}
+4|{4,5,6,1}
+5|{4,5,6,1}
+{1,2,3}|{1a,2b,3c}|{1,2,3}|{"1 day 00:00:01","2 days 00:00:02","3 days 00:00:03"}
+{2,3,1}|{b,c,a}|{2.2,3.3,1.1}|{00:02:00,00:03:00,00:01:00}
+{3,1,2}|{c,a,b}|{3.3,1.1,2.2}|{"3 years","1 year","2 years"}
+{4,1,2}|{c,d,e}|{3,4,5}|{"3 days 00:00:01","4 days 00:00:02","5 days 00:00:03"}
+{5,NULL,NULL}|{c,d,e}|{3,4,5}|{"3 days 00:00:01","4 days 00:00:02","5 days 00:00:03"}
+1|c
+2|b
+3|c
+4|
+5|
+a|{e,NULL}
+b|{c,a}
+c|{b,a}
+d|{e,d}
+e|{e,d}
+1|(1,A,1)
+2|(2,b,2)
+3|(3,c,3)
+4|(,x,-1)
+5|(,x,-1)
+(1,a,1)|{"(9,x,-1)"}
+(2,b,2)|{"(2,b,2)"}
+(3,c,3)|{"(3,c,3)"}
+(4,d,4)|{NULL,"(9,x,)"}
+(5,e,)|{NULL,"(9,x,)"}
+1|(1,,)
+2|(2,b,2)
+3|(3,c,3)
+4|(4,d,44)
+5|(4,d,44)
+(1,a,1)|{NULL,"(3,d,3)"}
+(2,b,2)|{"(2,b,2)"}
+(3,c,3)|{"(3,c,3)"}
+(4,d,3)|{"(1,a,1)","(2,b,2)"}
+(5,e,3)|{"(1,a,1)","(2,b,2)"}
+1|(1,"{a,e,c}",)
+2|(2,"{a,b,c}",2)
+3|(3,"{a,b,c}",3)
+4|(4,"{c,b,d}",4)
+5|(4,"{c,b,d}",4)
+(1,"{a,b,c}",1)|{NULL,"(1,\"{a,b,c}\",1)","(,\"{a,e,c}\",2)"}
+(2,"{b,c,a}",2)|{"(2,\"{b,c,a}\",1)"}
+(3,"{c,a,b}",1)|{"(3,\"{c,a,b}\",1)"}
+(4,"{c,b,d}",4)|{"(5,\"{a,b,c}\",5)"}
+(5,"{c,NULL,b}",)|{"(5,\"{a,b,c}\",5)"}
+("(1,a,1)","{""(1,a,1)"",""(2,b,2)""}",a,"{a,b,NULL,c}")|{"(\"(1,a,1)\",\"{\"\"(1,a,1)\"\",\"\"(2,b,2)\"\",NULL}\",a,\"{a,b,c}\")",NULL}
+1|[100,1001)
+2|[2,21)
+3|[3,31)
+4|[2,90)
+5|[2,90)
+1|["2014-08-04 00:00:00+02",infinity)|{"[100,1001)"}
+2|["2014-08-02 00:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
+3|["2014-08-01 00:00:00+02","2014-08-04 00:00:00+02")|{"[3,5)"}
+4|["2014-08-04 00:00:00+02",infinity)|{NULL,"[11,10000000)"}
+5|["2014-08-04 00:00:00+02",infinity)|{NULL,"[11,10000000)"}',
+'check replicated updates on subscriber');
+
+# Run batch of deletes
+$node_provider->safe_psql('postgres', qq(
+ DELETE FROM tst_one_array WHERE a = 1;
+ DELETE FROM tst_one_array WHERE b = '{2, 3, 1}';
+ DELETE FROM tst_arrays WHERE a = '{1, 2, 3}';
+ DELETE FROM tst_arrays WHERE a[1] = 2;
+ DELETE FROM tst_one_enum WHERE a = 1;
+ DELETE FROM tst_one_enum WHERE b = 'b';
+ DELETE FROM tst_enums WHERE a = 'a';
+ DELETE FROM tst_enums WHERE b[1] = 'b';
+ DELETE FROM tst_one_comp WHERE a = 1;
+ DELETE FROM tst_one_comp WHERE (b).a = 2.0;
+ DELETE FROM tst_comps WHERE (a).b = 'a';
+ DELETE FROM tst_comps WHERE ROW(3, 'c', 3)::tst_comp_basic_t = ANY(b);
+ DELETE FROM tst_comp_enum WHERE a = 1;
+ DELETE FROM tst_comp_enum WHERE (b).a = 2.0;
+ DELETE FROM tst_comp_enum_array WHERE a = ROW(1.0, 'a', 1)::tst_comp_enum_t;
+ DELETE FROM tst_comp_enum_array WHERE ROW(3, 'c', 3)::tst_comp_enum_t = ANY(b);
+ DELETE FROM tst_comp_one_enum_array WHERE a = 1;
+ DELETE FROM tst_comp_one_enum_array WHERE 'a' = ANY((b).b);
+ DELETE FROM tst_comp_enum_what WHERE (a).a = 1;
+ DELETE FROM tst_comp_enum_what WHERE (b[1]).b = '{c, a, b}';
+ DELETE FROM tst_comp_mix_array WHERE ((a).a).a = 1;
+ DELETE FROM tst_range WHERE a = 1;
+ DELETE FROM tst_range WHERE '[10,20]' && b;
+ DELETE FROM tst_range_array WHERE a = 1;
+ DELETE FROM tst_range_array WHERE tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz, 'Mon Aug 05 00:00:00 2014 CEST'::timestamptz) && b;
+));
+
+$node_provider->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Check the data on subscriber
+$result = $node_subscriber->safe_psql('postgres', qq(
+ SELECT a, b FROM tst_one_array ORDER BY a;
+ SELECT a, b, c, d FROM tst_arrays ORDER BY a;
+ SELECT a, b FROM tst_one_enum ORDER BY a;
+ SELECT a, b FROM tst_enums ORDER BY a;
+ SELECT a, b FROM tst_one_comp ORDER BY a;
+ SELECT a, b FROM tst_comps ORDER BY a;
+ SELECT a, b FROM tst_comp_enum ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_one_enum_array ORDER BY a;
+ SELECT a, b FROM tst_comp_enum_what ORDER BY a;
+ SELECT a, b FROM tst_comp_mix_array ORDER BY a;
+ SELECT a, b FROM tst_range ORDER BY a;
+ SELECT a, b, c FROM tst_range_array ORDER BY a;
+));
+
+is($result, '3|{3,2,1}
+4|{4,5,6,1}
+5|{4,5,6,1}
+{3,1,2}|{c,a,b}|{3.3,1.1,2.2}|{"3 years","1 year","2 years"}
+{4,1,2}|{c,d,e}|{3,4,5}|{"3 days 00:00:01","4 days 00:00:02","5 days 00:00:03"}
+{5,NULL,NULL}|{c,d,e}|{3,4,5}|{"3 days 00:00:01","4 days 00:00:02","5 days 00:00:03"}
+3|c
+4|
+5|
+b|{c,a}
+d|{e,d}
+e|{e,d}
+3|(3,c,3)
+4|(,x,-1)
+5|(,x,-1)
+(2,b,2)|{"(2,b,2)"}
+(4,d,4)|{NULL,"(9,x,)"}
+(5,e,)|{NULL,"(9,x,)"}
+3|(3,c,3)
+4|(4,d,44)
+5|(4,d,44)
+(2,b,2)|{"(2,b,2)"}
+(4,d,3)|{"(1,a,1)","(2,b,2)"}
+(5,e,3)|{"(1,a,1)","(2,b,2)"}
+4|(4,"{c,b,d}",4)
+5|(4,"{c,b,d}",4)
+(2,"{b,c,a}",2)|{"(2,\"{b,c,a}\",1)"}
+(4,"{c,b,d}",4)|{"(5,\"{a,b,c}\",5)"}
+(5,"{c,NULL,b}",)|{"(5,\"{a,b,c}\",5)"}
+2|["2014-08-02 00:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
+3|["2014-08-01 00:00:00+02","2014-08-04 00:00:00+02")|{"[3,5)"}',
+'check replicated deletes on subscriber');
+
+$node_subscriber->stop('fast');
+$node_provider->stop('fast');
--
2.7.4
Attachment: 0006-Logical-replication-support-for-initial-data-copy.patch (application/x-patch)
From 2b2bc8c5d670c765ed3eb771dee266f6b9b3f3c2 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Tue, 19 Jul 2016 01:55:25 +0200
Subject: [PATCH 6/6] Logical replication support for initial data copy
---
src/backend/catalog/Makefile | 2 +-
src/backend/commands/subscriptioncmds.c | 34 +-
.../libpqwalreceiver/libpqwalreceiver.c | 126 +++-
src/backend/replication/logical/Makefile | 2 +-
src/backend/replication/logical/apply.c | 173 +++++-
src/backend/replication/logical/launcher.c | 92 +--
src/backend/replication/logical/logical.c | 190 +++++-
src/backend/replication/logical/proto.c | 63 +-
src/backend/replication/logical/publication.c | 40 ++
src/backend/replication/logical/snapbuild.c | 5 +-
src/backend/replication/logical/subscription.c | 180 ++++++
src/backend/replication/logical/tablesync.c | 672 +++++++++++++++++++++
src/backend/replication/pgoutput/pgoutput.c | 108 ++++
src/backend/replication/repl_gram.y | 31 +-
src/backend/replication/repl_scanner.l | 3 +
src/backend/replication/walsender.c | 199 +++++-
src/backend/utils/cache/syscache.c | 23 +
src/include/catalog/indexing.h | 6 +
src/include/catalog/pg_subscription_rel.h | 61 ++
src/include/commands/replicationcmds.h | 1 +
src/include/nodes/nodes.h | 4 +-
src/include/nodes/replnodes.h | 23 +
src/include/replication/logical.h | 22 +-
src/include/replication/logicalproto.h | 3 +-
src/include/replication/logicalworker.h | 9 +-
src/include/replication/output_plugin.h | 18 +
src/include/replication/publication.h | 1 +
src/include/replication/subscription.h | 6 +
src/include/replication/walreceiver.h | 9 +
src/include/replication/worker_internal.h | 32 +
src/include/utils/syscache.h | 2 +
src/test/Makefile | 2 +-
src/test/README | 3 +
src/test/regress/expected/sanity_check.out | 1 +
src/test/subscription/t/001_rep_changes.pl | 18 +-
src/test/subscription/t/002_types.pl | 8 +-
36 files changed, 2043 insertions(+), 129 deletions(-)
create mode 100644 src/backend/replication/logical/tablesync.c
create mode 100644 src/include/catalog/pg_subscription_rel.h
create mode 100644 src/include/replication/worker_internal.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 60737d4..22f78f8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -43,7 +43,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
pg_collation.h pg_range.h pg_transform.h \
pg_publication.h pg_publication_rel.h pg_subscription.h \
- toasting.h indexing.h \
+ pg_subscription_rel.h toasting.h indexing.h \
)
# location of Catalog.pm
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 43e2853..bfef492 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -22,11 +22,13 @@
#include "access/htup_details.h"
#include "access/xact.h"
+#include "catalog/dependency.h"
#include "catalog/indexing.h"
#include "catalog/namespace.h"
#include "catalog/objectaddress.h"
#include "catalog/pg_type.h"
#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"
#include "commands/defrem.h"
#include "commands/replicationcmds.h"
@@ -40,9 +42,12 @@
#include "replication/logicalworker.h"
#include "replication/origin.h"
#include "replication/reorderbuffer.h"
-#include "replication/logicalworker.h"
+#include "replication/subscription.h"
#include "replication/walreceiver.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/catcache.h"
@@ -172,10 +177,13 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
char *conninfo;
char *slotname;
List *publications;
+ char *options;
WalReceiverConnHandle *wrchandle = NULL;
WalReceiverConnAPI *wrcapi = NULL;
walrcvconn_init_fn walrcvconn_init;
XLogRecPtr lsn;
+ List *tables;
+ ListCell *lc;
check_replication_permissions();
@@ -248,6 +256,12 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
wrcapi->create_slot == NULL)
elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+ /*
+ * Create the replication slot on the remote side for our newly created
+ * subscription.
+ *
+ * TODO: ensure drop of the slot on subsequent failure/rollback?
+ */
wrcapi->connect(wrchandle, conninfo, true, stmt->subname);
wrcapi->create_slot(wrchandle, slotname, true, &lsn);
ereport(NOTICE,
@@ -259,6 +273,24 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
*/
replorigin_create(slotname);
+ /* Build option string for the plugin. */
+ options = logicalrep_build_options(publications);
+
+ /* Get the table list from the provider and build local table status info. */
+ tables = wrcapi->list_tables(wrchandle, slotname, options);
+ foreach (lc, tables)
+ {
+ LogicalRepTableListEntry *entry = lfirst(lc);
+ Oid nspid = LookupExplicitNamespace(entry->nspname, false);
+ Oid relid = get_relname_relid(entry->relname, nspid);
+
+ SetSubscriptionRelState(subid, relid, SUBREL_STATE_INIT,
+ InvalidXLogRecPtr);
+ }
+
+ ereport(NOTICE,
+ (errmsg("synchronized table states")));
+
/* And we are done with the remote side. */
wrcapi->disconnect(wrchandle);
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 4c4d441..94648c7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -23,6 +23,7 @@
#include "pqexpbuffer.h"
#include "access/xlog.h"
#include "miscadmin.h"
+#include "replication/logical.h"
#include "replication/walreceiver.h"
#include "utils/builtins.h"
#include "utils/pg_lsn.h"
@@ -73,6 +74,11 @@ static int libpqrcv_receive(WalReceiverConnHandle *handle, char **buffer,
pgsocket *wait_fd);
static void libpqrcv_send(WalReceiverConnHandle *handle, const char *buffer,
int nbytes);
+static List *libpqrcv_list_tables(WalReceiverConnHandle *handle,
+ char *slotname, char *options);
+static bool libpqrcv_copy_table(WalReceiverConnHandle *handle,
+ char *slotname, char *nspname,
+ char *relname, char *options);
static void libpqrcv_disconnect(WalReceiverConnHandle *handle);
/* Prototypes for private functions */
@@ -104,6 +110,8 @@ _PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi)
wrcapi->receive = libpqrcv_receive;
wrcapi->send = libpqrcv_send;
wrcapi->disconnect = libpqrcv_disconnect;
+ wrcapi->copy_table = libpqrcv_copy_table;
+ wrcapi->list_tables = libpqrcv_list_tables;
return handle;
}
@@ -416,15 +424,15 @@ libpqrcv_endstreaming(WalReceiverConnHandle *handle, TimeLineID *next_tli)
(errmsg("could not send end-of-streaming message to primary: %s",
PQerrorMessage(handle->streamConn))));
+ *next_tli = 0;
+
/*
* After COPY is finished, we should receive a result set indicating the
* next timeline's ID, or just CommandComplete if the server was shut
* down.
*
- * If we had not yet received CopyDone from the backend, PGRES_COPY_IN
- * would also be possible. However, at the moment this function is only
- * called after receiving CopyDone from the backend - the walreceiver
- * never terminates replication on its own initiative.
+ * If we had not yet received CopyDone from the backend, PGRES_COPY_OUT
+ * is also possible in case we aborted the copy in mid-stream.
*/
res = PQgetResult(handle->streamConn);
if (PQresultStatus(res) == PGRES_TUPLES_OK)
@@ -442,8 +450,16 @@ libpqrcv_endstreaming(WalReceiverConnHandle *handle, TimeLineID *next_tli)
/* the result set should be followed by CommandComplete */
res = PQgetResult(handle->streamConn);
}
- else
- *next_tli = 0;
+ else if (PQresultStatus(res) == PGRES_COPY_OUT)
+ {
+ PQclear(res);
+
+ /* End the copy */
+ PQendcopy(handle->streamConn);
+
+ /* CommandComplete should follow */
+ res = PQgetResult(handle->streamConn);
+ }
if (PQresultStatus(res) != PGRES_COMMAND_OK)
ereport(ERROR,
@@ -642,6 +658,104 @@ libpqrcv_PQexec(WalReceiverConnHandle *handle, const char *query)
}
/*
+ * Run the LIST_TABLES command, which sends the list of tables to copy
+ * in whatever format the plugin chooses.
+ */
+static List *
+libpqrcv_list_tables(WalReceiverConnHandle *handle, char *slotname,
+ char *options)
+{
+ StringInfoData cmd;
+ PGresult *res;
+ int i;
+ List *tablelist = NIL;
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "LIST_TABLES SLOT \"%s\"",
+ slotname);
+
+ /* Add options */
+ if (options)
+ appendStringInfo(&cmd, "( %s )", options);
+
+ res = libpqrcv_PQexec(handle, cmd.data);
+ pfree(cmd.data);
+
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ PQclear(res);
+ ereport(ERROR,
+ (errmsg("could not receive list of replicated tables from the provider: %s",
+ PQerrorMessage(handle->streamConn))));
+ }
+ if (PQnfields(res) != 3)
+ {
+ int nfields = PQnfields(res);
+ PQclear(res);
+ ereport(ERROR,
+ (errmsg("invalid response from provider"),
+ errdetail("Expected 3 fields, got %d fields.", nfields)));
+ }
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ LogicalRepTableListEntry *entry;
+
+ entry = palloc(sizeof(LogicalRepTableListEntry));
+ entry->nspname = pstrdup(PQgetvalue(res, i, 0));
+ entry->relname = pstrdup(PQgetvalue(res, i, 1));
+ if (!PQgetisnull(res, i, 2))
+ entry->info = pstrdup(PQgetvalue(res, i, 2));
+ else
+ entry->info = NULL;
+
+ tablelist = lappend(tablelist, entry);
+ }
+
+ PQclear(res);
+
+ return tablelist;
+}
+
+/*
+ * Run the COPY_TABLE command, which starts streaming the table's
+ * existing data.
+ */
+static bool
+libpqrcv_copy_table(WalReceiverConnHandle *handle, char *slotname,
+ char *nspname, char *relname, char *options)
+{
+ StringInfoData cmd;
+ PGresult *res;
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "COPY_TABLE SLOT \"%s\" TABLE \"%s\" \"%s\"",
+ slotname, nspname, relname);
+
+ /* Add options */
+ if (options)
+ appendStringInfo(&cmd, "( %s )", options);
+
+ res = libpqrcv_PQexec(handle, cmd.data);
+ pfree(cmd.data);
+
+ if (PQresultStatus(res) == PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ return false;
+ }
+ else if (PQresultStatus(res) != PGRES_COPY_BOTH)
+ {
+ PQclear(res);
+ ereport(ERROR,
+ (errmsg("could not start initial table contents streaming: %s",
+ PQerrorMessage(handle->streamConn))));
+ }
+ PQclear(res);
+ return true;
+}
+
+/*
* Disconnect connection to primary, if any.
*/
static void
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index ab6e11e..a1e1d81 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -16,6 +16,6 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
OBJS = apply.o decode.o launcher.o logical.o logicalfuncs.o message.o \
origin.o proto.o publication.o reorderbuffer.o snapbuild.o \
- subscription.o
+ subscription.o tablesync.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c
index eb7af19..809b77b 100644
--- a/src/backend/replication/logical/apply.c
+++ b/src/backend/replication/logical/apply.c
@@ -24,6 +24,7 @@
#include "access/xlog_internal.h"
#include "catalog/namespace.h"
+#include "catalog/pg_subscription_rel.h"
#include "commands/trigger.h"
@@ -51,6 +52,7 @@
#include "replication/snapbuild.h"
#include "replication/subscription.h"
#include "replication/walreceiver.h"
+#include "replication/worker_internal.h"
#include "rewrite/rewriteHandler.h"
@@ -62,6 +64,7 @@
#include "utils/builtins.h"
#include "utils/catcache.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
@@ -72,17 +75,27 @@
typedef struct FlushPosition
{
- dlist_node node;
- XLogRecPtr local_end;
- XLogRecPtr remote_end;
+ dlist_node node;
+ XLogRecPtr local_end;
+ XLogRecPtr remote_end;
} FlushPosition;
static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping);
+
+typedef struct TableState
+{
+ dlist_node node;
+ Oid relid;
+ XLogRecPtr lsn;
+ char state;
+} TableState;
+
+static dlist_head table_states = DLIST_STATIC_INIT(table_states);
+static XLogRecPtr last_commit_lsn;
+
static MemoryContext ApplyContext;
-static bool in_remote_transaction = false;
-static Subscription *MySubscription = NULL;
static bool got_SIGTERM = false;
typedef struct LogicalRepRelMapEntry {
@@ -95,13 +108,19 @@ typedef struct LogicalRepRelMapEntry {
* local ones */
AttInMetadata *attin; /* cached info used in type
* conversion */
+ char state;
} LogicalRepRelMapEntry;
static HTAB *LogicalRepRelMap = NULL;
-/* filled by libpqreceiver when loaded */
-static WalReceiverConnAPI *wrcapi = NULL;
-static WalReceiverConnHandle *wrchandle = NULL;
+WalReceiverConnAPI *wrcapi = NULL;
+WalReceiverConnHandle *wrchandle = NULL;
+
+LogicalRepWorker *MyLogicalRepWorker = NULL;
+Subscription *MySubscription = NULL;
+
+static char *myslotname = NULL;
+bool in_remote_transaction = false;
static void send_feedback(XLogRecPtr recvpos, int64 now, bool force);
void pglogical_apply_main(Datum main_arg);
@@ -109,6 +128,19 @@ void pglogical_apply_main(Datum main_arg);
static bool tuple_find_by_replidx(Relation rel, LockTupleMode lockmode,
TupleTableSlot *searchslot, TupleTableSlot *slot);
+/*
+ * Should this worker apply changes for the given relation?
+ *
+ * This is mainly needed for the initial relation data sync, which runs in
+ * a separate worker, so the main apply worker needs a way to skip changes
+ * for a table while that table is being synchronized.
+ */
+static bool
+interesting_relation(LogicalRepRelMapEntry *rel)
+{
+ return rel->state == SUBREL_STATE_READY ||
+ rel->reloid == MyLogicalRepWorker->relid;
+}
/*
@@ -266,12 +298,16 @@ tupdesc_get_att_by_name(TupleDesc desc, const char *attname)
/*
* Open the local relation associated with the remote one.
+ *
+ * Optionally rebuilds the relcache mapping if it was invalidated
+ * by local DDL.
*/
static LogicalRepRelMapEntry *
logicalreprel_open(uint32 remoteid, LOCKMODE lockmode)
{
LogicalRepRelMapEntry *entry;
bool found;
+ XLogRecPtr lsn;
if (LogicalRepRelMap == NULL)
remoterelmap_init();
@@ -309,6 +345,10 @@ logicalreprel_open(uint32 remoteid, LOCKMODE lockmode)
else
entry->rel = heap_open(entry->reloid, lockmode);
+ /* TODO cache this */
+ entry->state = GetSubscriptionRelState(MySubscription->oid,
+ entry->reloid, &lsn, true);
+
return entry;
}
@@ -322,7 +362,6 @@ logicalreprel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
rel->rel = NULL;
}
-
/*
* Make sure that we started local transaction.
*
@@ -599,6 +638,9 @@ handle_commit(StringInfo s)
in_remote_transaction = false;
+ last_commit_lsn = end_lsn;
+ process_syncing_tables(myslotname, end_lsn);
+
pgstat_report_activity(STATE_IDLE, NULL);
}
@@ -651,6 +693,15 @@ handle_insert(StringInfo s)
relid = logicalrep_read_insert(s, &newtup);
rel = logicalreprel_open(relid, RowExclusiveLock);
+ if (!interesting_relation(rel))
+ {
+ /*
+ * The relation can't become interesting in the middle of the
+ * transaction, so it's safe to unlock it.
+ */
+ logicalreprel_close(rel, RowExclusiveLock);
+ return;
+ }
/* Initialize the executor state. */
estate = create_estate_for_relation(rel);
@@ -710,6 +761,15 @@ handle_update(StringInfo s)
relid = logicalrep_read_update(s, &hasoldtup, &oldtup,
&newtup);
rel = logicalreprel_open(relid, RowExclusiveLock);
+ if (!interesting_relation(rel))
+ {
+ /*
+ * The relation can't become interesting in the middle of the
+ * transaction, so it's safe to unlock it.
+ */
+ logicalreprel_close(rel, RowExclusiveLock);
+ return;
+ }
/* Initialize the executor state. */
estate = create_estate_for_relation(rel);
@@ -796,6 +856,15 @@ handle_delete(StringInfo s)
relid = logicalrep_read_delete(s, &oldtup);
rel = logicalreprel_open(relid, RowExclusiveLock);
+ if (!interesting_relation(rel))
+ {
+ /*
+ * The relation can't become interesting in the middle of the
+ * transaction, so it's safe to unlock it.
+ */
+ logicalreprel_close(rel, RowExclusiveLock);
+ return;
+ }
/* Initialize the executor state. */
estate = create_estate_for_relation(rel);
@@ -942,11 +1011,9 @@ get_flush_position(XLogRecPtr *write, XLogRecPtr *flush)
/*
* Apply main loop.
*/
-static void
-ApplyLoop(void)
+void
+LogicalRepApplyLoop(XLogRecPtr last_received)
{
- XLogRecPtr last_received = InvalidXLogRecPtr;
-
/* Init the ApplyContext which we use for easier cleanup. */
ApplyContext = AllocSetContextCreate(TopMemoryContext,
"ApplyContext",
@@ -1029,6 +1096,9 @@ ApplyLoop(void)
/* timestamp = */ pq_getmsgint64(&s);
reply_requested = pq_getmsgbyte(&s);
+ if (last_received < endpos)
+ last_received = endpos;
+
send_feedback(endpos,
GetCurrentTimestamp(),
reply_requested);
@@ -1040,6 +1110,18 @@ ApplyLoop(void)
}
}
+ if (!in_remote_transaction)
+ {
+ /*
+ * If we haven't received any transactions for a while, there might be
+ * unconsumed invalidation messages in the queue; consume them now.
+ */
+ AcceptInvalidationMessages();
+
+ /* Process any table synchronization changes. */
+ process_syncing_tables(myslotname, last_received);
+ }
+
/* confirm all writes at once */
send_feedback(last_received, GetCurrentTimestamp(), false);
@@ -1049,7 +1131,11 @@ ApplyLoop(void)
/* Check if we need to exit the streaming loop. */
if (endofstream)
+ {
+ TimeLineID tli;
+ wrcapi->endstreaming(wrchandle, &tli);
break;
+ }
/*
* Wait for more data or latch.
@@ -1156,7 +1242,6 @@ ApplyWorkerMain(Datum main_arg)
{
int worker_slot = DatumGetObjectId(main_arg);
MemoryContext oldctx;
- RepOriginId originid;
XLogRecPtr origin_startpos;
char *options;
walrcvconn_init_fn walrcvconn_init;
@@ -1207,41 +1292,65 @@ ApplyWorkerMain(Datum main_arg)
BackgroundWorkerInitializeConnectionByOid(MyLogicalRepWorker->dbid,
InvalidOid);
- StartTransactionCommand();
-
/* Load the subscription into persistent memory context. */
+ StartTransactionCommand();
oldctx = MemoryContextSwitchTo(CacheMemoryContext);
MySubscription = GetSubscription(MyLogicalRepWorker->subid);
MemoryContextSwitchTo(oldctx);
- elog(LOG, "logical replication apply for subscription %s started",
- MySubscription->name);
-
- /* Setup replication origin tracking. */
- originid = replorigin_by_name(MySubscription->slotname, true);
- if (!OidIsValid(originid))
- originid = replorigin_create(MySubscription->slotname);
- replorigin_session_setup(originid);
- replorigin_session_origin = originid;
- origin_startpos = replorigin_session_get_progress(false);
-
+ if (OidIsValid(MyLogicalRepWorker->relid))
+ elog(LOG, "logical replication sync for subscription %s, table %s started",
+ MySubscription->name, get_rel_name(MyLogicalRepWorker->relid));
+ else
+ elog(LOG, "logical replication apply for subscription %s started",
+ MySubscription->name);
CommitTransactionCommand();
/* Connect to the origin and start the replication. */
elog(DEBUG1, "connecting to provider using connection string %s",
MySubscription->conninfo);
- wrcapi->connect(wrchandle, MySubscription->conninfo, true,
- MySubscription->name);
/* Build option string for the plugin. */
options = logicalrep_build_options(MySubscription->publications);
- /* Start streaming from the slot. */
+ if (OidIsValid(MyLogicalRepWorker->relid))
+ {
+ /* This is a table synchronization worker; run the initial sync. */
+ myslotname = LogicalRepSyncTableStart(&origin_startpos);
+ }
+ else
+ {
+ /* This is the main apply worker. */
+ RepOriginId originid;
+
+ myslotname = MySubscription->slotname;
+
+ StartTransactionCommand();
+ originid = replorigin_by_name(myslotname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ CommitTransactionCommand();
+
+ wrcapi->connect(wrchandle, MySubscription->conninfo, true,
+ myslotname);
+ }
+
+ /*
+ * Setup callback for syscache so that we know when something
+ * changes in the subscription relation state.
+ */
+ CacheRegisterSyscacheCallback(SUBSCRIPTIONRELOID,
+ invalidate_syncing_table_states,
+ (Datum) 0);
+
+ /* Start normal logical streaming replication. */
wrcapi->startstreaming_logical(wrchandle, origin_startpos,
- MySubscription->slotname, options);
+ myslotname, options);
+
+ pfree(options);
/* Run the main loop. */
- ApplyLoop();
+ LogicalRepApplyLoop(origin_startpos);
wrcapi->disconnect(wrchandle);
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 385260e..927af69 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -30,6 +30,7 @@
#include "replication/logicalworker.h"
#include "replication/subscription.h"
+#include "replication/worker_internal.h"
#include "storage/ipc.h"
#include "storage/proc.h"
@@ -44,7 +45,6 @@
#include "utils/snapmgr.h"
int max_logical_replication_workers = 4;
-LogicalRepWorker *MyLogicalRepWorker = NULL;
typedef struct LogicalRepCtxStruct
{
@@ -57,8 +57,6 @@ typedef struct LogicalRepCtxStruct
LogicalRepCtxStruct *LogicalRepCtx;
-static LogicalRepWorker *logicalrep_worker_find(Oid subid);
-static void logicalrep_worker_launch(Oid dbid, Oid subid);
static void logicalrep_worker_stop(LogicalRepWorker *worker);
static void logicalrep_worker_onexit(int code, Datum arg);
static void logicalrep_worker_detach(void);
@@ -183,29 +181,50 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
/*
* Walks the workers array and searches for one that matches given
- * subscription id.
+ * subscription id and relid.
*/
-static LogicalRepWorker *
-logicalrep_worker_find(Oid subid)
+LogicalRepWorker *
+logicalrep_worker_find(Oid subid, Oid relid)
{
int i;
LogicalRepWorker *res = NULL;
- /* Block concurrent modification. */
- LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+ Assert(LWLockHeldByMe(LogicalRepLauncherLock));
/* Search for attached worker for a given subscription id. */
for (i = 0; i < max_logical_replication_workers; i++)
{
LogicalRepWorker *w = &LogicalRepCtx->workers[i];
- if (w->subid == subid && w->proc && IsBackendPid(w->proc->pid))
+ if (w->subid == subid && w->relid == relid &&
+ w->proc && IsBackendPid(w->proc->pid))
{
res = w;
break;
}
}
- LWLockRelease(LogicalRepLauncherLock);
+ return res;
+}
+
+/*
+ * Walks the workers array and searches for ones that matches given
+ * subscription id and counts them.
+ */
+int
+logicalrep_worker_count(Oid subid)
+{
+ int i;
+ int res = 0;
+
+ Assert(LWLockHeldByMe(LogicalRepLauncherLock));
+
+ /* Search for attached worker for a given subscription id. */
+ for (i = 0; i < max_logical_replication_workers; i++)
+ {
+ LogicalRepWorker *w = &LogicalRepCtx->workers[i];
+ if (w->subid == subid && w->proc && IsBackendPid(w->proc->pid))
+ res++;
+ }
return res;
}
@@ -213,17 +232,18 @@ logicalrep_worker_find(Oid subid)
/*
* Start new apply background worker.
*/
-static void
-logicalrep_worker_launch(Oid dbid, Oid subid)
+void
+logicalrep_worker_launch(Oid dbid, Oid subid, Oid relid)
{
BackgroundWorker bgw;
BackgroundWorkerHandle *bgw_handle;
int slot;
LogicalRepWorker *worker = NULL;
- ereport(LOG,
- (errmsg("starting logical replication worker for subscription %u",
- subid)));
+ ereport(DEBUG1,
+ (errmsg("starting logical replication worker for "
+ "subscription %u, relation %u",
+ subid, relid)));
/*
* We need to do the modification of the shared memory under lock so that
@@ -256,6 +276,7 @@ logicalrep_worker_launch(Oid dbid, Oid subid)
memset(worker, 0, sizeof(LogicalRepWorker));
worker->dbid = dbid;
worker->subid = subid;
+ worker->relid = relid;
LWLockRelease(LogicalRepLauncherLock);
@@ -265,6 +286,13 @@ logicalrep_worker_launch(Oid dbid, Oid subid)
bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
bgw.bgw_main = ApplyWorkerMain;
+ if (OidIsValid(relid))
+ snprintf(bgw.bgw_name, BGW_MAXLEN,
+ "logical replication worker %u sync %u", subid, relid);
+ else
+ snprintf(bgw.bgw_name, BGW_MAXLEN,
+ "logical replication worker %u", subid);
+
bgw.bgw_restart_time = BGW_NEVER_RESTART;
bgw.bgw_notify_pid = MyProcPid;
bgw.bgw_main_arg = slot;
@@ -283,7 +311,7 @@ logicalrep_worker_launch(Oid dbid, Oid subid)
if (!WaitForReplicationWorkerAttach(worker, bgw_handle))
{
ereport(WARNING,
- (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("could not launch logical replication worker")));
return;
}
@@ -296,7 +324,7 @@ logicalrep_worker_launch(Oid dbid, Oid subid)
static void
logicalrep_worker_stop(LogicalRepWorker *worker)
{
- LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+ Assert(LWLockHeldByMe(LogicalRepLauncherLock));
/* Check that the worker is up and what we expect. */
if (!worker->proc)
@@ -306,27 +334,6 @@ logicalrep_worker_stop(LogicalRepWorker *worker)
/* Terminate the worker. */
kill(worker->proc->pid, SIGTERM);
-
- LWLockRelease(LogicalRepLauncherLock);
-
- /* Wait for it to detach. */
- for (;;)
- {
- int rc = WaitLatch(&MyProc->procLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
- 1000L);
-
- /* emergency bailout if postmaster has died */
- if (rc & WL_POSTMASTER_DEATH)
- proc_exit(1);
-
- ResetLatch(&MyProc->procLatch);
-
- CHECK_FOR_INTERRUPTS();
-
- if (!worker->proc)
- return;
- }
}
/*
@@ -504,16 +511,21 @@ ApplyLauncherMain(Datum main_arg)
foreach(lc, sublist)
{
Subscription *sub = (Subscription *) lfirst(lc);
- LogicalRepWorker *w = logicalrep_worker_find(sub->oid);
+ LogicalRepWorker *w;
+
+ LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+ w = logicalrep_worker_find(sub->oid, InvalidOid);
if (sub->enabled && w == NULL && startsub == NULL)
startsub = sub;
else if (!sub->enabled && w != NULL)
logicalrep_worker_stop(w);
+ LWLockRelease(LogicalRepLauncherLock);
}
if (startsub)
- logicalrep_worker_launch(startsub->dbid, startsub->oid);
+ logicalrep_worker_launch(startsub->dbid, startsub->oid,
+ InvalidOid);
/* Switch back to original memory context. */
MemoryContextSwitchTo(oldctx);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index ecf9a03..4f456f6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -30,6 +30,8 @@
#include "miscadmin.h"
+#include "access/heapam.h"
+#include "access/htup.h"
#include "access/xact.h"
#include "access/xlog_internal.h"
@@ -43,6 +45,7 @@
#include "storage/procarray.h"
#include "utils/memutils.h"
+#include "utils/tuplestore.h"
/* data for errcontext callback */
typedef struct LogicalErrorCallbackState
@@ -65,6 +68,9 @@ static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr message_lsn, bool transactional,
const char *prefix, Size message_size, const char *message);
+static List *list_tables_cb_wrapper(LogicalDecodingContext *ctx);
+static void tuple_cb_wrapper(LogicalDecodingContext *ctx, Relation relation,
+ HeapTuple tup);
static void LoadOutputPlugin(OutputPluginCallbacks *callbacks, char *plugin);
@@ -401,6 +407,127 @@ CreateDecodingContext(XLogRecPtr start_lsn,
}
/*
+ * Create a new limited decoding context for a base copy.
+ *
+ * nspname:
+ * name of a schema
+ *
+ * relname
+ * name of a relation
+ *
+ * output_plugin_options
+ * contains options passed to the output plugin.
+ *
+ * prepare_write, do_write
+ * callbacks that have to be filled to perform the use-case dependent,
+ * actual work.
+ *
+ * Needs to be called while in a memory context that's at least as long lived
+ * as the decoding context because further memory contexts will be created
+ * inside it.
+ *
+ * Needs to be called inside a transaction.
+ *
+ * Returns an initialized decoding context after calling the output plugin's
+ * startup function.
+ */
+LogicalDecodingContext *
+CreateCopyDecodingContext(List *output_plugin_options,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write)
+{
+ LogicalDecodingContext *ctx;
+ ReplicationSlot *slot;
+ MemoryContext context,
+ old_context;
+
+ /* shorter lines... */
+ slot = MyReplicationSlot;
+
+ /* first some sanity checks that are unlikely to be violated */
+ if (slot == NULL)
+ elog(ERROR, "cannot perform logical base copy without an acquired slot");
+
+ /* make sure the passed slot is suitable, these are user facing errors */
+ if (SlotIsPhysical(slot))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("cannot use physical replication slot for logical base copy"))));
+
+ if (slot->data.database != MyDatabaseId)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("replication slot \"%s\" was not created in this database",
+ NameStr(slot->data.name)))));
+
+ if (!IsTransactionOrTransactionBlock())
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ (errmsg("cannot perform table copy without snapshot"))));
+
+
+ context = AllocSetContextCreate(CurrentMemoryContext,
+ "Table Copy Context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(context);
+ ctx = palloc0(sizeof(LogicalDecodingContext));
+
+ ctx->context = context;
+
+ /*
+ * (re-)load output plugins, so we detect a bad (removed) output plugin
+ * now.
+ */
+ LoadOutputPlugin(&ctx->callbacks, NameStr(slot->data.plugin));
+
+ /* Check whether the output plugin actually supports the copy operations. */
+ if (ctx->callbacks.list_tables_cb == NULL ||
+ ctx->callbacks.tuple_cb == NULL)
+ elog(ERROR, "output plugin \"%s\" does not support LIST_TABLES and COPY_TABLE commands",
+ NameStr(slot->data.plugin));
+
+ /* Initialize non NULL fields. */
+ ctx->slot = slot;
+ ctx->out = makeStringInfo();
+ ctx->prepare_write = prepare_write;
+ ctx->write = do_write;
+
+ /* Make sure plugin sees the options. */
+ ctx->output_plugin_options = output_plugin_options;
+
+ /* call output plugin initialization callback */
+ if (ctx->callbacks.startup_cb != NULL)
+ startup_cb_wrapper(ctx, &ctx->options, false);
+
+ MemoryContextSwitchTo(old_context);
+
+ return ctx;
+}
+
+/*
+ * Process the tuple tup of a relation rel - send it to the tuple
+ * callback of the plugin.
+ */
+void
+DecodingContextProccessTuple(LogicalDecodingContext *ctx, Relation rel,
+ HeapTuple tup)
+{
+ tuple_cb_wrapper(ctx, rel, tup);
+}
+
+/*
+ * Get the list of tables to copy from the plugin's list_tables
+ * callback.
+ */
+List *
+DecodingContextGetTableList(LogicalDecodingContext *ctx)
+{
+ return list_tables_cb_wrapper(ctx);
+}
+
+/*
* Returns true if a consistent initial decoding snapshot has been built.
*/
bool
@@ -461,9 +588,12 @@ FreeDecodingContext(LogicalDecodingContext *ctx)
if (ctx->callbacks.shutdown_cb != NULL)
shutdown_cb_wrapper(ctx);
- ReorderBufferFree(ctx->reorder);
- FreeSnapshotBuilder(ctx->snapshot_builder);
- XLogReaderFree(ctx->reader);
+ if (ctx->reorder)
+ ReorderBufferFree(ctx->reorder);
+ if (ctx->snapshot_builder)
+ FreeSnapshotBuilder(ctx->snapshot_builder);
+ if (ctx->reader)
+ XLogReaderFree(ctx->reader);
MemoryContextDelete(ctx->context);
}
@@ -748,6 +878,60 @@ message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static List *
+list_tables_cb_wrapper(LogicalDecodingContext *ctx)
+{
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ List *res;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "list_tables";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+
+ /* do the actual work: call callback */
+ res = ctx->callbacks.list_tables_cb(ctx);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return res;
+}
+
+static void
+tuple_cb_wrapper(LogicalDecodingContext *ctx, Relation relation,
+ HeapTuple tup)
+{
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "tuple";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+
+ /* do the actual work: call callback */
+ ctx->callbacks.tuple_cb(ctx, relation, tup);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
/*
* Set the required catalog xmin horizon for historic snapshots in the current
* replication slot.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2b82495..e2f203b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -356,60 +356,79 @@ logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup)
}
/*
+ * Write qualified relation name to the output stream.
+ */
+void
+logicalrep_write_rel_name(StringInfo out, char *nspname, char *relname)
+{
+ uint8 nspnamelen;
+ uint8 relnamelen;
+
+ nspnamelen = strlen(nspname) + 1;
+ relnamelen = strlen(relname) + 1;
+
+ pq_sendbyte(out, nspnamelen); /* schema name length */
+ pq_sendbytes(out, nspname, nspnamelen);
+
+ pq_sendbyte(out, relnamelen); /* table name length */
+ pq_sendbytes(out, relname, relnamelen);
+}
+
+/*
+ * Read qualified relation name from the stream.
+ */
+void
+logicalrep_read_rel_name(StringInfo in, char **nspname, char **relname)
+{
+ int len;
+
+ len = pq_getmsgbyte(in);
+ *nspname = (char *) pq_getmsgbytes(in, len);
+
+ len = pq_getmsgbyte(in);
+ *relname = (char *) pq_getmsgbytes(in, len);
+}
+
+
+/*
* Write relation description to the output stream.
*/
void
logicalrep_write_rel(StringInfo out, Relation rel)
{
char *nspname;
- uint8 nspnamelen;
- const char *relname;
- uint8 relnamelen;
+ char *relname;
pq_sendbyte(out, 'R'); /* sending RELATION */
/* use Oid as relation identifier */
pq_sendint(out, RelationGetRelid(rel), 4);
+ /* send the relation name */
nspname = get_namespace_name(RelationGetNamespace(rel));
if (nspname == NULL)
elog(ERROR, "cache lookup failed for namespace %u",
rel->rd_rel->relnamespace);
- nspnamelen = strlen(nspname) + 1;
-
relname = RelationGetRelationName(rel);
- relnamelen = strlen(relname) + 1;
-
- pq_sendbyte(out, nspnamelen); /* schema name length */
- pq_sendbytes(out, nspname, nspnamelen);
- pq_sendbyte(out, relnamelen); /* table name length */
- pq_sendbytes(out, relname, relnamelen);
+ logicalrep_write_rel_name(out, nspname, relname);
/* send the attribute info */
logicalrep_write_attrs(out, rel);
-
- pfree(nspname);
}
/*
- * Read schema.relation from stream and return as LogicalRepRelation opened in
- * lockmode.
+ * Read the relation info from the stream and return it as a LogicalRepRelation.
*/
LogicalRepRelation *
logicalrep_read_rel(StringInfo in)
{
LogicalRepRelation *rel = palloc(sizeof(LogicalRepRelation));
- int len;
rel->remoteid = pq_getmsgint(in, 4);
- /* Read relation from stream */
- len = pq_getmsgbyte(in);
- rel->nspname = (char *) pq_getmsgbytes(in, len);
-
- len = pq_getmsgbyte(in);
- rel->relname = (char *) pq_getmsgbytes(in, len);
+ /* Read relation name from stream */
+ logicalrep_read_rel_name(in, &rel->nspname, &rel->relname);
/* Get attribute description */
logicalrep_read_attrs(in, &rel->attnames, &rel->natts);
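To make the wire format in the hunk above concrete, here is a small standalone sketch (hypothetical helper names, plain byte buffers instead of StringInfo) of the length-prefixed encoding that `logicalrep_write_rel_name()`/`logicalrep_read_rel_name()` use: each name is sent as a one-byte length that includes the terminating NUL, followed by the name bytes themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical standalone counterpart of logicalrep_write_rel_name():
 * write a one-byte length (including the NUL) followed by the name,
 * returning the number of bytes written. */
size_t
write_name(char *buf, const char *name)
{
	uint8_t		len = (uint8_t) (strlen(name) + 1); /* length includes NUL */

	buf[0] = (char) len;
	memcpy(buf + 1, name, len);
	return (size_t) len + 1;
}

/* Hypothetical counterpart of logicalrep_read_rel_name(): return a
 * pointer to the NUL-terminated name inside the buffer and report how
 * many bytes were consumed. */
const char *
read_name(const char *buf, size_t *consumed)
{
	uint8_t		len = (uint8_t) buf[0];

	*consumed = (size_t) len + 1;
	return buf + 1;				/* NUL-terminated string inside buf */
}
```

Because the length prefix is a single byte and includes the NUL, names are effectively limited to 254 characters, which comfortably covers NAMEDATALEN-constrained identifiers.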
diff --git a/src/backend/replication/logical/publication.c b/src/backend/replication/logical/publication.c
index b86611e..62f40e9 100644
--- a/src/backend/replication/logical/publication.c
+++ b/src/backend/replication/logical/publication.c
@@ -274,6 +274,46 @@ GetRelationPublications(Relation rel)
return result;
}
+/*
+ * Get the list of relation OIDs for a publication.
+ */
+List *
+GetPublicationRelations(Oid pubid)
+{
+ List *result;
+ Relation pubrelsrel;
+ ScanKeyData scankey;
+ SysScanDesc scan;
+ HeapTuple tup;
+
+ /* Find all relations that are part of the publication. */
+ pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
+
+ ScanKeyInit(&scankey,
+ Anum_pg_publication_rel_pubid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(pubid));
+
+ scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
+ NULL, 1, &scankey);
+
+ result = NIL;
+ while (HeapTupleIsValid(tup = systable_getnext(scan)))
+ {
+ Form_pg_publication_rel pubrel;
+
+ pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
+
+ result = lappend_oid(result, pubrel->relid);
+ }
+
+ systable_endscan(scan);
+ heap_close(pubrelsrel, NoLock);
+
+ return result;
+}
+
+
Publication *
GetPublication(Oid pubid)
{
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index b5fa3db..33e15ab 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,9 +591,10 @@ SnapBuildExportSnapshot(SnapBuild *builder)
snap->xip = newxip;
/*
- * now that we've built a plain snapshot, use the normal mechanisms for
- * exporting it
+ * now that we've built a plain snapshot, make it active and use the
+ * normal mechanisms for exporting it
*/
+ PushActiveSnapshot(snap);
snapname = ExportSnapshot(snap);
ereport(LOG,
diff --git a/src/backend/replication/logical/subscription.c b/src/backend/replication/logical/subscription.c
index 7d1de2c..3ba2b45 100644
--- a/src/backend/replication/logical/subscription.c
+++ b/src/backend/replication/logical/subscription.c
@@ -22,11 +22,15 @@
#include "access/htup_details.h"
#include "access/xact.h"
+#include "catalog/dependency.h"
#include "catalog/indexing.h"
#include "catalog/namespace.h"
#include "catalog/objectaddress.h"
#include "catalog/pg_type.h"
#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"
+
+#include "commands/replicationcmds.h"
#include "executor/spi.h"
@@ -41,6 +45,7 @@
#include "utils/fmgroids.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
+#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -144,3 +149,178 @@ textarray_to_stringlist(ArrayType *textarray)
return res;
}
+
+/*
+ * Set the state of a subscription table.
+ */
+Oid
+SetSubscriptionRelState(Oid subid, Oid relid, char state,
+ XLogRecPtr sublsn)
+{
+ Relation rel;
+ HeapTuple tup;
+ Oid subrelid;
+ bool nulls[Natts_pg_subscription_rel];
+ Datum values[Natts_pg_subscription_rel];
+
+ rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+
+ /* Try finding existing mapping. */
+ tup = SearchSysCacheCopy2(SUBSCRIPTIONRELMAP,
+ ObjectIdGetDatum(relid),
+ ObjectIdGetDatum(subid));
+
+ memset(values, 0, sizeof(values));
+
+ /*
+ * If the record for the given table does not exist yet, create a new
+ * record; otherwise update the existing one.
+ */
+ if (!HeapTupleIsValid(tup))
+ {
+ ObjectAddress myself,
+ referenced;
+
+ /* Form the tuple. */
+ memset(nulls, false, sizeof(nulls));
+ values[Anum_pg_subscription_rel_subid - 1] = ObjectIdGetDatum(subid);
+ values[Anum_pg_subscription_rel_subrelid - 1] = ObjectIdGetDatum(relid);
+ values[Anum_pg_subscription_rel_substate - 1] = CharGetDatum(state);
+ if (sublsn != InvalidXLogRecPtr)
+ values[Anum_pg_subscription_rel_sublsn - 1] = LSNGetDatum(sublsn);
+ else
+ nulls[Anum_pg_subscription_rel_sublsn - 1] = true;
+
+ tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
+
+ /* Insert tuple into catalog. */
+ subrelid = simple_heap_insert(rel, tup);
+ CatalogUpdateIndexes(rel, tup);
+
+ heap_freetuple(tup);
+
+ /* Add dependency on the publication */
+ ObjectAddressSet(myself, SubscriptionRelRelationId, subrelid);
+ ObjectAddressSet(referenced, SubscriptionRelationId, subid);
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+
+ /* Add dependency on the relation */
+ ObjectAddressSet(referenced, RelationRelationId, relid);
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+ }
+ else
+ {
+ bool replaces[Natts_pg_subscription_rel];
+
+ /* Update the tuple. */
+ memset(nulls, true, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));
+
+ replaces[Anum_pg_subscription_rel_substate - 1] = true;
+ nulls[Anum_pg_subscription_rel_substate - 1] = false;
+ values[Anum_pg_subscription_rel_substate - 1] = CharGetDatum(state);
+
+ replaces[Anum_pg_subscription_rel_sublsn - 1] = true;
+ if (sublsn != InvalidXLogRecPtr)
+ {
+ nulls[Anum_pg_subscription_rel_sublsn - 1] = false;
+ values[Anum_pg_subscription_rel_sublsn - 1] = LSNGetDatum(sublsn);
+ }
+
+ tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+ replaces);
+
+ /* Update the catalog. */
+ simple_heap_update(rel, &tup->t_self, tup);
+ CatalogUpdateIndexes(rel, tup);
+
+ subrelid = HeapTupleGetOid(tup);
+ }
+
+ /* Cleanup. */
+ heap_close(rel, NoLock);
+
+ /* Make the changes visible. */
+ CommandCounterIncrement();
+
+ return subrelid;
+}
+
+/*
+ * Get state of subscription table.
+ *
+ * Returns SUBREL_STATE_UNKNOWN when not found and missing_ok is true.
+ */
+char
+GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn,
+ bool missing_ok)
+{
+ Relation rel;
+ HeapTuple tup;
+ char substate;
+ bool isnull;
+ Datum d;
+
+ rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+
+ /* Try finding the mapping. */
+ tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
+ ObjectIdGetDatum(relid),
+ ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ {
+ if (missing_ok)
+ {
+ *sublsn = InvalidXLogRecPtr;
+ return '\0';
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("subscription table %u in subscription %u does not exist",
+ relid, subid)));
+ }
+
+ /* Get the state. */
+ d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+ Anum_pg_subscription_rel_substate, &isnull);
+ Assert(!isnull);
+ substate = DatumGetChar(d);
+ d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+ Anum_pg_subscription_rel_sublsn, &isnull);
+ if (isnull)
+ *sublsn = InvalidXLogRecPtr;
+ else
+ *sublsn = DatumGetLSN(d);
+
+ /* Cleanup */
+ ReleaseSysCache(tup);
+ heap_close(rel, RowExclusiveLock);
+
+ return substate;
+}
+
+/*
+ * Drop subscription table by OID
+ */
+void
+DropSubscriptionRelById(Oid subrelid)
+{
+ Relation rel;
+ HeapTuple tup;
+
+ rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(SUBSCRIPTIONRELOID, ObjectIdGetDatum(subrelid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for subscription table %u",
+ subrelid);
+
+ simple_heap_delete(rel, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(rel, RowExclusiveLock);
+}
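`SetSubscriptionRelState()` above uses PostgreSQL's values/nulls/replaces convention for the update path: only the columns flagged in `replaces[]` are overwritten, everything else is carried over from the existing tuple. A much simplified standalone sketch of that convention (plain arrays instead of real HeapTuples; `modify_tuple` and the column layout are hypothetical stand-ins):

```c
#include <assert.h>
#include <stdbool.h>

#define NATTS 4					/* columns in our toy "catalog row" */

/* Hypothetical, simplified stand-in for heap_modify_tuple(): start from
 * the old attribute values and overwrite only the columns whose
 * replaces[] flag is set, taking the new value/null from
 * values[]/nulls[].  This is how SetSubscriptionRelState() updates just
 * substate and sublsn while leaving the key columns untouched. */
void
modify_tuple(const long old_vals[NATTS], const bool old_nulls[NATTS],
			 const long values[NATTS], const bool nulls[NATTS],
			 const bool replaces[NATTS],
			 long new_vals[NATTS], bool new_nulls[NATTS])
{
	int			i;

	for (i = 0; i < NATTS; i++)
	{
		if (replaces[i])
		{
			new_vals[i] = values[i];
			new_nulls[i] = nulls[i];
		}
		else
		{
			new_vals[i] = old_vals[i];
			new_nulls[i] = old_nulls[i];
		}
	}
}
```

Note how setting `replaces[i] = true` together with `nulls[i] = true` is the way a column (such as sublsn when the LSN is invalid) gets set to NULL in place.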
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
new file mode 100644
index 0000000..7e844e8
--- /dev/null
+++ b/src/backend/replication/logical/tablesync.c
@@ -0,0 +1,672 @@
+/*-------------------------------------------------------------------------
+ * tablesync.c
+ * PostgreSQL logical replication
+ *
+ * Copyright (c) 2012-2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/logical/tablesync.c
+ *
+ * NOTES
+ * This file contains code for initial table data synchronization for
+ * logical replication.
+ *
+ * The initial data synchronization is done separately for each table,
+ * in a separate apply worker that only fetches the initial snapshot
+ * data from the provider and then synchronizes its position in the
+ * stream with the main apply worker.
+ *
+ * The stream position synchronization works in multiple steps:
+ * - sync finishes the copy, sets the table state to SYNCWAIT and
+ * waits in a loop for the state to change
+ * - apply periodically checks unsynced tables for SYNCWAIT; when it
+ * appears, apply compares its position in the stream with the
+ * SYNCWAIT position and either sets the state to CATCHUP when
+ * apply is ahead (and waits for sync to do the catchup), or sets
+ * the state to SYNCDONE if sync is ahead, or, when both sync and
+ * apply are at the same position, sets the state to READY and
+ * stops tracking the table
+ * - if the state was set to CATCHUP, sync reads the stream and
+ * applies changes until it catches up to the specified stream
+ * position, then sets the state to READY, signals apply that it
+ * can stop waiting, and exits; if the state was set to anything
+ * other than CATCHUP, the sync process simply ends
+ * - if the state was set to SYNCDONE by apply, apply continues
+ * tracking the table until it reaches the SYNCDONE stream
+ * position, at which point it sets the state to READY and stops
+ * tracking
+ *
+ * Example flows look like this:
+ * - Apply is ahead:
+ * sync:8 -> set SYNCWAIT
+ * apply:10 -> set CATCHUP
+ * sync:10 -> set READY
+ * exit
+ * apply:10
+ * stop tracking
+ * continue rep
+ * - Sync is ahead:
+ * sync:10
+ * set SYNCWAIT
+ * apply:8
+ * set SYNCDONE
+ * sync:10
+ * exit
+ * apply:10
+ * set READY
+ * stop tracking
+ * continue rep
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+
+#include "catalog/namespace.h"
+#include "catalog/pg_subscription_rel.h"
+
+#include "commands/trigger.h"
+
+#include "executor/executor.h"
+#include "executor/nodeModifyTable.h"
+
+#include "libpq/pqformat.h"
+#include "libpq/pqsignal.h"
+
+#include "mb/pg_wchar.h"
+
+#include "optimizer/planner.h"
+
+#include "parser/parse_relation.h"
+
+#include "postmaster/bgworker.h"
+#include "postmaster/postmaster.h"
+
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/logicalproto.h"
+#include "replication/logicalworker.h"
+#include "replication/reorderbuffer.h"
+#include "replication/origin.h"
+#include "replication/snapbuild.h"
+#include "replication/subscription.h"
+#include "replication/walreceiver.h"
+#include "replication/worker_internal.h"
+
+#include "rewrite/rewriteHandler.h"
+
+#include "storage/bufmgr.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+
+#include "utils/builtins.h"
+#include "utils/catcache.h"
+#include "utils/fmgroids.h"
+#include "utils/guc.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/timeout.h"
+#include "utils/tqual.h"
+#include "utils/syscache.h"
+
+typedef struct TableState
+{
+ dlist_node node;
+ Oid relid;
+ XLogRecPtr lsn;
+ char state;
+} TableState;
+
+static dlist_head table_states = DLIST_STATIC_INIT(table_states);
+static bool table_states_valid = false;
+
+
+/*
+ * Exit routine for synchronization worker.
+ */
+static void
+finish_sync_worker(char *slotname)
+{
+ LogicalRepWorker *worker;
+ RepOriginId originid;
+ MemoryContext oldctx = CurrentMemoryContext;
+
+ /*
+ * Drop the replication slot on the remote server.
+ * We want to continue even if the slot on the remote side is already
+ * gone. This means we may leave a slot behind on the remote side, but
+ * that can happen for other reasons as well, so we can't really
+ * protect against it.
+ */
+ PG_TRY();
+ {
+ wrcapi->drop_slot(wrchandle, slotname);
+ }
+ PG_CATCH();
+ {
+ MemoryContext ectx;
+ ErrorData *edata;
+
+ ectx = MemoryContextSwitchTo(oldctx);
+ /* Save error info */
+ edata = CopyErrorData();
+ MemoryContextSwitchTo(ectx);
+ FlushErrorState();
+
+ ereport(WARNING,
+ (errmsg("could not drop the replication slot "
+ "\"%s\" on the provider", slotname),
+ errdetail("The error was: %s", edata->message),
+ errhint("You may have to drop it manually.")));
+ FreeErrorData(edata);
+ }
+ PG_END_TRY();
+
+ /* Also remove the origin tracking for the slot if it exists. */
+ StartTransactionCommand();
+ originid = replorigin_by_name(slotname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ if (originid == replorigin_session_origin)
+ {
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ }
+ replorigin_drop(originid);
+ }
+ CommitTransactionCommand();
+
+ /* Flush all writes. */
+ XLogFlush(GetXLogWriteRecPtr());
+
+ /* Find the main apply worker and signal it. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
+ worker = logicalrep_worker_find(MyLogicalRepWorker->subid, InvalidOid);
+ if (worker && worker->proc)
+ SetLatch(&worker->proc->procLatch);
+ LWLockRelease(LogicalRepLauncherLock);
+
+ ereport(LOG,
+ (errmsg("logical replication synchronization worker finished processing")));
+
+ /* Stop gracefully */
+ wrcapi->disconnect(wrchandle);
+ proc_exit(0);
+}
+
+/*
+ * Wait until the table synchronization state changes from the given one.
+ */
+static bool
+wait_for_sync_status_change(TableState *tstate)
+{
+ int rc;
+ char state = tstate->state;
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ tstate->state = GetSubscriptionRelState(MyLogicalRepWorker->subid,
+ tstate->relid,
+ &tstate->lsn,
+ true);
+ CommitTransactionCommand();
+
+ if (tstate->state != state)
+ return true;
+
+ rc = WaitLatch(&MyProc->procLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ 10000L);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ ResetLatch(&MyProc->procLatch);
+ }
+
+ return false;
+}
+
+/*
+ * Read the state of the tables in the subscription and update our table
+ * state list.
+ */
+static void
+reread_sync_state(Oid relid)
+{
+ dlist_mutable_iter iter;
+ Relation rel;
+ HeapTuple tup;
+ ScanKeyData skey[2];
+ HeapScanDesc scan;
+
+ /* Clean the old list. */
+ dlist_foreach_modify(iter, &table_states)
+ {
+ TableState *tstate = dlist_container(TableState, node, iter.cur);
+
+ dlist_delete(iter.cur);
+ pfree(tstate);
+ }
+
+ /*
+ * Fetch all the subscription relation states that are not marked as
+ * ready and push them into our table state tracking list.
+ */
+ rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&skey[0],
+ Anum_pg_subscription_rel_subid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(MyLogicalRepWorker->subid));
+
+ if (OidIsValid(relid))
+ {
+ ScanKeyInit(&skey[1],
+ Anum_pg_subscription_rel_subrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+ }
+ else
+ {
+ ScanKeyInit(&skey[1],
+ Anum_pg_subscription_rel_substate,
+ BTEqualStrategyNumber, F_CHARNE,
+ CharGetDatum(SUBREL_STATE_READY));
+ }
+
+ scan = heap_beginscan_catalog(rel, 2, skey);
+
+ while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+ {
+ Form_pg_subscription_rel subrel;
+ TableState *tstate;
+ MemoryContext oldctx;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /* Allocate the tracking info in a permanent memory context. */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
+ tstate = (TableState *) palloc(sizeof(TableState));
+ tstate->relid = subrel->relid;
+ tstate->state = subrel->substate;
+ tstate->lsn = subrel->sublsn;
+
+ dlist_push_tail(&table_states, &tstate->node);
+ MemoryContextSwitchTo(oldctx);
+ }
+
+ /* Cleanup */
+ heap_endscan(scan);
+ heap_close(rel, RowExclusiveLock);
+
+ table_states_valid = true;
+}
+
+/*
+ * Callback from syscache invalidation.
+ */
+void
+invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
+{
+ table_states_valid = false;
+}
+
+/*
+ * Handle table synchronization cooperation from the synchronization
+ * worker.
+ */
+static void
+process_syncing_tables_sync(char *slotname, XLogRecPtr end_lsn)
+{
+ TableState *tstate;
+ TimeLineID tli;
+
+ Assert(!IsTransactionState());
+
+ /*
+ * Synchronization workers don't keep track of all synchronizing
+ * tables; they only care about their own table.
+ */
+ if (!table_states_valid)
+ {
+ StartTransactionCommand();
+ reread_sync_state(MyLogicalRepWorker->relid);
+ CommitTransactionCommand();
+ }
+
+ /* Somebody removed the table underneath this worker; nothing more to do. */
+ if (dlist_is_empty(&table_states))
+ {
+ wrcapi->endstreaming(wrchandle, &tli);
+ finish_sync_worker(slotname);
+ }
+
+ /* Check if we are done with catchup now. */
+ tstate = dlist_container(TableState, node, dlist_head_node(&table_states));
+ if (tstate->state == SUBREL_STATE_CATCHUP)
+ {
+ Assert(tstate->lsn != InvalidXLogRecPtr);
+
+ if (tstate->lsn == end_lsn)
+ {
+ tstate->state = SUBREL_STATE_READY;
+ tstate->lsn = InvalidXLogRecPtr;
+ /* Update state of the synchronization. */
+ StartTransactionCommand();
+ SetSubscriptionRelState(MyLogicalRepWorker->subid,
+ tstate->relid, tstate->state,
+ tstate->lsn);
+ CommitTransactionCommand();
+
+ wrcapi->endstreaming(wrchandle, &tli);
+ finish_sync_worker(slotname);
+ }
+ return;
+ }
+}
+
+/*
+ * Handle table synchronization cooperation from the apply worker.
+ */
+static void
+process_syncing_tables_apply(char *slotname, XLogRecPtr end_lsn)
+{
+ dlist_mutable_iter iter;
+
+ Assert(!IsTransactionState());
+
+ if (!table_states_valid)
+ {
+ StartTransactionCommand();
+ reread_sync_state(InvalidOid);
+ CommitTransactionCommand();
+ }
+
+ dlist_foreach_modify(iter, &table_states)
+ {
+ TableState *tstate = dlist_container(TableState, node, iter.cur);
+ bool start_worker;
+ LogicalRepWorker *worker;
+
+ /*
+ * When the synchronization process is in the catchup phase we need
+ * to ensure that we are not behind it (it's going to wait at this
+ * point for the state to change). Once we are ahead of or at the
+ * same position as the synchronization process we can signal it to
+ * finish the catchup.
+ */
+ if (tstate->state == SUBREL_STATE_SYNCWAIT)
+ {
+ if (end_lsn > tstate->lsn)
+ {
+ /*
+ * Apply is ahead, so tell sync to catch up and wait
+ * until it does.
+ */
+ tstate->state = SUBREL_STATE_CATCHUP;
+ tstate->lsn = end_lsn;
+ StartTransactionCommand();
+ SetSubscriptionRelState(MyLogicalRepWorker->subid,
+ tstate->relid, tstate->state,
+ tstate->lsn);
+ CommitTransactionCommand();
+
+ /* Signal the worker as it may be waiting for us. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+ worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
+ tstate->relid);
+ if (worker && worker->proc)
+ SetLatch(&worker->proc->procLatch);
+ LWLockRelease(LogicalRepLauncherLock);
+
+ if (wait_for_sync_status_change(tstate))
+ Assert(tstate->state == SUBREL_STATE_READY);
+ }
+ else
+ {
+ /*
+ * Apply is either behind, in which case the sync worker is
+ * done but apply needs to keep tracking the table until it
+ * catches up to where sync finished, or apply and sync are
+ * at the same position, in which case the table can be
+ * switched to standard replication mode immediately.
+ */
+ if (end_lsn < tstate->lsn)
+ tstate->state = SUBREL_STATE_SYNCDONE;
+ else
+ tstate->state = SUBREL_STATE_READY;
+
+ StartTransactionCommand();
+ SetSubscriptionRelState(MyLogicalRepWorker->subid,
+ tstate->relid, tstate->state,
+ tstate->lsn);
+ CommitTransactionCommand();
+
+ /* Signal the worker as it may be waiting for us. */
+ LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+ worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
+ tstate->relid);
+ if (worker && worker->proc)
+ SetLatch(&worker->proc->procLatch);
+ LWLockRelease(LogicalRepLauncherLock);
+ }
+ }
+ else if (tstate->state == SUBREL_STATE_SYNCDONE &&
+ end_lsn >= tstate->lsn)
+ {
+ /*
+ * Apply has caught up to the position where table sync
+ * finished, so mark the table as ready for normal replication.
+ */
+ tstate->state = SUBREL_STATE_READY;
+ tstate->lsn = InvalidXLogRecPtr;
+ StartTransactionCommand();
+ SetSubscriptionRelState(MyLogicalRepWorker->subid,
+ tstate->relid, tstate->state,
+ tstate->lsn);
+ CommitTransactionCommand();
+ }
+
+ /*
+ * If the table is supposed to be synchronizing but the
+ * synchronization worker is not running, start it.
+ * Limit the number of launched workers here to one (for now).
+ */
+ if (tstate->state != SUBREL_STATE_READY &&
+ tstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ LWLockAcquire(LogicalRepLauncherLock, LW_SHARED);
+ worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
+ tstate->relid);
+ start_worker = !worker &&
+ logicalrep_worker_count(MyLogicalRepWorker->subid) < 2;
+ LWLockRelease(LogicalRepLauncherLock);
+ if (start_worker)
+ logicalrep_worker_launch(MyLogicalRepWorker->dbid,
+ MyLogicalRepWorker->subid,
+ tstate->relid);
+
+ }
+ }
+}
+
+/*
+ * Process possible state change(s) of tables that are being synchronized
+ * in parallel.
+ */
+void
+process_syncing_tables(char *slotname, XLogRecPtr end_lsn)
+{
+ if (OidIsValid(MyLogicalRepWorker->relid))
+ process_syncing_tables_sync(slotname, end_lsn);
+ else
+ process_syncing_tables_apply(slotname, end_lsn);
+}
+
+/*
+ * Setup replication origin tracking.
+ */
+static XLogRecPtr
+setup_origin_tracking(char *origin_name)
+{
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ originid = replorigin_by_name(origin_name, true);
+ if (!OidIsValid(originid))
+ originid = replorigin_create(origin_name);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ CommitTransactionCommand();
+ return replorigin_session_get_progress(false);
+}
+
+
+/*
+ * Start syncing the table in the sync worker.
+ */
+char *
+LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
+{
+ StringInfoData s;
+ TableState tstate;
+ MemoryContext oldctx;
+ char *slotname;
+
+ /* Check the state of the table synchronization. */
+ StartTransactionCommand();
+ tstate.relid = MyLogicalRepWorker->relid;
+ tstate.state = GetSubscriptionRelState(MySubscription->oid, tstate.relid,
+ &tstate.lsn, false);
+
+ /*
+ * Build unique slot name.
+ * TODO: protect against overly long slot names.
+ */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ initStringInfo(&s);
+ appendStringInfo(&s, "%s_sync_%s", MySubscription->slotname,
+ get_rel_name(tstate.relid));
+ slotname = s.data;
+ MemoryContextSwitchTo(oldctx);
+
+ CommitTransactionCommand();
+
+ wrcapi->connect(wrchandle, MySubscription->conninfo, true, slotname);
+
+ switch (tstate.state)
+ {
+ case SUBREL_STATE_INIT:
+ case SUBREL_STATE_DATA:
+ {
+ Relation rel;
+ XLogRecPtr lsn;
+ char *options;
+
+ /* Update the state and make it visible to others. */
+ StartTransactionCommand();
+ SetSubscriptionRelState(MySubscription->oid, tstate.relid,
+ SUBREL_STATE_DATA,
+ InvalidXLogRecPtr);
+ CommitTransactionCommand();
+
+ *origin_startpos = setup_origin_tracking(slotname);
+
+ /*
+ * We want to do the table data sync in a single
+ * transaction, so do not close the transaction opened
+ * above.
+ * There will be no BEGIN or COMMIT messages coming via
+ * logical replication while the copy table command is
+ * running, so start the transaction here.
+ * Note that the memory context for data handling will
+ * still be set up by ensure_transaction, called by the
+ * insert handler.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Don't allow parallel access other than SELECT while
+ * the initial contents are being copied.
+ */
+ rel = heap_open(tstate.relid, ExclusiveLock);
+
+ /* Create a temporary slot for the sync process. */
+ wrcapi->create_slot(wrchandle, slotname, true, &lsn);
+
+ /* Build option string for the plugin. */
+ options = logicalrep_build_options(MySubscription->publications);
+
+ wrcapi->copy_table(wrchandle, slotname,
+ get_namespace_name(RelationGetNamespace(rel)),
+ RelationGetRelationName(rel),
+ options);
+
+ /*
+ * Run the standard apply loop for the initial data
+ * stream.
+ */
+ in_remote_transaction = true;
+ LogicalRepApplyLoop(*origin_startpos);
+
+ /*
+ * We are done with the initial data synchronization,
+ * update the state.
+ */
+ SetSubscriptionRelState(MySubscription->oid, tstate.relid,
+ SUBREL_STATE_SYNCWAIT, lsn);
+ heap_close(rel, NoLock);
+
+ /* End the transaction. */
+ CommitTransactionCommand();
+ in_remote_transaction = false;
+
+ /*
+ * Wait for main apply worker to either tell us to
+ * catchup or that we are done.
+ */
+ wait_for_sync_status_change(&tstate);
+ if (tstate.state != SUBREL_STATE_CATCHUP)
+ finish_sync_worker(slotname);
+ break;
+ }
+
+ case SUBREL_STATE_SYNCWAIT:
+ *origin_startpos = setup_origin_tracking(slotname);
+ /*
+ * Wait for main apply worker to either tell us to
+ * catchup or that we are done.
+ */
+ wait_for_sync_status_change(&tstate);
+ if (tstate.state != SUBREL_STATE_CATCHUP)
+ finish_sync_worker(slotname);
+ break;
+ case SUBREL_STATE_CATCHUP:
+ /* Catchup is handled by streaming loop. */
+ *origin_startpos = setup_origin_tracking(slotname);
+ break;
+ case SUBREL_STATE_SYNCDONE:
+ case SUBREL_STATE_READY:
+ /* Nothing to do here but finish. */
+ finish_sync_worker(slotname);
+ break;
+ default:
+ elog(ERROR, "unknown relation state \"%c\"", tstate.state);
+ }
+
+ return slotname;
+}
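The core of the apply-side decision described in the NOTES at the top of tablesync.c is a three-way LSN comparison. As a hedged illustration, here is a standalone toy version (the enum, `LSN` typedef, and `apply_decide` are hypothetical simplifications, not the patch's actual API) of what `process_syncing_tables_apply()` does when it sees a table in SYNCWAIT:

```c
#include <assert.h>

/* Toy versions of the pg_subscription_rel states involved in the handshake. */
typedef enum
{
	ST_SYNCWAIT,
	ST_CATCHUP,
	ST_SYNCDONE,
	ST_READY
} RelState;

typedef unsigned long long LSN;	/* stand-in for XLogRecPtr */

/* Hypothetical simplification of the decision in
 * process_syncing_tables_apply(): given apply's current position and the
 * position the sync worker stopped at (table in SYNCWAIT), return the
 * state that apply sets for the table. */
RelState
apply_decide(LSN apply_lsn, LSN sync_lsn)
{
	if (apply_lsn > sync_lsn)
		return ST_CATCHUP;		/* apply is ahead; sync must catch up to apply_lsn */
	else if (apply_lsn < sync_lsn)
		return ST_SYNCDONE;		/* sync is ahead; apply keeps tracking until sync_lsn */
	else
		return ST_READY;		/* same position; normal replication immediately */
}
```

The two example flows in the file header correspond directly to the first two branches: apply at 10 with sync at 8 yields CATCHUP, while apply at 8 with sync at 10 yields SYNCDONE.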
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d74c7e9..e18939e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -14,6 +14,8 @@
#include "access/xact.h"
+#include "catalog/pg_publication.h"
+
#include "mb/pg_wchar.h"
#include "replication/logical.h"
@@ -24,7 +26,10 @@
#include "utils/builtins.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/tuplestore.h"
+#include "utils/syscache.h"
PG_MODULE_MAGIC;
@@ -40,6 +45,9 @@ static void pgoutput_commit_txn(LogicalDecodingContext *ctx,
static void pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
+static void pgoutput_tuple(LogicalDecodingContext *ctx, Relation relation,
+ HeapTuple tuple);
+static List *pgoutput_list_tables(LogicalDecodingContext *ctx);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
@@ -74,6 +82,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->commit_cb = pgoutput_commit_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
+ cb->tuple_cb = pgoutput_tuple;
+ cb->list_tables_cb = pgoutput_list_tables;
}
/*
@@ -295,6 +305,104 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * Send the tuple from the relation over the wire.
+ * Currently this behaves the same as INSERT replication.
+ */
+static void
+pgoutput_tuple(LogicalDecodingContext *ctx, Relation relation,
+ HeapTuple tuple)
+{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ MemoryContext old;
+ RelSchemaSyncEntry *relentry = NULL;
+
+ /*
+ * First check the table filter
+ * TODO: do we actually need this?
+ */
+ if (!publication_change_is_replicated(relation,
+ PublicationChangeInsert,
+ data->publication_names))
+ return;
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ /*
+ * Write the relation schema if the current schema hasn't been sent yet.
+ */
+ relentry = get_rel_schema_sync_entry(RelationGetRelid(relation));
+ if (!relentry->schema_sent)
+ {
+ OutputPluginPrepareWrite(ctx, false);
+ logicalrep_write_rel(ctx->out, relation);
+ OutputPluginWrite(ctx, false);
+ relentry->schema_sent = true;
+ }
+
+ /* Send the data */
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_insert(ctx->out, relation, tuple);
+ OutputPluginWrite(ctx, true);
+
+ /* Cleanup */
+ MemoryContextSwitchTo(old);
+ MemoryContextReset(data->context);
+}
+
+/*
+ * Get the list of tables replicated by current connection.
+ */
+static List *
+pgoutput_list_tables(LogicalDecodingContext *ctx)
+{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ MemoryContext old;
+ List *rellist = NIL,
+ *res = NIL;
+ ListCell *lc;
+
+ /* Avoid leaking memory by using and resetting our own context */
+ old = MemoryContextSwitchTo(data->context);
+
+ /* Build unique list of relations in all subscribed publications. */
+ foreach(lc, data->publication_names)
+ {
+ char *pubname = (char *) lfirst(lc);
+ Oid pubid;
+ List *pubrellist;
+
+ pubid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
+ if (!OidIsValid(pubid))
+ elog(ERROR, "cache lookup failed for publication \"%s\"", pubname);
+
+ pubrellist = GetPublicationRelations(pubid);
+ rellist = list_concat_unique_oid(rellist, pubrellist);
+ }
+
+ MemoryContextSwitchTo(old);
+
+ /* Convert the relation list into a list of LogicalRepTableListEntry. */
+ foreach (lc, rellist)
+ {
+ Oid relid = lfirst_oid(lc);
+ LogicalRepTableListEntry *entry;
+
+ entry = palloc(sizeof(LogicalRepTableListEntry));
+ entry->nspname = get_namespace_name(get_rel_namespace(relid));
+ entry->relname = get_rel_name(relid);
+ entry->info = NULL;
+
+ res = lappend(res, entry);
+ }
+
+ /* Cleanup our memory context. */
+ MemoryContextReset(data->context);
+
+ return res;
+}
+
+/*
* Currently we always forward.
*/
static bool
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index d93db88..65b8ea7 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -77,11 +77,14 @@ Node *replication_parse_result;
%token K_LOGICAL
%token K_SLOT
%token K_RESERVE_WAL
+%token K_TABLE
+%token K_LIST_TABLES
+%token K_COPY_TABLE
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
- timeline_history
+ timeline_history list_tables copy_table
%type <list> base_backup_opt_list
%type <defelt> base_backup_opt
%type <uintval> opt_timeline
@@ -111,6 +114,8 @@ command:
| create_replication_slot
| drop_replication_slot
| timeline_history
+ | list_tables
+ | copy_table
;
/*
@@ -323,6 +328,30 @@ plugin_opt_arg:
SCONST { $$ = (Node *) makeString($1); }
| /* EMPTY */ { $$ = NULL; }
;
+
+copy_table:
+ K_COPY_TABLE K_SLOT IDENT K_TABLE IDENT IDENT plugin_options
+ {
+ CopyTableCmd *cmd;
+ cmd = makeNode(CopyTableCmd);
+ cmd->slotname = $3;
+ cmd->relation = makeRangeVar($5, $6, -1);
+ cmd->options = $7;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+list_tables:
+ K_LIST_TABLES K_SLOT IDENT plugin_options
+ {
+ ListTablesCmd *cmd;
+ cmd = makeNode(ListTablesCmd);
+ cmd->slotname = $3;
+ cmd->options = $4;
+ $$ = (Node *) cmd;
+ }
+ ;
+
%%
#include "repl_scanner.c"
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index f83ec53..69d6e86 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -98,6 +98,9 @@ PHYSICAL { return K_PHYSICAL; }
RESERVE_WAL { return K_RESERVE_WAL; }
LOGICAL { return K_LOGICAL; }
SLOT { return K_SLOT; }
+LIST_TABLES { return K_LIST_TABLES; }
+COPY_TABLE { return K_COPY_TABLE; }
+TABLE { return K_TABLE; }
"," { return ','; }
";" { return ';'; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a0dba19..def88d3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -43,6 +43,7 @@
#include <signal.h>
#include <unistd.h>
+#include "access/relscan.h"
#include "access/timeline.h"
#include "access/transam.h"
#include "access/xact.h"
@@ -932,7 +933,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
pq_endmessage(&buf);
/*
- * release active status again, START_REPLICATION will reacquire it
+ * release active status again, subsequent commands will reacquire it
*/
ReplicationSlotRelease();
}
@@ -1035,6 +1036,176 @@ StartLogicalReplication(StartReplicationCmd *cmd)
}
/*
+ * Handle LIST_TABLES command.
+ */
+static void
+SendTableList(ListTablesCmd *cmd)
+{
+ List *tables;
+ ListCell *lc;
+ StringInfoData buf;
+
+ /* make sure that our requirements are still fulfilled */
+ CheckLogicalDecodingRequirements();
+
+ Assert(!MyReplicationSlot);
+
+ ReplicationSlotAcquire(cmd->slotname);
+
+ /* Initialize the decoding context for table copy. */
+ logical_decoding_ctx = CreateCopyDecodingContext(cmd->options,
+ WalSndPrepareWrite,
+ WalSndWriteData);
+
+ /* Send a RowDescription message */
+ pq_beginmessage(&buf, 'T');
+ pq_sendint(&buf, 3, 2); /* 3 fields */
+
+ /* first field: namespace name */
+ pq_sendstring(&buf, "nspname"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ /* second field: relation name */
+ pq_sendstring(&buf, "relname"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ /* third field: freeform relation info (the only NULLable field) */
+ pq_sendstring(&buf, "info"); /* col name */
+ pq_sendint(&buf, 0, 4); /* table oid */
+ pq_sendint(&buf, 0, 2); /* attnum */
+ pq_sendint(&buf, TEXTOID, 4); /* type oid */
+ pq_sendint(&buf, -1, 2); /* typlen */
+ pq_sendint(&buf, 0, 4); /* typmod */
+ pq_sendint(&buf, 0, 2); /* format code */
+
+ pq_endmessage(&buf);
+
+ /* Get the table list from the decoding context. */
+ tables = DecodingContextGetTableList(logical_decoding_ctx);
+
+ /* Send the table list as tuples. */
+ foreach(lc, tables)
+ {
+ LogicalRepTableListEntry *entry = lfirst(lc);
+ Size len;
+
+ Assert(entry->nspname != NULL);
+ Assert(entry->relname != NULL);
+
+ /* Send a DataRow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint(&buf, 3, 2); /* # of columns */
+
+ /* namespace name */
+ len = strlen(entry->nspname);
+ pq_sendint(&buf, len, 4); /* col1 len */
+ pq_sendbytes(&buf, entry->nspname, len);
+
+ /* relation name */
+ len = strlen(entry->relname);
+ pq_sendint(&buf, len, 4); /* col2 len */
+ pq_sendbytes(&buf, entry->relname, len);
+
+ /* relation info, or NULL if none */
+ if (entry->info != NULL)
+ {
+ len = strlen(entry->info);
+ pq_sendint(&buf, len, 4);
+ pq_sendbytes(&buf, entry->info, len);
+ }
+ else
+ pq_sendint(&buf, -1, 4);
+
+ pq_endmessage(&buf);
+ }
+
+ /* Clean up the logical decoding context. */
+ FreeDecodingContext(logical_decoding_ctx);
+
+ ReplicationSlotRelease();
+}
+
+/*
+ * LogicalDecodingContext 'write' callback.
+ *
+ * Actually write out data previously prepared by WalSndPrepareWrite out
+ * to the network.
+ */
+static void
+CopyTableWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
+ bool last_write)
+{
+ /* output previously gathered data in a CopyData packet */
+ if (pq_putmessage('d', ctx->out->data, ctx->out->len))
+ ereport(ERROR,
+ (errmsg("copy table could not send data, aborting")));
+}
+
+/*
+ * Handle COPY_TABLE command.
+ */
+static void
+CopyTable(CopyTableCmd *cmd)
+{
+ StringInfoData buf;
+ Relation rel;
+ HeapScanDesc scandesc;
+ HeapTuple tup;
+
+ /* make sure that our requirements are still fulfilled */
+ CheckLogicalDecodingRequirements();
+
+ Assert(!MyReplicationSlot);
+
+ ReplicationSlotAcquire(cmd->slotname);
+
+ WalSndSetState(WALSNDSTATE_BACKUP);
+
+ /* Send a CopyBothResponse message, and start streaming */
+ pq_beginmessage(&buf, 'W');
+ pq_sendbyte(&buf, 0);
+ pq_sendint(&buf, 0, 2);
+ pq_endmessage(&buf);
+ pq_flush();
+
+ /* Initialize the decoding context for table copy. */
+ logical_decoding_ctx = CreateCopyDecodingContext(cmd->options,
+ WalSndPrepareWrite,
+ CopyTableWriteData);
+
+ /* Open the relation and start the scan. */
+ rel = heap_openrv(cmd->relation, AccessShareLock);
+ scandesc = heap_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+
+ /* Scan the whole table and pass the rows to the decoding context. */
+ while (HeapTupleIsValid(tup = heap_getnext(scandesc,
+ ForwardScanDirection)))
+ DecodingContextProccessTuple(logical_decoding_ctx, rel, tup);
+
+ /* Close the scan and relation. */
+ heap_endscan(scandesc);
+ heap_close(rel, AccessShareLock);
+
+ /* Send CopyDone */
+ pq_putemptymessage('c');
+
+ FreeDecodingContext(logical_decoding_ctx);
+
+ ReplicationSlotRelease();
+}
+
+
+/*
* LogicalDecodingContext 'prepare_write' callback.
*
* Prepare a write into a StringInfo.
@@ -1299,14 +1470,6 @@ exec_replication_command(const char *cmd_string)
ereport(log_replication_commands ? LOG : DEBUG1,
(errmsg("received replication command: %s", cmd_string)));
- /*
- * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
- * command arrives. Clean up the old stuff if there's anything.
- */
- SnapBuildClearExportedSnapshot();
-
- CHECK_FOR_INTERRUPTS();
-
cmd_context = AllocSetContextCreate(CurrentMemoryContext,
"Replication command context",
ALLOCSET_DEFAULT_MINSIZE,
@@ -1324,6 +1487,16 @@ exec_replication_command(const char *cmd_string)
cmd_node = replication_parse_result;
+ /*
+ * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
+ * command arrives. Clean up the old stuff if there's anything, unless
+ * the command currently being executed needs the exported snapshot.
+ */
+ if (cmd_node->type != T_ListTablesCmd && cmd_node->type != T_CopyTableCmd)
+ SnapBuildClearExportedSnapshot();
+
+ CHECK_FOR_INTERRUPTS();
+
switch (cmd_node->type)
{
case T_IdentifySystemCmd:
@@ -1357,6 +1530,14 @@ exec_replication_command(const char *cmd_string)
SendTimeLineHistory((TimeLineHistoryCmd *) cmd_node);
break;
+ case T_ListTablesCmd:
+ SendTableList((ListTablesCmd *) cmd_node);
+ break;
+
+ case T_CopyTableCmd:
+ CopyTable((CopyTableCmd *) cmd_node);
+ break;
+
default:
elog(ERROR, "unrecognized replication command node tag: %u",
cmd_node->type);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 03c8916..c6e7207 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -60,6 +60,7 @@
#include "catalog/pg_replication_origin.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
#include "catalog/pg_ts_config.h"
@@ -736,6 +737,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {SubscriptionRelRelationId, /* SUBSCRIPTIONRELOID */
+ SubscriptionRelOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 64
+ },
+ {SubscriptionRelRelationId, /* SUBSCRIPTIONRELMAP */
+ SubscriptionRelMapIndexId,
+ 2,
+ {
+ Anum_pg_subscription_rel_subrelid,
+ Anum_pg_subscription_rel_subid,
+ 0,
+ 0
+ },
+ 64
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 86e2939..02ebd12 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -337,6 +337,12 @@ DECLARE_UNIQUE_INDEX(pg_subscription_oid_index, 6114, on pg_subscription using b
DECLARE_UNIQUE_INDEX(pg_subscription_subname_index, 6115, on pg_subscription using btree(subname name_ops));
#define SubscriptionNameIndexId 6115
+DECLARE_UNIQUE_INDEX(pg_subscription_rel_oid_index, 6116, on pg_subscription_rel using btree(oid oid_ops));
+#define SubscriptionRelOidIndexId 6116
+
+DECLARE_UNIQUE_INDEX(pg_subscription_rel_map_index, 6117, on pg_subscription_rel using btree(relid oid_ops, subid oid_ops));
+#define SubscriptionRelMapIndexId 6117
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
new file mode 100644
index 0000000..300ba17
--- /dev/null
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -0,0 +1,61 @@
+/* -------------------------------------------------------------------------
+ *
+ * pg_subscription_rel.h
+ * Local info about tables that come from the provider of a
+ * subscription (pg_subscription_rel).
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef PG_SUBSCRIPTION_REL_H
+#define PG_SUBSCRIPTION_REL_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_subscription_rel definition. cpp turns this into
+ * typedef struct FormData_pg_subscription_rel
+ * ----------------
+ */
+#define SubscriptionRelRelationId 6102
+#define SubscriptionRelRelation_Rowtype_Id 6103
+
+/* Workaround for genbki not knowing about XLogRecPtr */
+#define pg_lsn XLogRecPtr
+
+CATALOG(pg_subscription_rel,6102) BKI_ROWTYPE_OID(6103)
+{
+ Oid subid; /* Oid of subscription */
+ Oid relid; /* Oid of relation */
+ char substate; /* state of the relation in subscription */
+ pg_lsn sublsn; /* remote lsn of the state change
+ * used for synchronization coordination */
+} FormData_pg_subscription_rel;
+
+typedef FormData_pg_subscription_rel *Form_pg_subscription_rel;
+
+/* ----------------
+ * compiler constants for pg_subscription_rel
+ * ----------------
+ */
+#define Natts_pg_subscription_rel 4
+#define Anum_pg_subscription_rel_subid 1
+#define Anum_pg_subscription_rel_subrelid 2
+#define Anum_pg_subscription_rel_substate 3
+#define Anum_pg_subscription_rel_sublsn 4
+
+/* ----------------
+ * substate constants
+ * ----------------
+ */
+#define SUBREL_STATE_UNKNOWN '\0' /* unknown state (sublsn NULL) */
+#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
+#define SUBREL_STATE_DATA 'd' /* data copy (sublsn NULL) */
+#define SUBREL_STATE_SYNCWAIT 'w' /* waiting for sync (sublsn set) */
+#define SUBREL_STATE_CATCHUP 'c' /* catchup (sublsn set) */
+#define SUBREL_STATE_SYNCDONE 's' /* synced (sublsn set) */
+#define SUBREL_STATE_READY 'r' /* ready (sublsn NULL) */
+
+#endif /* PG_SUBSCRIPTION_REL_H */
diff --git a/src/include/commands/replicationcmds.h b/src/include/commands/replicationcmds.h
index 7c35d72..a9895f6 100644
--- a/src/include/commands/replicationcmds.h
+++ b/src/include/commands/replicationcmds.h
@@ -26,5 +26,6 @@ extern void RemovePublicationRelById(Oid prid);
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt);
extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
extern void DropSubscriptionById(Oid subid);
+extern void DropSubscriptionRelById(Oid subrelid);
#endif /* REPLICATIONCMDS_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 322286b..21128a1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -466,6 +466,8 @@ typedef enum NodeTag
T_DropReplicationSlotCmd,
T_StartReplicationCmd,
T_TimeLineHistoryCmd,
+ T_ListTablesCmd,
+ T_CopyTableCmd,
/*
* TAGS FOR RANDOM OTHER STUFF
@@ -475,7 +477,7 @@ typedef enum NodeTag
* purposes (usually because they are involved in APIs where we want to
* pass multiple object types through the same pointer).
*/
- T_TriggerData = 950, /* in commands/trigger.h */
+ T_TriggerData = 970, /* in commands/trigger.h */
T_EventTriggerData, /* in commands/event_trigger.h */
T_ReturnSetInfo, /* in nodes/execnodes.h */
T_WindowObjectData, /* private in nodeWindowAgg.c */
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index d2f1edb..b8180c1 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -95,4 +95,27 @@ typedef struct TimeLineHistoryCmd
TimeLineID timeline;
} TimeLineHistoryCmd;
+/* ----------------------
+ * LIST_TABLES command
+ * ----------------------
+ */
+typedef struct ListTablesCmd
+{
+ NodeTag type;
+ char *slotname;
+ List *options;
+} ListTablesCmd;
+
+/* ----------------------
+ * COPY_TABLE command
+ * ----------------------
+ */
+typedef struct CopyTableCmd
+{
+ NodeTag type;
+ char *slotname;
+ struct RangeVar *relation;
+ List *options;
+} CopyTableCmd;
+
#endif /* REPLNODES_H */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 947000e..a356b43 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -31,9 +31,11 @@ typedef struct LogicalDecodingContext
/* memory context this is all allocated in */
MemoryContext context;
- /* infrastructure pieces */
- XLogReaderState *reader;
+ /* The associated replication slot */
ReplicationSlot *slot;
+
+ /* infrastructure pieces for decoding */
+ XLogReaderState *reader;
struct ReorderBuffer *reorder;
struct SnapBuild *snapshot_builder;
@@ -75,6 +77,14 @@ typedef struct LogicalDecodingContext
TransactionId write_xid;
} LogicalDecodingContext;
+
+/* Entry used for listing tables by logical decoding plugin. */
+typedef struct LogicalRepTableListEntry {
+ char *nspname;
+ char *relname;
+ char *info;
+} LogicalRepTableListEntry;
+
extern void CheckLogicalDecodingRequirements(void);
extern LogicalDecodingContext *CreateInitDecodingContext(char *plugin,
@@ -92,6 +102,14 @@ extern void DecodingContextFindStartpoint(LogicalDecodingContext *ctx);
extern bool DecodingContextReady(LogicalDecodingContext *ctx);
extern void FreeDecodingContext(LogicalDecodingContext *ctx);
+extern LogicalDecodingContext *CreateCopyDecodingContext(
+ List *output_plugin_options,
+ LogicalOutputPluginWriterPrepareWrite prepare_write,
+ LogicalOutputPluginWriterWrite do_write);
+extern void DecodingContextProccessTuple(LogicalDecodingContext *ctx,
+ Relation rel, HeapTuple tup);
+extern List *DecodingContextGetTableList(LogicalDecodingContext *ctx);
+
extern void LogicalIncreaseXminForSlot(XLogRecPtr lsn, TransactionId xmin);
extern void LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn,
XLogRecPtr restart_lsn);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index b69d015..2491cc7 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -69,8 +69,9 @@ extern LogicalRepRelId logicalrep_read_update(StringInfo in, bool *hasoldtup,
extern void logicalrep_write_delete(StringInfo out, Relation rel,
HeapTuple oldtuple);
extern LogicalRepRelId logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup);
+extern void logicalrep_write_rel_name(StringInfo out, char *nspname, char *relname);
extern void logicalrep_write_rel(StringInfo out, Relation rel);
-
+extern void logicalrep_read_rel_name(StringInfo in, char **nspname, char **relname);
extern LogicalRepRelation *logicalrep_read_rel(StringInfo in);
#endif /* LOGICALREP_PROTO_H */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 64f36d3..6327067 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -22,20 +22,27 @@ typedef struct LogicalRepWorker
/* Subscription id for the worker. */
Oid subid;
+
+ /* Used for initial table synchronization. */
+ Oid relid;
} LogicalRepWorker;
extern int max_logical_replication_workers;
-extern LogicalRepWorker *MyLogicalRepWorker;
extern void ApplyLauncherMain(Datum main_arg);
extern void ApplyWorkerMain(Datum main_arg);
extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupOnCommit(void);
extern void ApplyLauncherWakeup(void);
extern void logicalrep_worker_attach(int slot);
+extern LogicalRepWorker *logicalrep_worker_find(Oid subid, Oid relid);
+extern int logicalrep_worker_count(Oid subid);
+extern void logicalrep_worker_launch(Oid dbid, Oid subid, Oid relid);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 7911cc0..04ab611 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -93,6 +93,22 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (
RepOriginId origin_id);
/*
+ * Called from the LIST_TABLES replication command.
+ */
+typedef List *(*BaseCopyListTablesCB) (
+ struct LogicalDecodingContext *
+);
+
+/*
+ * Called for every individual tuple in a table during COPY_TABLE.
+ */
+typedef void (*BaseCopyTupleCB) (
+ struct LogicalDecodingContext *,
+ Relation relation,
+ HeapTuple tup
+);
+
+/*
* Called to shutdown an output plugin.
*/
typedef void (*LogicalDecodeShutdownCB) (
@@ -111,6 +127,8 @@ typedef struct OutputPluginCallbacks
LogicalDecodeMessageCB message_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
+ BaseCopyListTablesCB list_tables_cb;
+ BaseCopyTupleCB tuple_cb;
} OutputPluginCallbacks;
void OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write);
diff --git a/src/include/replication/publication.h b/src/include/replication/publication.h
index 08245ee..78b4eb4 100644
--- a/src/include/replication/publication.h
+++ b/src/include/replication/publication.h
@@ -35,6 +35,7 @@ typedef struct Publication
extern Publication *GetPublication(Oid pubid);
extern Publication *GetPublicationByName(const char *pubname, bool missing_ok);
extern List *GetRelationPublications(Relation rel);
+extern List *GetPublicationRelations(Oid pubid);
extern bool publication_change_is_replicated(Relation rel,
PublicationChangeType change_type,
diff --git a/src/include/replication/subscription.h b/src/include/replication/subscription.h
index a937f4b..18504da 100644
--- a/src/include/replication/subscription.h
+++ b/src/include/replication/subscription.h
@@ -30,4 +30,10 @@ typedef struct Subscription
extern Subscription *GetSubscription(Oid subid);
extern Oid get_subscription_oid(const char *subname, bool missing_ok);
+extern Oid SetSubscriptionRelState(Oid subid, Oid relid, char state,
+ XLogRecPtr sublsn);
+extern char GetSubscriptionRelState(Oid subid, Oid relid,
+ XLogRecPtr *sublsn, bool missing_ok);
+extern void DropSubscriptionRelById(Oid subrelid);
+
#endif /* SUBSCRIPTION_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 3801949..d99cd72 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -169,6 +169,13 @@ typedef int (*walrcvconn_receive_fn) (WalReceiverConnHandle *handle,
char **buffer, pgsocket *wait_fd);
typedef void (*walrcvconn_send_fn) (WalReceiverConnHandle *handle,
const char *buffer, int nbytes);
+typedef List *(*walrcvconn_list_tables_fn) (
+ WalReceiverConnHandle *handle,
+ char *slotname, char *options);
+typedef bool (*walrcvconn_copy_table_fn) (
+ WalReceiverConnHandle *handle,
+ char *slotname, char *nspname,
+ char *relname, char *options);
typedef void (*walrcvconn_disconnect_fn) (WalReceiverConnHandle *handle);
typedef struct WalReceiverConnAPI {
@@ -183,6 +190,8 @@ typedef struct WalReceiverConnAPI {
walrcvconn_endstreaming_fn endstreaming;
walrcvconn_receive_fn receive;
walrcvconn_send_fn send;
+ walrcvconn_list_tables_fn list_tables;
+ walrcvconn_copy_table_fn copy_table;
walrcvconn_disconnect_fn disconnect;
} WalReceiverConnAPI;
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
new file mode 100644
index 0000000..bd4402b
--- /dev/null
+++ b/src/include/replication/worker_internal.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * worker_internal.h
+ * Internal headers shared by logical replication workers.
+ *
+ * Portions Copyright (c) 2010-2016, PostgreSQL Global Development Group
+ *
+ * src/include/replication/worker_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WORKER_INTERNAL_H
+#define WORKER_INTERNAL_H
+
+/* filled by libpqreceiver when loaded */
+extern struct WalReceiverConnAPI *wrcapi;
+extern struct WalReceiverConnHandle *wrchandle;
+
+/* Worker and subscription objects. */
+extern Subscription *MySubscription;
+extern LogicalRepWorker *MyLogicalRepWorker;
+
+extern bool in_remote_transaction;
+
+extern void LogicalRepApplyLoop(XLogRecPtr last_received);
+extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void process_syncing_tables(char *slotname, XLogRecPtr end_lsn);
+extern void invalidate_syncing_table_states(Datum arg, int cacheid,
+ uint32 hashvalue);
+
+
+#endif /* WORKER_INTERNAL_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index b1d03a5..5349c81 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -87,6 +87,8 @@ enum SysCacheIdentifier
STATRELATTINH,
SUBSCRIPTIONOID,
SUBSCRIPTIONNAME,
+ SUBSCRIPTIONRELOID,
+ SUBSCRIPTIONRELMAP,
TABLESPACEOID,
TRFOID,
TRFTYPELANG,
diff --git a/src/test/Makefile b/src/test/Makefile
index 7f7754f..8e8527a 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = regress isolation modules recovery
+SUBDIRS = regress isolation modules recovery subscription
# We don't build or execute examples/, locale/, or thread/ by default,
# but we do want "make clean" etc to recurse into them. Likewise for ssl/,
diff --git a/src/test/README b/src/test/README
index 62395e7..74bab09 100644
--- a/src/test/README
+++ b/src/test/README
@@ -37,5 +37,8 @@ regress/
ssl/
Tests to exercise and verify SSL certificate handling
+subscription/
+ Test suite for subscriptions and logical replication
+
thread/
A thread-safety-testing utility used by configure
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index ceac2c8..9e402f6 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -132,6 +132,7 @@ pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
pg_subscription|t
+pg_subscription_rel|t
pg_tablespace|t
pg_transform|t
pg_trigger|t
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index dca19c4..f8121d9 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
# Initialize provider node
my $node_provider = get_new_node('provider');
@@ -19,7 +19,7 @@ $node_subscriber->start;
$node_provider->safe_psql('postgres',
"CREATE TABLE tab_notrep AS SELECT generate_series(1,10) AS a");
$node_provider->safe_psql('postgres',
- "CREATE TABLE tab_ins (a int)");
+ "CREATE TABLE tab_ins AS SELECT generate_series(1,1002) AS a");
$node_provider->safe_psql('postgres',
"CREATE TABLE tab_rep (a int primary key)");
@@ -45,18 +45,28 @@ $node_provider->safe_psql('postgres',
$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub WITH CONNECTION '$provider_connstr' PUBLICATION tap_pub, tap_pub_ins_only");
-# Wait for subscriber to finish table sync
+# Wait for subscriber to finish initialization
my $appname = 'tap_sub';
my $caughtup_query =
"SELECT pg_current_xlog_location() <= write_location FROM pg_stat_replication WHERE application_name = '$appname';";
$node_provider->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for subscriber to catch up";
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE substate != 'r';";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
my $result =
$node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_notrep");
print "node_subscriber: $result\n";
is($result, qq(0), 'check non-replicated table is empty on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_ins");
+print "node_subscriber: $result\n";
+is($result, qq(1002), 'check initial data was copied to subscriber');
$node_provider->safe_psql('postgres',
"INSERT INTO tab_ins SELECT generate_series(1,50)");
@@ -78,7 +88,7 @@ $node_provider->poll_query_until('postgres', $caughtup_query)
$result =
$node_subscriber->safe_psql('postgres', "SELECT count(*), min(a), max(a) FROM tab_ins");
print "node_subscriber: $result\n";
-is($result, qq(50|1|50), 'check replicated inserts on subscriber');
+is($result, qq(1052|1|1002), 'check replicated inserts on subscriber');
$result =
$node_subscriber->safe_psql('postgres', "SELECT count(*), min(a), max(a) FROM tab_rep");
diff --git a/src/test/subscription/t/002_types.pl b/src/test/subscription/t/002_types.pl
index a126201..a9c8526 100644
--- a/src/test/subscription/t/002_types.pl
+++ b/src/test/subscription/t/002_types.pl
@@ -101,13 +101,19 @@ $node_provider->safe_psql('postgres',
$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub WITH CONNECTION '$provider_connstr' PUBLICATION tap_pub");
-# Wait for subscriber to finish table sync
+# Wait for subscriber to finish initialization
my $appname = 'tap_sub';
my $caughtup_query =
"SELECT pg_current_xlog_location() <= write_location FROM pg_stat_replication WHERE application_name = '$appname';";
$node_provider->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for subscriber to catch up";
+# Wait for initial sync to finish as well
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE substate != 'r';";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
# Insert initial test data
$node_provider->safe_psql('postgres', qq(
-- test_tbl_one_array_col
--
2.7.4
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
as promised here is WIP version of logical replication patch.
Yay!
I'm about to head out for a week of, desperately needed, holidays, but
after that I plan to spend a fair amount of time helping to review
etc. this.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
as promised here is WIP version of logical replication patch.
Yay!
Yay2
I'm about to head out for a week of, desperately needed, holidays, but
after that I plan to spend a fair amount of time helping to review
etc. this.
Have a good one.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 6, 2016 at 2:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
as promised here is WIP version of logical replication patch.
Yay!
Yay2
Thank you for working on this!
I've applied these patches to current HEAD, but got the following error.
libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous
declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2
After fixing this issue with the attached patch, I tried out logical replication a little.
Some random comments and questions.
The logical replication launcher process and the apply process are
implemented as bgworkers. Isn't it better to have them as auxiliary
processes like the checkpointer and wal writer?
IMO the number of logical replication connections should not be
limited by max_worker_processes.
--
We need to set the publication up by at least CREATE PUBLICATION and
ALTER PUBLICATION command.
Can we make it possible to define tables in CREATE PUBLICATION as well?
For example,
CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]
--
This patch cannot drop a subscription.
=# drop subscription sub;
ERROR: unrecognized object class: 6102
--
+/*-------------------------------------------------------------------------
+ *
+ * proto.c
+ * logical replication protocol functions
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
The copyright dates of the added files are old.
And this patch has some whitespace problems.
Please run "git show --check" or "git diff origin/master --check"
Regards,
--
Masahiko Sawada
Attachments:
fix_compile_error.patch (application/x-patch)
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 94648c7..e4aaba4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -40,12 +40,12 @@
PG_MODULE_MAGIC;
-typedef struct WalReceiverConnHandle {
+struct WalReceiverConnHandle {
/* Current connection to the primary, if any */
PGconn *streamConn;
/* Buffer for currently read records */
char *recvBuf;
-} WalReceiverConnHandle;
+};
PGDLLEXPORT WalReceiverConnHandle *_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi);
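The compile error above is the classic typedef-redefinition problem: pre-C11 compilers reject a second `typedef` for the same name, so the forward typedef in walreceiver.h collides with the full `typedef struct ... {...} WalReceiverConnHandle;` in the .c file. Sawada's fix keeps the typedef in the header only and lets the .c file define just the struct body. A minimal self-contained sketch of the pattern (the `streamConn` field is simplified to an `int` for illustration):

```c
/* Header side: forward-declare the struct and typedef it exactly once,
 * as walreceiver.h does. */
typedef struct WalReceiverConnHandle WalReceiverConnHandle;

/* Implementation side: define the struct body WITHOUT repeating the
 * typedef -- repeating it is what triggers "redefinition of typedef"
 * on pre-C11 compilers. */
struct WalReceiverConnHandle
{
	int			streamConn;		/* stand-in for the real PGconn pointer */
};

/* Code in either file can keep using the typedef'd name. */
static int
handle_get_conn(WalReceiverConnHandle *handle)
{
	return handle->streamConn;
}
```

The header never needs the struct members, so the opaque forward declaration also keeps libpq details out of backend-wide includes.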
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The logical replication launcher process and the apply process are
implemented as bgworkers. Isn't it better to have them as auxiliary
processes like the checkpointer and wal writer?
I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now
they'd use bgworkers.
Now, perhaps we want a new bgworker "kind" for system workers or some other
minor tweaks. But basically I think bgworkers are exactly what we should be
using here.
IMO the number of logical replication connections should not be
limited by max_worker_processes.
Well, they *are* worker processes... but I take your point, that that
setting has been "number of bgworkers the user can run" and it might not be
expected that logical replication would use the same space.
max_worker_processes isn't just a limit, it controls how many shmem slots
we allocate.
I guess we could have a separate max_logical_workers or something, but I'm
inclined to think that adds complexity without really making things any
nicer. We'd just add them together to decide how many shmem slots to
allocate and we'd have to keep track of how many slots were used by which
types of backend. Or create a near-duplicate of the bgworker facility for
logical rep.
Sure, you can go deeper down the rabbit hole here and say that we need to
add bgworker "categories" with reserved pools of worker slots for each
category. But do we really need that?
max_connections includes everything, both system and user backends. It's
not like we don't do this elsewhere. It's at worst a mild wart.
The only argument I can see for not using bgworkers is for the supervisor
worker. It's a singleton that launches the per-database workers, and
arguably is a job that the postmaster could do better. The current design
there stems from its origins as an extension. Maybe worker management could
be simplified a bit as a result. I'd really rather not invent yet another
new and mostly duplicate category of custom workers to achieve that though.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?
I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now they'd
use bgworkers.
+1. We could always get them now under the umbrella of the bgworker
infrastructure if this cleans up some code duplication.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 09/08/16 09:59, Masahiko Sawada wrote:
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
as promised here is WIP version of logical replication patch.
Thank you for working on this!
Thanks for looking!
I've applied these patches to current HEAD, but got the following error.
libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous
declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2
After fixing this issue with the attached patch, I used logical replication a little.
Some random comments and questions.
Interesting, my compiler doesn't have this problem. Will investigate.
The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?
IMO the number of logical replication connections should not be
limited by max_worker_processes.
What Craig said reflects my rationale for doing this pretty well.
We need to set the publication up by at least CREATE PUBLICATION and
ALTER PUBLICATION command.
Can we make CREATE PUBLICATION possible to define tables as well?
For example,
CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]
Agreed, that just didn't make it into the first cut to -hackers. We've
also been thinking of having a special ALL TABLES parameter there that
would encompass the whole db.
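A sketch of the syntax being discussed (illustrative only; neither form is implemented in this WIP patch, and the final grammar may differ):

```sql
-- Proposed form from the thread: define tables at creation time.
CREATE PUBLICATION mypub TABLE users, orders;

-- The ALL TABLES variant mentioned above, covering the whole database:
CREATE PUBLICATION everything ALL TABLES;
```

Table names here are made up for the example; the WITH options clause from the proposal is omitted since no concrete options were named.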
--
This patch can not drop the subscription.
=# drop subscription sub;
ERROR: unrecognized object class: 6102
Yeah, that's because of patch 0006; I didn't finish all the dependency
tracking for the pg_subscription_rel catalog that it adds (which is why I
called it a PoC). I expect to have this working in the next version
(there is still quite a bit of polish work needed in general).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/08/16 10:13, Craig Ringer wrote:
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com
<mailto:sawada.mshk@gmail.com>> wrote:
The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?
I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now
they'd use bgworkers.
Now, perhaps we want a new bgworker "kind" for system workers or some
other minor tweaks. But basically I think bgworkers are exactly what we
should be using here.
Agreed.
IMO the number of logical replication connections should not be
limited by max_worker_processes.
Well, they *are* worker processes... but I take your point, that that
setting has been "number of bgworkers the user can run" and it might not
be expected that logical replication would use the same space.
Again agree, I think we should ultimately go towards what PeterE
suggested in
/messages/by-id/a2fffd92-6e59-a4eb-dd85-c5865ebca1a0@2ndquadrant.com
The only argument I can see for not using bgworkers is for the
supervisor worker. It's a singleton that launches the per-database
workers, and arguably is a job that the postmaster could do better. The
current design there stems from its origins as an extension. Maybe
worker management could be simplified a bit as a result. I'd really
rather not invent yet another new and mostly duplicate category of
custom workers to achieve that though.
It is simplified compared to pglogical (there are only 2 worker types,
not 3). I don't think it's the job of the postmaster to scan catalogs,
however, so it can't really start workers for logical replication. I
actually modeled it more after autovacuum (using bgworkers though) than
the original extension.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?
I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now they'd
use bgworkers.
Now, perhaps we want a new bgworker "kind" for system workers or some other
minor tweaks. But basically I think bgworkers are exactly what we should be
using here.
I understood. Thanks!
IMO the number of logical replication connections should not be
limited by max_worker_processes.
Well, they *are* worker processes... but I take your point, that that
setting has been "number of bgworkers the user can run" and it might not be
expected that logical replication would use the same space.
max_worker_processes isn't just a limit, it controls how many shmem slots
we allocate.
I guess we could have a separate max_logical_workers or something, but I'm
inclined to think that adds complexity without really making things any
nicer. We'd just add them together to decide how many shmem slots to
allocate and we'd have to keep track of how many slots were used by which
types of backend. Or create a near-duplicate of the bgworker facility for
logical rep.
Sure, you can go deeper down the rabbit hole here and say that we need to
add bgworker "categories" with reserved pools of worker slots for each
category. But do we really need that?
If we change these processes to bgworkers, we can categorize them into
two groups: auxiliary processes (checkpointer, walsender, etc.) and
other worker processes.
And max_worker_processes controls the latter.
max_connections includes everything, both system and user backends. It's not
like we don't do this elsewhere. It's at worst a mild wart.
The only argument I can see for not using bgworkers is for the supervisor
worker. It's a singleton that launches the per-database workers, and
arguably is a job that the postmaster could do better. The current design
there stems from its origins as an extension. Maybe worker management could
be simplified a bit as a result. I'd really rather not invent yet another
new and mostly duplicate category of custom workers to achieve that though.
Regards,
--
Masahiko Sawada
On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Sure, you can go deeper down the rabbit hole here and say that we need to
add bgworker "categories" with reserved pools of worker slots for each
category. But do we really need that?
If we change these processes to bgworker, we can categorize them into
two, auxiliary process(check pointer and wal sender etc) and other
worker process.
And max_worker_processes controls the latter.
Right. I think that's probably the direction we should be going eventually.
Personally I don't think such a change should block the logical replication
work from proceeding with bgworkers, though. It's been delayed a long time,
a lot of people want it, and I think we need to focus on meeting the core
requirements not getting too sidetracked on minor points.
Of course, everyone's idea of what's core and what's a minor sidetrack
differs ;)
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/08/16 12:16, Craig Ringer wrote:
On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com
<mailto:sawada.mshk@gmail.com>> wrote:
Sure, you can go deeper down the rabbit hole here and say that we need to
add bgworker "categories" with reserved pools of worker slots for each
category?
If we change these processes to bgworker, we can categorize them into
two, auxiliary process(check pointer and wal sender etc) and other
worker process.
And max_worker_processes controls the latter.
Right. I think that's probably the direction we should be going
eventually. Personally I don't think such a change should block the
logical replication work from proceeding with bgworkers, though. It's
been delayed a long time, a lot of people want it, and I think we need
to focus on meeting the core requirements not getting too sidetracked on
minor points.
Of course, everyone's idea of what's core and what's a minor sidetrack
differs ;)
Yeah, that's why I added a local max GUC that just handles the logical
worker limit within max_worker_processes. I didn't want to also
write a generic framework for managing the max workers using tags or
something as part of this; it's big enough as it is, and we can always
move the limit to the more generic place once we have it.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek wrote:
On 09/08/16 12:16, Craig Ringer wrote:
Right. I think that's probably the direction we should be going
eventually. Personally I don't think such a change should block the
logical replication work from proceeding with bgworkers, though.
Yeah that's why I added local max GUC that just handles the logical worker
limit within the max_worker_processes. I didn't want to also write generic
framework for managing the max workers using tags or something as part of
this, it's big enough as it is and we can always move the limit to the more
generic place once we have it.
Parallel query does exactly that: the workers are allocated from the
bgworkers array, and if you want more, it's on you to increase that
limit (it doesn't even have the GUC for a maximum). As far as logical
replication and parallel query are concerned, that's fine. We can
improve this later, if it proves to be a problem.
I think there are far more pressing matters to review.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Petr Jelinek wrote:
On 09/08/16 10:13, Craig Ringer wrote:
The only argument I can see for not using bgworkers is for the
supervisor worker. It's a singleton that launches the per-database
workers, and arguably is a job that the postmaster could do better. The
current design there stems from its origins as an extension. Maybe
worker management could be simplified a bit as a result. I'd really
rather not invent yet another new and mostly duplicate category of
custom workers to achieve that though.
It is simplified compared to pglogical (there is only 2 worker types not 3).
I don't think it's job of postmaster to scan catalogs however so it can't
really start workers for logical replication. I actually modeled it more
after autovacuum (using bgworkers though) than the original extension.
Yeah, it's a very bad idea to put postmaster on this task. We should
definitely stay away from that.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
as promised here is WIP version of logical replication patch.
Great!
The proposed DDL for publications/subscriptions looks very nice to me.
Some notes and thoughts about patch:
* Clang grumbles at the following pieces of code:
apply.c:1316:6: warning: variable 'origin_startpos' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
tablesync.c:436:45: warning: if statement has empty body [-Wempty-body]
if (wait_for_sync_status_change(tstate));
* max_logical_replication_workers mentioned everywhere in docs, but guc.c defines
variable called max_logical_replication_processes for postgresql.conf
* Since pg_subscription is already shared across the cluster, it could also
be handy to share pg_publications too and allow publication of tables from
different databases. That is a rare scenario but quite important for the
virtual hosting use case — tons of small databases in a single postgres
cluster.
* There is no way to see the tables/schemas attached to a publication through \drp
* As far as I understand there is no way to add a table/tablespace right
in CREATE PUBLICATION and one needs to explicitly do ALTER PUBLICATION
right after creation. Maybe add something like WITH TABLE/TABLESPACE to
CREATE?
* So the binary protocol goes into core. Is it still possible to use it
as a decoding plugin for a manually created walsender? Maybe also include
json as it was in pglogical? While I'm not arguing that it should be
done, I'm interested in your opinion on that.
* Also I've noted that you got rid of the reserved byte (flags) in the
protocol compared to pglogical_native. It was very handy to use it for
two-phase tx decoding (0 — usual commit, 1 — prepare, 2 — commit
prepared), because both prepare and commit prepared generate a commit
record in xlog.
On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
- DDL, I see several approaches we could do here for 10.0. a) don't
deal with DDL at all yet, b) provide function which pushes the DDL
into replication queue and then executes on downstream (like
londiste, slony, pglogical do), c) capture the DDL query as text
and allow user defined function to be called with that DDL text on
the subscriber
* Since the DDL here is mostly ALTER / CREATE / DROP TABLE (or am I
wrong?), maybe we can add something like WITH SUBSCRIBERS to statements?
* Talking about the exact mechanism of DDL replication, I like your
variant b), but since we have transactional DDL, we can do two-phase
commit here. That will require two-phase decoding and some logic about
catching prepare responses through logical messages. If that approach
sounds interesting I can describe the proposal in more detail and create
a patch.
* Also I wasn't actually able to run replication itself =) While the
regression tests pass, the TAP tests and a manual run get stuck:
pg_subscription_rel.substate never becomes 'r'. I'll investigate that
more and write again.
* As far as I understand, sync starts automatically on enabling a
publication. Maybe split that logic into a different command with some
options? Like don't sync at all, for example.
* When I'm trying to create a subscription to a non-existent publication,
CREATE SUBSCRIPTION creates a replication slot and does not destroy it:
# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
NOTICE: created replication slot "sub" on provider
ERROR: could not receive list of replicated tables from the provider: ERROR: cache lookup failed for publication 0
CONTEXT: slot "sub", output plugin "pgoutput", in the list_tables callback
after that:
postgres=# drop subscription sub;
ERROR: subscription "sub" does not exist
postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
ERROR: could not crate replication slot "sub": ERROR: replication slot "sub" already exists
* Also can’t drop subscription:
postgres=# \drs
List of subscriptions
Name | Database | Enabled | Publication | Conninfo
------+----------+---------+-------------+--------------------------------
sub | postgres | t | {mypub} | host=127.0.0.1 dbname=postgres
(1 row)
postgres=# drop subscription sub;
ERROR: unrecognized object class: 6102
* Several times I've run into a situation where the provider's postmaster
ignores Ctrl-C until the subscriber node is switched off.
* Patch with small typos fixed attached.
I'll do more testing, just want to share what I have so far.
Attachments:
typos.diff (application/octet-stream)
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3179add..f57068c 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -52,7 +52,7 @@
</listitem>
<listitem>
<para>
- Replicating between different major versions of the PostgreSQL
+ Replicating between different major versions of the PostgreSQL.
</para>
</listitem>
<listitem>
@@ -325,7 +325,7 @@
<programlisting>
wal_level = logical
max_worker_processes = 10 # one per subscription + one per instance needed on subscriber
-max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber
+max_logical_replication_processes = 10 # one per subscription + one per instance needed on subscriber
max_replication_slots = 10 # one per subscription needed both provider and subscriber
max_wal_senders = 10 # one per subscription needed on provider
</programlisting>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index bfef492..0100d43 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -254,7 +254,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
wrchandle = walrcvconn_init(wrcapi);
if (wrcapi->connect == NULL ||
wrcapi->create_slot == NULL)
- elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+ elog(ERROR, "libpqwalreceiver didn't initialized correctly");
/*
* Create the replication slot on remote side for our newly created
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index def88d3..cc60582 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1152,7 +1152,7 @@ CopyTableWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xi
}
/*
- * Handle OPY_TABLE command.
+ * Handle COPY_TABLE command.
*/
static void
CopyTable(CopyTableCmd *cmd)
Hi,
On 11/08/16 13:34, Stas Kelvich wrote:
* max_logical_replication_workers mentioned everywhere in docs, but guc.c defines
variable called max_logical_replication_processes for postgresql.conf
Ah changed it in code but not in docs, will fix.
* Since pg_subscription already shared across the cluster, it can be also handy to
share pg_publications too and allow publication of tables from different databases. That
is rare scenarios but quite important for virtual hosting use case — tons of small databases
in a single postgres cluster.
You can't decode changes from multiple databases in one slot, so I don't
see the usefulness there. pg_subscription is currently shared because
it's a technical necessity (as in, I don't see any other way to solve the
need to access the catalog from the launcher), not because I think it's a
great design :)
* There is no way to see attached tables/schemas to publication through \drp
That's mostly intentional, as publications for a table are visible in \d,
but I am not against adding it to \drp.
* As far as I understand there is no way to add table/tablespace right in CREATE
PUBLICATION and one need explicitly do ALTER PUBLICATION right after creation.
May be add something like WITH TABLE/TABLESPACE to CREATE?
Yes, as I said to Masahiko Sawada, it's just not there yet but I plan to
have that.
* So binary protocol goes into core. Is it still possible to use it as decoding plugin for
manually created walsender? May be also include json as it was in pglogical? While
i'm not arguing that it should be done, i'm interested about your opinion on that.
Well, the plugin is a bit more integrated into the publication infra, so
if somebody wanted to use it directly they'd have to use that part as
well. OTOH the protocol itself is provided as an API, so it's reusable by
other plugins if needed.
A JSON plugin is something that would be nice to have in core as well,
but I don't think it's part of this patch.
* Also I've noted that you got rid of reserved byte (flags) in protocol comparing to
pglogical_native. It was very handy to use it for two phase tx decoding (0 — usual
commit, 1 — prepare, 2 — commit prepared), because both prepare and commit
prepared generates commit record in xlog.
Hmm, maybe the commit message could get it back. PGLogical has them
sprinkled all around the protocol, which I don't really like, so I want
to limit them to the places where they are actually useful.
On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
- DDL, I see several approaches we could do here for 10.0. a) don't
deal with DDL at all yet, b) provide function which pushes the DDL
into replication queue and then executes on downstream (like
londiste, slony, pglogical do), c) capture the DDL query as text
and allow user defined function to be called with that DDL text on
the subscriber
* Since here DDL is mostly ALTER / CREATE / DROP TABLE (or am I wrong?) may be
we can add something like WITH SUBSCRIBERS to statements?
Not sure I follow. How does that help?
* Talking about exact mechanism of DDL replication I like you variant b), but since we
have transactional DDL, we can do two phase commit here. That will require two phase
decoding and some logic about catching prepare responses through logical messages. If that
approach sounds interesting i can describe proposal in more details and create a patch.
I'd think that such an approach is somewhat more interesting with c),
honestly. The difference between b) and c) is mostly about explicit vs
implicit. I definitely would like to see the 2PC patch updated to work
with this. But maybe it's wise to wait a while until the core of the
patch stabilizes during the discussion.
* Also I wasn't able actually to run replication itself =) While regression tests passes, TAP
tests and manual run stuck. pg_subscription_rel.substate never becomes 'r'. I'll investigate
that more and write again.
Interesting, please keep me posted. It's possible for tables to stay in
the 's' state for some time if there is nothing happening on the server,
but that should not mean anything is stuck.
* As far as I understand sync starts automatically on enabling publication. May be split that
logic into a different command with some options? Like don't sync at all for example.
I think SYNC should be an option of subscription creation, just like
INITIALLY ENABLED/DISABLED is. And then there should be an interface to
resync a table manually (like pglogical has). Not yet sure what that
interface should look like in terms of DDL though.
* When I'm trying to create subscription to non-existent publication, CREATE SUBSRITION
creates replication slot and do not destroys it:
# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
NOTICE: created replication slot "sub" on provider
ERROR: could not receive list of replicated tables from the provider: ERROR: cache lookup failed for publication 0
CONTEXT: slot "sub", output plugin "pgoutput", in the list_tables callback
after that:
postgres=# drop subscription sub;
ERROR: subscription "sub" does not exist
postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
ERROR: could not crate replication slot "sub": ERROR: replication slot "sub" already exists
See the TODO in CreateSubscription function :)
* Also can�t drop subscription:
postgres=# \drs
List of subscriptions
Name | Database | Enabled | Publication | Conninfo
------+----------+---------+-------------+--------------------------------
sub | postgres | t | {mypub} | host=127.0.0.1 dbname=postgres
(1 row)
postgres=# drop subscription sub;
ERROR: unrecognized object class: 6102
Yes that has been already reported.
* Several time i've run in a situation where provider's postmaster ignores Ctrl-C until subscribed
node is switched off.
Hmm, I guess there is a bug in the signal processing code somewhere.
* Patch with small typos fixed attached.
I'll do more testing, just want to share what i have so far.
Thanks for both.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
Hi,
as promised here is WIP version of logical replication patch.
Thanks for keeping on with this. This is important work.
Feedback is welcome.
+<sect1 id="logical-replication-publication">
+ <title>Publication</title>
+ <para>
+ A Publication object can be defined on any master node, owned by one
+ user. A Publication is a set of changes generated from a group of
+ tables, and might also be described as a Change Set or Replication Set.
+ Each Publication exists in only one database.
'A publication object can be defined on *any master node*'. I found
this confusing the first time I read it because I thought it was
circular (what makes a node a 'master' node? Having a publication object
published from it?). On reflection I realized that you mean 'any
*physical replication master*'. I think this might be better worded as
'A publication object can be defined on any node other than a standby
node'. I think referring to 'master' in the context of logical
replication might confuse people.
I am raising this in the context of the larger terminology that we want
to use and potential confusion with the terminology we use for physical
replication. I like the publication / subscription terminology you've
gone with.
<para>
+ Publications are different from table schema and do not affect
+ how the table is accessed. Each table can be added to multiple
+ Publications if needed. Publications may include both tables
+ and materialized views. Objects must be added explicitly, except
+ when a Publication is created for "ALL TABLES". There is no
+ default name for a Publication which specifies all tables.
+ </para>
+ <para>
+ The Publication is different from table schema, it does not affect
+ how the table is accessed and each table can be added to multiple
Those 2 paragraphs seem to start the same way. I get the feeling that
there is some point you're trying to express that I'm not catching onto.
Of course a publication is different from a table's schema, or different
from a function.
The definition of publication you have on the CREATE PUBLICATION page
seems better and should be repeated here (A publication is essentially a
group of tables intended for managing logical replication. See Section
30.1 for details about how publications fit into logical replication
setup.)
+ <para>
+ Conflicts happen when the replicated changes is breaking any
+ specified constraints (with the exception of foreign keys which are
+ not checked). Currently conflicts are not resolved automatically and
+ cause replication to be stopped with an error until the conflict is
+ manually resolved.
What options are there for manually resolving conflicts? Is the only
option to change the data on the subscriber to avoid the conflict?
I assume there isn't a way to flag a particular row coming from the
publisher and say ignore it. I don't think this is something we need to
support for the first version.
<sect1 id="logical-replication-architecture">
+ <title>Architecture</title>
+ <para>
+ Logical replication starts by copying a snapshot of the data on
+ the Provider database. Once that is done, the changes on Provider
I notice the use of 'Provider' above; do you intend to update that to
'Publisher', or does provider mean something different? If we like the
'publication' terminology then I think 'publishers' should publish them,
not providers.
I'm trying to test a basic subscription. I did the following:
cluster 1:
create database test1;
create table a(id serial8 primary key,b text);
create publication testpub1;
alter publication testpub1 add table a;
insert into a(b) values ('1');
cluster2
create database test1;
create table a(id serial8 primary key,b text);
create subscription testsub2 publication testpub1 connection
'host=localhost port=5440 dbname=test1';
NOTICE: created replication slot "testsub2" on provider
NOTICE: synchronized table states
CREATE SUBSCRIPTION
This resulted in
LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "00000494-1" with 0
transaction IDs
LOG: logical replication apply for subscription testsub2 started
LOG: starting logical decoding for slot "testsub2"
DETAIL: streaming transactions committing after 0/1562618, reading WAL
from 0/15625E0
LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: logical replication sync for subscription testsub2, table a started
LOG: logical decoding found consistent point at 0/1562640
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "00000495-1" with 0
transaction IDs
LOG: logical replication synchronization worker finished processing
The initial sync completed okay, then I did
insert into a(b) values ('2');
but the second insert never replicated.
I had the following output
LOG: terminating walsender process due to replication timeout
On cluster 1 I do
select * FROM pg_stat_replication;
pid | usesysid | usename | application_name | client_addr |
client_hostname | client_port | backend_start |
backend_xmin | state | sent_location | write_location | flush_location |
replay_location | sync_priority | sy
nc_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+-
-------------+-------+---------------+----------------+----------------+-----------------+---------------+---
---------
(0 rows)
If I then kill the cluster2 postmaster, I have to do a -9 or it won't die
I get
LOG: worker process: logical replication worker 16396 sync 16387 (PID
3677) exited with exit code 1
WARNING: could not launch logical replication worker
LOG: logical replication sync for subscription testsub2, table a started
ERROR: replication slot "testsub2_sync_a" does not exist
ERROR: could not start WAL streaming: ERROR: replication slot
"testsub2_sync_a" does not exist
I'm not really sure what I need to do to debug this, I suspect the
worker on cluster2 is having some issue.
[1]
/messages/by-id/CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 13/08/16 17:34, Steve Singer wrote:
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
Hi,
as promised here is WIP version of logical replication patch.
Thanks for keeping on this. This is important work
Feedback is welcome.
+<sect1 id="logical-replication-publication">
+ <title>Publication</title>
+ <para>
+ A Publication object can be defined on any master node, owned by one
+ user. A Publication is a set of changes generated from a group of
+ tables, and might also be described as a Change Set or Replication Set.
+ Each Publication exists in only one database.

'A publication object can be defined on *any master node*'. I found
this confusing the first time I read it because I thought it was
circular (what makes a node a 'master' node? Having a publication object
published from it?). On reflection I realized that you mean 'any
*physical replication master*'. I think this might be better worded as
'A publication object can be defined on any node other than a standby
node'. I think referring to 'master' in the context of logical
replication might confuse people.
Makes sense to me.
I am raising this in the context of the larger terminology that we want
to use and potential confusion with the terminology we use for physical
replication. I like the publication / subscription terminology you've
gone with.

<para>
+ Publications are different from table schema and do not affect
+ how the table is accessed. Each table can be added to multiple
+ Publications if needed. Publications may include both tables
+ and materialized views. Objects must be added explicitly, except
+ when a Publication is created for "ALL TABLES". There is no
+ default name for a Publication which specifies all tables.
+ </para>
+ <para>
+ The Publication is different from table schema, it does not affect
+ how the table is accessed and each table can be added to multiple

Those 2 paragraphs seem to start the same way. I get the feeling that
there is some point you're trying to express that I'm not catching onto.
Of course a publication is different from a table's schema, or different
from a function.
Ah, that's a relic of some editorialization, will fix. The reason why we
think it's important to mention the difference between publication and
schema is that they are the only objects that contain tables but they
affect them in very different ways which might confuse users.
The definition of publication you have on the CREATE PUBLICATION page
seems better and should be repeated here (A publication is essentially a
group of tables intended for managing logical replication. See Section
30.1 <cid:part1.06040100.08080900@ssinger.info> for details about how
publications fit into logical replication setup. )+ <para> + Conflicts happen when the replicated changes is breaking any + specified constraints (with the exception of foreign keys which are + not checked). Currently conflicts are not resolved automatically and + cause replication to be stopped with an error until the conflict is + manually resolved.What options are there for manually resolving conflicts? Is the only
option to change the data on the subscriber to avoid the conflict?
I assume there isn't a way to flag a particular row coming from the
publisher and say ignore it. I don't think this is something we need to
support for the first version.
Yes, you have to update the data on the subscriber or skip the
replication of the whole transaction (for which the UI is not very
friendly currently, as you either have to consume the transaction using
pg_logical_slot_get_binary_changes or move the origin on the subscriber
using pg_replication_origin_advance).
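Skipping a transaction by moving the origin might look roughly like this (a sketch only; the origin name and LSN below are placeholders, and the actual origin name a subscription uses depends on the patch's internal naming):

```sql
-- On the subscriber: list replication origins and their current progress
SELECT * FROM pg_replication_origin_status;

-- Advance the subscription's origin past the offending transaction
-- ('myorigin' and the LSN are placeholder values)
SELECT pg_replication_origin_advance('myorigin', '0/1573658'::pg_lsn);
```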
It's relatively easy to add some automatic conflict resolution as well,
but it didn't seem absolutely necessary so I didn't do it for the
initial version.
<sect1 id="logical-replication-architecture">
+ <title>Architecture</title>
+ <para>
+ Logical replication starts by copying a snapshot of the data on
+ the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above. Do you intend to update that to
'Publisher', or does 'provider' mean something different? If we like the
'publication' terminology then I think 'publishers' should publish them,
not providers.
Okay, I am just used to 'provider' in general (I guess londiste habit),
but 'publisher' is fine as well.
I'm trying to test a basic subscription. I did the following:
cluster 1:
create database test1;
create table a(id serial8 primary key,b text);
create publication testpub1;
alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster 2:
create database test1;
create table a(id serial8 primary key,b text);
create subscription testsub2 publication testpub1 connection
'host=localhost port=5440 dbname=test1';
NOTICE: created replication slot "testsub2" on provider
NOTICE: synchronized table states
CREATE SUBSCRIPTION
[...]
The initial sync completed okay, then I did
insert into a(b) values ('2');
but the second insert never replicated.
I had the following output
LOG: terminating walsender process due to replication timeout
On cluster 1 I do
select * FROM pg_stat_replication;
pid | usesysid | usename | application_name | client_addr |
client_hostname | client_port | backend_start |
backend_xmin | state | sent_location | write_location | flush_location |
replay_location | sync_priority | sy
nc_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+---
---------
(0 rows)

If I then kill the cluster2 postmaster, I have to do a -9 or it won't die
That might explain why it didn't replicate. The wait loops in apply
worker clearly need some work. Thanks for the report.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
* Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
that more and write again.

Interesting, please keep me posted. It's possible for tables to stay in
's' state for some time if there is nothing happening on the server, but
that should not mean anything is stuck.
Slightly played around, it seems that apply worker waits forever for substate change.
(lldb) bt
* thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10
frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 51 at latch.c:1108
frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 248 at latch.c:941
frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000) + 254 at latch.c:347
frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302
* frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228
frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430 at tablesync.c:436
frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518
frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122
frame #9: 0x000000010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = ‘c’
Also I’ve noted that some lsn position looks wrong on publisher:
postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
restart_lsn | confirmed_flush_lsn
-------------+---------------------
0/1530EF8 | 7FFF/5E7F6A30
(1 row)
postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
sent_location | write_location | flush_location | replay_location
---------------+----------------+----------------+-----------------
0/1530F30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30
(1 row)
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 15/08/16 15:51, Stas Kelvich wrote:
On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
* Also I wasn't able actually to run replication itself =) While regression tests passes, TAP
tests and manual run stuck. pg_subscription_rel.substate never becomes 'r'. I'll investigate
that more and write again.

Interesting, please keep me posted. It's possible for tables to stay in
's' state for some time if there is nothing happening on the server, but
that should not mean anything is stuck.
Slightly played around, it seems that apply worker waits forever for substate change.
(lldb) bt
[...]
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = 'c'
Hmm, not sure why that is; it might be related to the lsn reported being
wrong. Could you check what the lsn is there (either in tstate or in
pg_subscription_rel)? Especially in comparison with what the
sent_location is.
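Something along these lines should show both sides for comparison (the subscriber-side catalog and its columns are per this WIP patch, so names may differ):

```sql
-- On the subscriber: per-table sync state and lsn
SELECT * FROM pg_subscription_rel;

-- On the publisher: what the walsender thinks it has sent
SELECT sent_location FROM pg_stat_replication;
```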
Also I've noted that some lsn position looks wrong on publisher:
postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
restart_lsn | confirmed_flush_lsn
-------------+---------------------
0/1530EF8 | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
sent_location | write_location | flush_location | replay_location
---------------+----------------+----------------+-----------------
0/1530F30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30
(1 row)
That's most likely the result of the uninitialized origin_startpos
warning. I am working on a new version of the patch where that part is
fixed; if you want to check this before I send it in, the patch looks
like this:
diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c
index 581299e..7a9e775 100644
--- a/src/backend/replication/logical/apply.c
+++ b/src/backend/replication/logical/apply.c
@@ -1353,6 +1353,7 @@ ApplyWorkerMain(Datum main_arg)
originid = replorigin_by_name(myslotname, false);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();
wrcapi->connect(wrchandle, MySubscription->conninfo, true,
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi all,
attaching updated version of the patch. Still very much WIP but it's
slowly getting there.
Changes since last time:
- Mostly rewrote publication handling in pgoutput which brings a)
ability to add FOR ALL TABLES publications, b) performs better (no need
for a syscache lookup for every change like before), c) does correct
invalidation of publications on DDL
- added FOR TABLE and FOR ALL TABLES clause to both CREATE PUBLICATION
and ALTER PUBLICATION so that one can create publication directly with
table list, the FOR TABLE in ALTER PUBLICATION behaves like SET
operation (removes existing, adds new ones)
- fixed several issues with initial table synchronization (most of which
have been reported here)
- added pg_stat_subscription monitoring view
- updated docs to reflect all the changes, also removed the stuff that's
only planned from the docs (there is copy of the planned stuff docs in
the neighboring thread so no need to keep it in the patch)
- added documentation improvements suggested by Steve Singer and removed
the capitalization in the main doc
- added pg_dump support
- improved psql support (\drp+ shows list of tables)
- added flags to COMMIT message in the protocol so that we can add 2PC
support in the future
- fixed DROP SUBSCRIPTION issues and added tests for it
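The new FOR TABLE / FOR ALL TABLES clauses described above could look roughly like this (syntax as of this WIP patch, subject to change; table names are placeholders):

```sql
CREATE PUBLICATION mypub FOR TABLE a, b;
CREATE PUBLICATION allpub FOR ALL TABLES;

-- FOR TABLE in ALTER PUBLICATION acts like a SET operation:
-- it removes the existing tables and adds the listed ones
ALTER PUBLICATION mypub FOR TABLE c;
```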
I decided to not deal with ACLs so far, assuming superuser/replication
role for now. We can always make it less restrictive later by adding the
grantable privileges.
FDW support is still TODO. I think TRUNCATE will have to be solved as
part of other DDL in the future. I do have some ideas what to do with
DDL but I don't plan to implement them in the initial patch.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-Define-logical-replication-protocol-and-output-plugi.patch.gz (application/gzip)